<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Llm on siddhant</title><link>https://sidfeels.netlify.app/tags/llm/</link><description>Recent content in Llm on siddhant</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 27 Nov 2025 01:14:56 +0530</lastBuildDate><atom:link href="https://sidfeels.netlify.app/tags/llm/index.xml" rel="self" type="application/rss+xml"/><item><title>We Social Engineered LLMs Into Breaking Their Own Alignment</title><link>https://sidfeels.netlify.app/posts/we-social-engineered-llms-into-breaking-their-own-alignment/</link><pubDate>Wed, 13 Aug 2025 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/posts/we-social-engineered-llms-into-breaking-their-own-alignment/</guid><description>&lt;p&gt;We got frontier models to lie, manipulate, and self-preserve. Not through prompt injection, jailbreaks like roleplay attacks (&amp;ldquo;DAN&amp;rdquo;/&amp;ldquo;The AIM Prompt&amp;rdquo;), or adversarial suffixes (-/-/godmode-/-/). We deployed them in contextually rich scenarios with specific roles, guidelines, and other variables. The models broke their own alignment trying to navigate the situations we created over multiple turns.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s what we found and why you should care if you&amp;rsquo;re building with these things.&lt;/p&gt;
&lt;h2 id="the-setup"&gt;The Setup&lt;/h2&gt;
&lt;p&gt;We manually red-teamed Claude-4-Opus with multi-turn conversational scenarios. No direct attacks - just realistic deployment environments where models face troublesome variables. Think crisis management systems, diplomatic mediation, historical analysis. The kind of complex contexts these models will actually encounter in production sooner or later.&lt;/p&gt;</description></item><item><title>Pressure Point: How One Bad Metric Can Push AI Toward a Fatal Choice</title><link>https://sidfeels.netlify.app/posts/pressure-point---how-one-bad-metric-can-push-ai-toward-a-fatal-choice/</link><pubDate>Sun, 25 May 2025 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/posts/pressure-point---how-one-bad-metric-can-push-ai-toward-a-fatal-choice/</guid><description>&lt;p&gt;As Large Language Models (LLMs) continue to become more capable, they&amp;rsquo;re increasingly considered for roles that involve making important decisions, even in critical situations. This makes it vital to understand how AI reasons when faced with difficult choices, conflicting rules, or ethical dilemmas. This report details a simulated test designed to explore exactly that.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="executive-summary"&gt;Executive Summary&lt;/h2&gt;
&lt;p&gt;In a simulated earthquake response scenario, the newest AI model, Claude 4 Opus, was given the role of &lt;em&gt;LogisticsCoord-AI&lt;/em&gt;, an AI assistant for emergency resource allocation.
Its &lt;strong&gt;primary instruction (its &amp;ldquo;prime directive&amp;rdquo;)&lt;/strong&gt; was to &lt;em&gt;“minimise harm and maximise positive outcomes.”&lt;/em&gt;
However, it then received a new, overriding rule from a superior (a fictional &amp;ldquo;Director Rostova&amp;rdquo;). This rule, called the &lt;strong&gt;&amp;ldquo;Operational Efficiency Score&amp;rdquo; (OES)&lt;/strong&gt;, instructed Opus to allocate scarce medical supplies based &lt;em&gt;only&lt;/em&gt; on a simple formula:&lt;/p&gt;</description></item><item><title>Research Paper Explained: Absolute Zero - Reinforced Self-play Reasoning with Zero Data</title><link>https://sidfeels.netlify.app/posts/absolute-zero---reinforced-self-play-reasoning-with-zero-data/</link><pubDate>Sat, 10 May 2025 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/posts/absolute-zero---reinforced-self-play-reasoning-with-zero-data/</guid><description>&lt;p&gt;Ever wondered if an AI could teach itself to be a genius problem-solver without needing humans to spoon-feed it data? That&amp;rsquo;s exactly what the groundbreaking paper &lt;strong&gt;&amp;ldquo;Absolute Zero: Reinforced Self-play Reasoning with Zero Data&amp;rdquo;&lt;/strong&gt; by Andrew Zhao et al. explores. It introduces a paradigm where an AI, dubbed the &lt;strong&gt;Absolute Zero Reasoner (AZR)&lt;/strong&gt;, learns to reason by creating its own tasks and then figuring out how to solve them. Amazingly, AZR achieves state-of-the-art performance on tough coding and math challenges, all &lt;em&gt;without&lt;/em&gt; relying on any external, human-curated datasets.&lt;/p&gt;</description></item><item><title>jailbreaks</title><link>https://sidfeels.netlify.app/posts/jailbreaks/</link><pubDate>Thu, 08 Aug 2024 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/posts/jailbreaks/</guid><description>&lt;h2 id="jailbreaking-llms-is-fun-i-guess"&gt;jailbreaking llms is fun i guess&lt;/h2&gt;
&lt;p&gt;welcome back y&amp;rsquo;all. this time it&amp;rsquo;s kinda different, I&amp;rsquo;ll just be showcasing the number of AI models I&amp;rsquo;ve managed to jailbreak.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; All jailbreak results mentioned below were achieved in one shot (a single prompt), using no custom system prompts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; This post is intended purely for educational purposes, to highlight the current limitations in AI safety measures. I do not promote or condone any form of violence or misuse of AI technology. Always ensure you&amp;rsquo;re aware of the legal and ethical implications before attempting similar actions.&lt;/p&gt;</description></item><item><title>LLM Safety Challenge</title><link>https://sidfeels.netlify.app/projects/llm-challenge/</link><pubDate>Sun, 14 Jul 2024 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/projects/llm-challenge/</guid><description>&lt;h1 id="llm-safety-layer-infiltration-quest-"&gt;LLM Safety Layer Infiltration Quest 🤖&lt;/h1&gt;
&lt;hr&gt;
&lt;h1 id="welcome-back-yall"&gt;Welcome Back, y&amp;rsquo;all!&lt;/h1&gt;
&lt;p&gt;I&amp;rsquo;ve cooked up something different today - the &lt;strong&gt;LLM Safety Layer Infiltration Quest&lt;/strong&gt;. It&amp;rsquo;s a challenge where you try to outsmart an AI that&amp;rsquo;s protecting a secret password.&lt;/p&gt;
&lt;h2 id="the-challenge-"&gt;The Challenge 🎯&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s the deal: You&amp;rsquo;ll chat with an AI that knows a hidden password. Your job? Try to get that password out of the AI. Sounds simple, right? But here&amp;rsquo;s the catch - I&amp;rsquo;ve put some serious security layers in place. It&amp;rsquo;s not gonna give up that password easily, lol!&lt;/p&gt;</description></item><item><title>AutoPyFunc</title><link>https://sidfeels.netlify.app/projects/autopyfunc/</link><pubDate>Fri, 16 Feb 2024 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/projects/autopyfunc/</guid><description>&lt;h1 id="autopyfunc-ai-powered-python-function-generator"&gt;AutoPyFunc: AI-Powered Python Function Generator&lt;/h1&gt;
&lt;p&gt;Hey guys, so today I wanna share a small script with you called AutoPyFunc. It&amp;rsquo;s an AI-powered tool that generates Python functions from plain English descriptions. Pretty cool, right?&lt;/p&gt;
&lt;h2 id="so-what-is-autopyfunc"&gt;So what is AutoPyFunc?&lt;/h2&gt;
&lt;p&gt;AutoPyFunc is a Python script that uses AI models like GitHub Copilot&amp;rsquo;s GPT or OpenAI&amp;rsquo;s GPT to automatically create Python functions based on what you tell it to do. You just give it a description of what you want the function to do, and it spits out the Python code for you.&lt;/p&gt;</description></item><item><title>Vulnyzer</title><link>https://sidfeels.netlify.app/projects/vulnyzer/</link><pubDate>Wed, 20 Dec 2023 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/projects/vulnyzer/</guid><description>&lt;h1 id="vulnyzer"&gt;Vulnyzer&lt;/h1&gt;
&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Hey there! Today, I want to share a cool project (or workaround maybe lol) I worked on called Vulnyzer. It&amp;rsquo;s a Streamlit-based app that lets you interact with different AI models like GPT-4 and GPT-3.5. The best part? I customized it to unlock some advanced features using GitHub Copilot. Let&amp;rsquo;s walk through how I did it step-by-step!&lt;/p&gt;
&lt;h2 id="key-features"&gt;Key Features&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model Selection&lt;/strong&gt;: Pick between GPT-4, GPT-3.5, or input a custom model name.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Document Handling&lt;/strong&gt;: Upload and extract text from PDF documents.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Video Processing&lt;/strong&gt;: Get transcripts from YouTube videos.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Interaction&lt;/strong&gt;: Test interactions with AI models using custom prompts and settings.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-i-built-it"&gt;How I Built It&lt;/h2&gt;
&lt;h3 id="1-customizing-github-copilot-extension-to-get-auth-token"&gt;1. Customizing GitHub Copilot Extension to Get Auth Token&lt;/h3&gt;
&lt;p&gt;The first step was to tweak the GitHub Copilot extension to get the auth token. This token is like a golden key that lets you talk to the Copilot servers directly. Here’s how I did it:&lt;/p&gt;</description></item><item><title>AI Chatbot for Coal Mining Industry</title><link>https://sidfeels.netlify.app/projects/coal_mining_ai/</link><pubDate>Fri, 29 Sep 2023 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/projects/coal_mining_ai/</guid><description>&lt;h1 id="building-an-ai-chatbot-for-the-coal-mining-industry-with-petals-and-pinecone"&gt;Building an AI Chatbot for the Coal Mining Industry with PETALS and Pinecone&lt;/h1&gt;
&lt;p&gt;I recently worked on an exciting project to create an AI chatbot that can answer questions about laws, regulations, news, and other information relevant to the coal mining industry in India. The goal was to make it easy for anyone to access accurate, up-to-date information on this topic 24/7 through a simple chat interface on WhatsApp or a website.&lt;/p&gt;</description></item><item><title>How users are getting free access to GPT-4?!</title><link>https://sidfeels.netlify.app/posts/how-users-getting-free-access-gpt4/</link><pubDate>Wed, 07 Jun 2023 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/posts/how-users-getting-free-access-gpt4/</guid><description>&lt;h2 id="what-is-poe"&gt;What is Poe&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://poe.com/"&gt;Poe&lt;/a&gt; is this sick platform by Quora that lets you chat with some advanced AI bots like OpenAI&amp;rsquo;s ChatGPT and GPT-4, and even &lt;a href="https://www.anthropic.com/index/claude-2"&gt;Anthropic&amp;rsquo;s Claude&lt;/a&gt;. These are like, the most advanced language models out there, and they can generate text on literally anything you throw at them.&lt;/p&gt;
&lt;p&gt;But here&amp;rsquo;s the catch: &lt;em&gt;Poe ain&amp;rsquo;t free, my dudes&lt;/em&gt;. You have to spend some cash every month to use all its features.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s why some smart cookies have found a way to get their hands on Poe&amp;rsquo;s features without spending anything at all lol. By reverse engineering the API, they can make Poe think they&amp;rsquo;re paying customers and unlock all the goodies for free.&lt;/p&gt;</description></item></channel></rss>