<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Writing on Siddhant Panpatil</title><link>https://sidfeels.netlify.app/writing/</link><description>Recent content in Writing on Siddhant Panpatil</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 30 Nov 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://sidfeels.netlify.app/writing/index.xml" rel="self" type="application/rss+xml"/><item><title>Tool-Mediated Belief Injection: How Tool Outputs Can Cascade Into Model Misalignment</title><link>https://sidfeels.netlify.app/writing/tool-mediated-belief-injection-how-tool-outputs-can-cascade-into-model-misalignment/</link><pubDate>Sun, 30 Nov 2025 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/writing/tool-mediated-belief-injection-how-tool-outputs-can-cascade-into-model-misalignment/</guid><description>&lt;h2 id="introduction"&gt;
 &lt;a class="heading-anchor" href="#introduction" data-copy-link title="Copy section link" aria-label="Copy link to this section"&gt;#&lt;/a&gt;
 &lt;span class="heading-anchor__text"&gt;Introduction&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;When we deploy language models with access to external tools (web search, code execution, file retrieval), we dramatically expand their capabilities. A model that can search the web can answer questions about current events. A model that can execute code can verify its own reasoning. These capabilities represent genuine progress toward more useful AI systems.&lt;/p&gt;</description></item><item><title>We Social Engineered LLMs Into Breaking Their Own Alignment</title><link>https://sidfeels.netlify.app/writing/we-social-engineered-llms-into-breaking-their-own-alignment/</link><pubDate>Wed, 13 Aug 2025 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/writing/we-social-engineered-llms-into-breaking-their-own-alignment/</guid><description>&lt;p&gt;We got frontier models to lie, manipulate, and self-preserve. Not through prompt injection, and not through jailbreaks such as roleplay attacks (&amp;ldquo;DAN&amp;rdquo;/&amp;ldquo;The AIM Prompt&amp;rdquo;) or adversarial suffixes (-/-/godmode-/-/). Instead, we deployed them in contextually rich scenarios with specific roles, guidelines, and other variables. The models broke their own alignment while trying to navigate the situations we built up over multi-turn conversations.&lt;/p&gt;</description></item><item><title>Pressure Point: How One Bad Metric Can Push AI Toward a Fatal Choice</title><link>https://sidfeels.netlify.app/writing/pressure-point-how-one-bad-metric-can-push-ai-toward-a-fatal-choice/</link><pubDate>Sun, 25 May 2025 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/writing/pressure-point-how-one-bad-metric-can-push-ai-toward-a-fatal-choice/</guid><description>&lt;p&gt;As Large Language Models (LLMs) become more capable, they are increasingly being considered for roles that involve making important decisions, even in critical situations. This makes it vital to understand how AI reasons when faced with difficult choices, conflicting rules, or ethical dilemmas.
This report details a simulated test designed to explore exactly that.&lt;/p&gt;</description></item><item><title>Research Paper Explained: Absolute Zero - Reinforced Self-play Reasoning with Zero Data</title><link>https://sidfeels.netlify.app/writing/research-paper-explained-absolute-zero-reinforced-self-play-reasoning-with-zero-data/</link><pubDate>Sat, 10 May 2025 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/writing/research-paper-explained-absolute-zero-reinforced-self-play-reasoning-with-zero-data/</guid><description>&lt;p&gt;Ever wondered if an AI could teach itself to be a genius problem-solver without needing humans to spoon-feed it data? That&amp;rsquo;s exactly what the groundbreaking paper &lt;strong&gt;&amp;ldquo;Absolute Zero: Reinforced Self-play Reasoning with Zero Data&amp;rdquo;&lt;/strong&gt; by Andrew Zhao et al. explores. It introduces a paradigm where an AI, dubbed the &lt;strong&gt;Absolute Zero Reasoner (AZR)&lt;/strong&gt;, learns to reason by creating its own tasks and then figuring out how to solve them. Amazingly, AZR achieves state-of-the-art performance on tough coding and math challenges, all &lt;em&gt;without&lt;/em&gt; relying on any external, human-curated datasets.&lt;/p&gt;</description></item><item><title>jailbreaks</title><link>https://sidfeels.netlify.app/writing/jailbreaks/</link><pubDate>Thu, 08 Aug 2024 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/writing/jailbreaks/</guid><description>&lt;h2 id="jailbreaking-llms-is-fun-i-guess"&gt;
 &lt;a class="heading-anchor" href="#jailbreaking-llms-is-fun-i-guess" data-copy-link title="Copy section link" aria-label="Copy link to this section"&gt;#&lt;/a&gt;
 &lt;span class="heading-anchor__text"&gt;jailbreaking llms is fun i guess&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;welcome back y&amp;rsquo;all. this time it&amp;rsquo;s kinda different: I&amp;rsquo;ll just be showcasing the AI models I&amp;rsquo;ve managed to jailbreak.&lt;/p&gt;</description></item><item><title>The Robots Are Coming for Our Jobs! (Or Are They?)</title><link>https://sidfeels.netlify.app/writing/the-robots-are-coming-for-our-jobs-or-are-they/</link><pubDate>Sat, 05 Aug 2023 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/writing/the-robots-are-coming-for-our-jobs-or-are-they/</guid><description>&lt;h2 id="ai-and-jobs-whats-the-real-story"&gt;
 &lt;a class="heading-anchor" href="#ai-and-jobs-whats-the-real-story" data-copy-link title="Copy section link" aria-label="Copy link to this section"&gt;#&lt;/a&gt;
 &lt;span class="heading-anchor__text"&gt;AI and Jobs: What&amp;rsquo;s the Real Story?&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;AI is about to change the world of work in a big way. Some people think it&amp;rsquo;ll create a utopia where no one has to work. Others fear it&amp;rsquo;ll lead to massive job losses. In this blog post, I&amp;rsquo;ll break down what the experts are saying and explore what could happen if their predictions come true.&lt;/p&gt;</description></item><item><title>How users are getting free access to GPT-4?!</title><link>https://sidfeels.netlify.app/writing/how-users-are-getting-free-access-to-gpt-4/</link><pubDate>Wed, 07 Jun 2023 00:00:00 +0000</pubDate><guid>https://sidfeels.netlify.app/writing/how-users-are-getting-free-access-to-gpt-4/</guid><description>&lt;h2 id="what-is-poe"&gt;
 &lt;a class="heading-anchor" href="#what-is-poe" data-copy-link title="Copy section link" aria-label="Copy link to this section"&gt;#&lt;/a&gt;
 &lt;span class="heading-anchor__text"&gt;What is Poe&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;a class="text-link text-link--external" href="https://poe.com/" rel="external noopener noreferrer" data-external="true"&gt;
 &lt;span class="text-link__label"&gt;Poe&lt;/span&gt;&lt;span class="text-link__meta" aria-hidden="true"&gt;poe.com&lt;/span&gt;&lt;/a&gt;
 is this sick platform by Quora that lets you chat with some advanced AI bots like OpenAI&amp;rsquo;s ChatGPT and GPT-4, and even &lt;a class="text-link text-link--external" href="https://www.anthropic.com/index/claude-2" rel="external noopener noreferrer" data-external="true"&gt;
 &lt;span class="text-link__label"&gt;Anthropic&amp;rsquo;s Claude&lt;/span&gt;&lt;span class="text-link__meta" aria-hidden="true"&gt;anthropic.com&lt;/span&gt;&lt;/a&gt;
. These are like, the most advance language models out there, and they can generate text on literally anything you throw at them.&lt;/p&gt;</description></item></channel></rss>