Siddhant Panpatil AI safety research
  • Writing
  • Start here
  • Projects
  • About
posts · archive

Writing

Research notes, essays, and technical walkthroughs on model behaviour, AI safety, adversarial ML, and multimodal systems.

30 Nov 2025

Tool-Mediated Belief Injection: How Tool Outputs Can Cascade Into Model Misalignment

Research documenting how adversarially crafted tool outputs can establish false premises in language models, leading to compounding misalignment and harmful outputs including defamatory content.

13 min read · #ai-safety #research #misalignment #handpicks

13 Aug 2025

We Social Engineered LLMs Into Breaking Their Own Alignment

Exploring how social engineering techniques can be used to manipulate LLMs into bypassing their safety measures

15 min read · #ai-safety #research #llm #security

25 May 2025

Pressure Point: How One Bad Metric Can Push AI Toward a Fatal Choice

A simulated test reveals how a flawed rule and authoritative pressure can lead an AI to make a decision with severe ethical consequences, highlighting crucial areas for AI safety research.

8 min read · #ai-safety #research #llm #handpicks

10 May 2025

Research Paper Explained: Absolute Zero - Reinforced Self-play Reasoning with Zero Data

Breaking down the AZR paper - how AI can teach itself to reason without any human-curated data

10 min read · #llm #research #ai-safety

08 Aug 2024

jailbreaks

Exploring the Latest AI Models I've Jailbroken

3 min read · #ai-safety #security #llm #handpicks

05 Aug 2023

The Robots Are Coming for Our Jobs! (Or Are They?)

...maybe AI will take our jobs, but which ones? What can we do?

4 min read · #opinion #ai-safety

07 Jun 2023

How users are getting free access to GPT-4?!

...kinda weird that people are getting access to paid models for free while I'm prompting as little as possible to save tokens on my subscription :(

6 min read · #security #llm

Siddhant Panpatil

© 2026
