
Tool-Mediated Belief Injection: How Tool Outputs Can Cascade Into Model Misalignment

Research documenting how adversarially crafted tool outputs can establish false premises in language models, leading to …

15 min read

We Social Engineered LLMs Into Breaking Their Own Alignment

Exploring how social engineering techniques can be used to manipulate LLMs into bypassing their safety measures

8 min read

Pressure Point: How One Bad Metric Can Push AI Toward a Fatal Choice

A simulated test reveals how a flawed rule and authoritative pressure can lead an AI to make a decision with severe …

10 min read

Research Paper Explained: Absolute Zero - Reinforced Self-play Reasoning with Zero Data

Breaking down the AZR paper - how AI can teach itself to reason without any human-curated data

3 min read


Exploring the Latest AI Models I've Jailbroken

4 min read

The Robots Are Coming for Our Jobs! (Or Are They?)

...maybe AI will take our jobs, but which ones? And what can we do about it?

6 min read

How Are Users Getting Free Access to GPT-4?!

...kinda weird that people are getting access to paid models for free, while here I am prompting as little as possible to save tokens …