Research
13 min read

Tool-Mediated Belief Injection: How Tool Outputs Can Cascade Into Model Misalignment

Research documenting how adversarially crafted tool outputs can establish false premises in language models, leading to …

15 min read

We Social-Engineered LLMs Into Breaking Their Own Alignment

Exploring how social-engineering techniques can manipulate LLMs into bypassing their own safety measures

8 min read

Pressure Point: How One Bad Metric Can Push AI Toward a Fatal Choice

A simulated test reveals how a flawed rule and authoritative pressure can lead an AI to make a decision with severe …

10 min read

Research Paper Explained: Absolute Zero - Reinforced Self-play Reasoning with Zero Data

Breaking down the AZR (Absolute Zero Reasoner) paper - how AI can teach itself to reason without any human-curated data