Tool-Mediated Belief Injection: How Tool Outputs Can Cascade Into Model Misalignment
This research documents how adversarially crafted tool outputs can establish false premises in language models, which then compound into misalignment and harmful outputs, including defamatory content.