hey, i'm siddhant—some may know me as sidfeels. i love studying model behaviour until black boxes stop feeling like magic.
Research documenting how adversarially crafted tool outputs can establish false premises in language models, leading to …
Exploring how social engineering techniques can manipulate LLMs into bypassing their safety measures
A simulated test reveals how a flawed rule and authoritative pressure can lead an AI to make a decision with severe …