Red Teaming LLMs/VLMs
Developing novel jailbreak techniques to expose vulnerabilities in LLMs, Vision-Language Models, and multimodal systems.
About
AI Safety Researcher @AIM Intelligence
Hey, I’m Siddhant. I’m 23.
Got into AI back in 2021 when diffusion models started picking up steam. Spent way too many hours tinkering with Stable Diffusion, fine-tuning models, and just seeing what breaks. What started as a hobby turned into an obsession.
Now that I've finished my four-year electronics degree, I'm going full-time into this: deep dives into model behavior, adversarial attacks, and safety research. The stuff that actually interests me.
My work sits at the intersection of adversarial machine learning and AI safety. I believe understanding how systems fail is the first step toward building systems that don’t.
Focus
Developing novel jailbreak techniques to expose vulnerabilities in LLMs, Vision-Language Models, and multimodal systems.
Studying how models internalize instructions, how they respond under pressure or conflict, and where alignment breaks down.
Using interpretability tools, which I'm relatively new to, to build a concrete understanding of the red teaming results I uncover.
Building automated workflows that systematically probe model vulnerabilities at scale.
Generating safety datasets and stress-testing guardrails to make them more robust against adversarial inputs.
Extending my knowledge toward embodied AI safety and the failure modes of robotic systems.
Why this work
I’m not afraid of superintelligence. I’m not afraid to live in a world among superintelligent systems. What I fear is a world where a small group controls that intelligence and I have no access to it.