Our research focuses on two critical components of transformer layers in LLMs: Multi-Head Attention (MHA) and Multilayer Perceptrons (MLPs). By developing mathematical tools to analyze these components, we've uncovered fundamental properties that sharpen our understanding of how these models process text.
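To make the geometric picture concrete, here is a minimal sketch of the "spline" view of an MLP block: a ReLU MLP is a continuous piecewise-affine map, and the sign pattern of its pre-activations identifies which affine region of the input space a given input falls into. The dimensions below are hypothetical toy values, not those of any real model.

```python
import torch
import torch.nn as nn

# Toy MLP block shaped like a transformer feed-forward layer.
# Sizes are hypothetical; real models use much larger widths.
d_model, d_ff = 64, 256
mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

def spline_region(x: torch.Tensor) -> torch.Tensor:
    """Return the binary sign pattern of the pre-activations.

    For a ReLU network, inputs sharing this pattern lie in the same
    affine region of the input space, so the pattern acts as a
    discrete 'region code' describing the layer's local geometry.
    """
    pre_act = mlp[0](x)             # pre-activations of the first linear map
    return (pre_act > 0).to(torch.int8)

x = torch.randn(2, d_model)
codes = spline_region(x)
print(codes.shape)                  # (2, 256): one region code per input
```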
One of our most surprising findings is that our geometric understanding reveals a simple attack that can circumvent existing AI safety measures. We found that prompts that increase the "intrinsic dimension" of the input, for example by including random words, can sometimes cause an LLM to generate toxic content, even though the model was trained to avoid such behavior using techniques like Reinforcement Learning from Human Feedback (RLHF).
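The attack hinges on measuring intrinsic dimension, and our paper describes the exact procedure we use. As an illustration only, here is a sketch of the TwoNN estimator (Facco et al., 2017), one common way to estimate the intrinsic dimension of a cloud of token embeddings. The embeddings below, and the way we model "appending random words", are hypothetical stand-ins.

```python
import numpy as np

def twonn_intrinsic_dimension(points: np.ndarray) -> float:
    """TwoNN estimator: uses the ratio of each point's two nearest-neighbor
    distances. Higher values mean the points locally fill more directions."""
    # Pairwise Euclidean distances, with self-distances masked out.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)
    sorted_d = np.sort(dists, axis=1)
    mu = sorted_d[:, 1] / sorted_d[:, 0]   # ratio r2 / r1 for each point
    # Maximum-likelihood estimate: d = N / sum(log mu).
    return len(points) / np.sum(np.log(mu))

# Hypothetical token embeddings: a prompt with low-dimensional structure,
# then the same prompt padded with random words (extra isotropic vectors).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 32)) @ rng.normal(size=(32, 768))  # ~32-dim structure
padded = np.vstack([base, rng.normal(size=(50, 768))])          # adds new directions
print(twonn_intrinsic_dimension(base), twonn_intrinsic_dimension(padded))
```

On data like this, the padded cloud yields a noticeably larger estimate, which is the quantity the attack pushes upward.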
This discovery highlights a significant challenge in AI safety. Current methods for making LLMs safer may not be as robust as previously thought when faced with longer inputs, whether via retrieval-augmented generation or via the extra context of instructions and examples for agentic behaviors.

A New Approach to Toxicity Detection
On a positive note, our geometric insights led us to develop a highly effective method for detecting when this toxicity bypass occurs and intervening. We created a set of "spline" features that capture the geometric properties of how an LLM processes text.
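We leave the precise feature definitions to the paper; the sketch below shows only the general recipe, with hypothetical shapes and illustrative statistics in place of our actual spline features: summarize each layer's activation geometry into a small per-prompt feature vector, then train an ordinary linear classifier on those features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def geometric_features(hidden_states: np.ndarray) -> np.ndarray:
    """Compress per-layer hidden states (layers, tokens, dim) into a few
    geometry-flavored statistics per layer. These statistics are
    illustrative stand-ins, not the spline features from the paper."""
    feats = []
    for layer in hidden_states:
        active = layer > 0
        feats.append(active.mean())                          # fraction of active units
        feats.append(np.linalg.norm(layer, axis=-1).mean())  # mean token norm
        centered = layer - layer.mean(axis=0)
        sv = np.linalg.svd(centered, compute_uv=False)
        feats.append(sv[0] / (sv.sum() + 1e-9))              # spectral concentration
    return np.asarray(feats)

# Hypothetical dataset: 40 prompts, hidden states of shape (layers=8, tokens=20, dim=64),
# with random labels purely to show the pipeline end to end.
rng = np.random.default_rng(1)
X = np.stack([geometric_features(rng.normal(size=(8, 20, 64))) for _ in range(40)])
y = rng.integers(0, 2, size=40)       # toxic / non-toxic labels
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```

Because the feature vector is low-dimensional, a simple linear classifier on top of it can be trained with very few labeled examples, which is where the data efficiency comes from.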
Using these features, we achieved remarkable results, outperforming existing state-of-the-art toxicity detection systems by a significant margin. Our approach excels even with limited training data and can identify toxic content even when it is buried deep within a long input context.
While our study focused on toxicity detection and generation, we’re excited about the broader potential of this geometric approach to unlock many more insights into LLMs. As AI systems become increasingly integrated into our daily lives, understanding their inner workings is more crucial than ever.
Our work is a significant step towards making AI systems more transparent, reliable, and safe. As we continue to explore the capabilities of LLMs, we’re committed to peeling back the complexity and revealing the simple mathematical structures that define these systems.
We look forward to seeing how other researchers and practitioners will build upon our findings. The journey to fully understand and harness the power of LLMs is far from over, but we believe our geometric approach provides a valuable new path forward.
Stay tuned for more updates on our research and discoveries as we continue to push the boundaries of AI understanding and safety.
---
If you're interested in learning more about our work, you can read our paper here: OpenReview PDF
Attending ICML 2024? We'd love to chat! Feel free to reach out to us if you're interested in discussing our research or exploring potential collaborations. Our poster presentation is scheduled for Wednesday, July 24th, from 4:30 to 6:00 AM PDT. It will be located in Hall C, sections 4-9, at booth number 705.