By focusing on concepts such as input space partitioning, intrinsic dimension, and the density of self-attention graphs, we’ve been able to draw connections between the prompt and its impact on reasoning abilities. Our findings not only provide deeper insights into how LLMs function but also suggest new avenues for enhancing their performance.
In the following, we delve into our key findings, exploring how the geometry of LLMs shapes their reasoning capabilities and what this means for the future of AI development.
Key Findings
We establish a connection between an LLM’s expressive power and the density of its self-attention graphs. This relationship is rooted in how MLPs partition their input space, creating regions where different affine transformations occur.
The intrinsic dimension of inputs to the multi-layer perceptron (MLP) blocks in LLMs plays a crucial role in their expressive capacity. Higher intrinsic dimension correlates with greater expressive power.
Increasing either the context length or the number of attention heads in an LLM can lead to higher intrinsic dimension of the MLP’s input.
Our experiments on the GSM8K-Zero dataset demonstrate that increases in intrinsic dimension, particularly in the final layers of an LLM, correlate with improved reasoning performance.
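To make the first finding concrete, here is a minimal sketch of how a ReLU MLP partitions its input space. It assumes a toy two-layer MLP with random weights; the names `relu_mlp_region`, `local_affine_map`, and `mlp` are hypothetical helpers for illustration and are not taken from our codebase. Each input's on/off activation pattern identifies the region it falls into, and inside that region the MLP reduces to a single affine map.

```python
import numpy as np

def relu_mlp_region(x, W1, b1):
    """Activation pattern of x: a binary code naming the region x falls in."""
    return (W1 @ x + b1 > 0).astype(int)

def local_affine_map(pattern, W1, b1, W2, b2):
    """Affine map (A, c) the MLP applies to every input sharing this pattern."""
    D = np.diag(pattern)          # zeroes out the inactive hidden units
    A = W2 @ D @ W1
    c = W2 @ D @ b1 + b2
    return A, c

def mlp(x, W1, b1, W2, b2):
    """Plain two-layer ReLU MLP, used here only to check the affine form."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Toy weights (assumed sizes: 2 inputs, 8 hidden units, 3 outputs).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

x = rng.normal(size=2)
pattern = relu_mlp_region(x, W1, b1)
A, c = local_affine_map(pattern, W1, b1, W2, b2)

# Inside its region, the MLP output equals the affine map A @ x + c.
assert np.allclose(mlp(x, W1, b1, W2, b2), A @ x + c)
```

Two inputs with different activation patterns land in different regions and are, in general, transformed by different affine maps. This is the sense in which the number of reachable regions reflects expressive power.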
Implications
We therefore posit that an LLM's reasoning capabilities are tied to the expressive power of its MLP blocks. Our results suggest that prompting methods such as many-shot and chain-of-thought exploit this effect: both lengthen the context, which can raise the intrinsic dimension of the MLP's input.
This research provides a new framework for understanding and potentially enhancing LLM reasoning abilities. By focusing on geometric properties like intrinsic dimension and input space partitioning, we may be able to design more efficient and capable language models without necessarily increasing model size.
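As a rough illustration of what measuring such a geometric property might look like, the sketch below counts the edges of a causal self-attention graph. It rests on assumptions: the threshold `eps`, the random queries and keys, and the helper names `causal_attention` and `edge_count` are illustrative choices, not the exact definition of intrinsic dimension used in our work.

```python
import numpy as np

def causal_attention(Q, K):
    """Causal softmax attention weights for one head; Q and K have shape (T, d)."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)      # tokens cannot attend ahead
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def edge_count(attn, eps=0.05):
    """Number of attention-graph edges whose weight exceeds eps (assumed threshold)."""
    return int((attn > eps).sum())

rng = np.random.default_rng(0)
d = 16
for T in (4, 16):                                   # two context lengths to compare
    Q, K = rng.normal(size=(T, d)), rng.normal(size=(T, d))
    attn = causal_attention(Q, K)
    print(f"context length {T}: {edge_count(attn)} edges above threshold")
```

Summing `edge_count` over several heads gives a simple per-layer density proxy. On a real model, the attention weights would come from the model itself rather than from random matrices.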
Future Directions
While our work reveals intriguing correlations between geometric properties and reasoning capabilities, many questions remain. How do these insights relate to LLM generalization? Can we leverage this understanding to create smaller models with comparable reasoning abilities to larger ones?
As we continue to explore these questions, we hope our geometric perspective will contribute to the development of more powerful and efficient language models, pushing the boundaries of what’s possible in the realm of LLMs.