A natural next step is to build systems of multiple agents working together. This is also the point at which many people mistake systems that are really just modular single-agent systems for multi-agent ones.
Why care about this terminology? Because the lack of a clear distinction affects how we think about these systems, what we expect from them, and how we design them.
Let’s clarify what does NOT automatically make a system multi-agent.
Many so-called multi-agent systems are simply good software engineering: a complex process broken into smaller, more manageable subtasks, each with its own instructions and tools.
Here, we would call an LLM once per module. But that really doesn’t make the system multi-agent. It is just a single agent with modularity in the instructions and tools available to the LLM. Notice that in such systems the conversation context is shared across the different modules.
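To make the shared-context point concrete, here is a minimal sketch of such a "modular" pipeline. The module names and `call_llm` stand-in are invented for illustration; the key observation is that one conversation context flows through every module, so this is one agent wearing different hats:

```python
def call_llm(messages):
    # Placeholder for a real chat-completion call; here we just echo
    # the most recent message.
    return f"response to: {messages[-1]['content']}"

# Hypothetical modules; each is just a different instruction, not a
# different agent.
MODULE_INSTRUCTIONS = {
    "plan": "Break the task into steps.",
    "execute": "Carry out the plan.",
    "summarize": "Summarize the results.",
}

def run_pipeline(task):
    context = [{"role": "user", "content": task}]  # ONE shared context
    for module, instruction in MODULE_INSTRUCTIONS.items():
        context.append({"role": "system", "content": instruction})
        context.append({"role": "assistant", "content": call_llm(context)})
    return context
```

Every module reads from and appends to the same `context` list; nothing is private to any module.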
Another common pattern is assigning different LLM instances to different areas of expertise.
Given our definition of an agent, it would be reasonable to think this constitutes a multi-agent system. But does it? What happens if I replace each agent with a function call that completes the same task? Would we still call the system multi-agent?
The difference comes down to being stateful, i.e., maintaining context across multiple interactions. If your agent can be replaced by a tool call, is it really an agent?
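The replacement test can be sketched in a few lines. The names and behavior here are invented for illustration; the point is that a stateless "agent" and a plain tool call are behaviorally indistinguishable:

```python
def llm(prompt):
    # Stand-in for an LLM call: returns the last word of the prompt.
    return prompt.split()[-1]

def keyword_agent(text):
    # A so-called "agent": prompt in, answer out, nothing retained.
    return llm(f"Extract the last keyword from: {text}")

def keyword_tool(text):
    # The same behavior, honestly labeled as a tool call.
    return llm(f"Extract the last keyword from: {text}")

# No input can distinguish the two, so the "agent" label adds nothing.
assert keyword_agent("alpha beta") == keyword_tool("alpha beta")
```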
Some systems use different underlying LLMs based on the task requirements, routing each kind of task to whichever model handles it best.
Would it be fair to call such a system multi-agentic? Probably not. Using different LLMs merely amounts to leveraging each model’s different capabilities. What happens when a new model arrives that is single-handedly the best across all tasks? Once you use that model across the board, your multi-agent system is suddenly downgraded to a single-agent one.
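A small routing sketch makes this explicit. The model names and task types are hypothetical; note how a single `override` collapses the whole "multi-agent" routing into one model with no other change to the system:

```python
# Hypothetical per-task routing table.
MODEL_FOR_TASK = {"code": "model-coder", "summarize": "model-writer"}

def call_model(model, prompt):
    # Stand-in for dispatching the prompt to a specific model.
    return f"[{model}] {prompt}"

def route(task_type, prompt, override=None):
    # If one model becomes best at everything, passing it as `override`
    # turns the routed system into a single-model system unchanged.
    model = override or MODEL_FOR_TASK.get(task_type, "model-default")
    return call_model(model, prompt)
```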
Running multiple instances of similar processes and combining their outputs (like running several search summarizers and merging their results) is an ensemble approach. But is it multi-agentic? This design pattern is, again, a software design choice. If I replace the parallel calls with a sequential process and then perform the same ensemble step, the output is essentially unchanged (except for the latency).
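A quick sketch shows the parallel and sequential versions producing identical output. `summarize` and `merge` are invented stand-ins for the summarizer instances and the combining step:

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(doc):
    # Stand-in for one summarizer instance.
    return doc.upper()

def merge(parts):
    # Order-insensitive merge, so parallelism cannot change the result.
    return " | ".join(sorted(parts))

def ensemble_parallel(docs):
    with ThreadPoolExecutor() as pool:
        return merge(list(pool.map(summarize, docs)))

def ensemble_sequential(docs):
    return merge([summarize(d) for d in docs])
```

The two functions are interchangeable in output; only wall-clock time differs.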
So if these common patterns don’t constitute multi-agency, what does? I have already laid down the hints. The key differentiating factor is persistent private state that affects cross-agent interactions.
For a system to exhibit interesting multi-agent behaviors, each agent needs its own persistent, private state that it carries across interactions and that shapes how it behaves in future ones.
Let me illustrate this with a concrete example:
Consider a system with a coding agent and a review agent (written in pseudo-code):
This is NOT a Multi-Agent System:
function coding(requirement):
    code = coding_agent(requirement)
    review = review_agent(code, requirement)
    if review is negative:
        code = coding_agent(requirement, review)
        review = review_agent(code, requirement)
    return code, review
In this approach, the coding_agent and the review_agent each start fresh on every call, with no memory of previously produced code or reviews. Either agent could be replaced with a plain function call and the system would behave exactly the same.
This would be a Multi-Agent System:
class CodingAgent:
    def __init__(self):
        self.code_history = []
        self.observed_patterns = {}

    def code(self, requirement, review=None):
        # Code based on the requirement, the current review, and the history of previous code
        # Update internal observations about recurring reviews
        # Return code

class ReviewAgent:
    def __init__(self):
        self.review_history = []
        self.observed_patterns = {}

    def review(self, code, requirement):
        # Review based on both the current code and the history of previous reviews
        # Update internal observations about recurring issues
        # Return review

function coding(requirement):
    coding_agent = CodingAgent()
    review_agent = ReviewAgent()
    code = coding_agent.code(requirement)
    review = review_agent.review(code, requirement)
    if review is negative:
        code = coding_agent.code(requirement, review)
        review = review_agent.review(code, requirement)
    return code, review
In the multi-agent approach, the reviewer, for instance, maintains its own persistent state across multiple code reviews. It might identify patterns in the coder’s behavior that inform future reviews, develop opinions about certain coding practices, and build a relationship with the coder that evolves over time.
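Here is a minimal runnable sketch of this stateful design, with the LLM calls replaced by trivial stubs. The class and method names mirror the pseudo-code, but the behavior is invented for illustration; the point is only that each agent's private history shapes its later outputs:

```python
class CodingAgent:
    def __init__(self):
        self.code_history = []

    def code(self, requirement, review=None):
        # A later attempt sees both the review and its own past attempts.
        attempt = f"code_v{len(self.code_history) + 1} for {requirement}"
        if review is not None:
            attempt += f" (addressing: {review})"
        self.code_history.append(attempt)
        return attempt

class ReviewAgent:
    def __init__(self):
        self.review_history = []

    def review(self, code, requirement):
        # Stub reviewer: rejects the first attempt, accepts revisions.
        verdict = "negative" if not self.review_history else "positive"
        self.review_history.append((code, verdict))
        return verdict

def coding(requirement):
    coder, reviewer = CodingAgent(), ReviewAgent()
    code = coder.code(requirement)
    review = reviewer.review(code, requirement)
    if review == "negative":
        code = coder.code(requirement, review)
        review = reviewer.review(code, requirement)
    return code, review
```

Replacing either class with a stateless function would change the system's behavior, which is exactly the replacement test failing.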
This fundamental distinction, that agents maintain their own internal states which persist across interactions, is what creates genuinely multi-agent scenarios, and it leads to several interesting properties.
Persistence creates the possibility of behaviors that couldn’t arise in modular systems. For example, in negotiations between such agents, behaviors like reciprocity, grudges, or trust can develop naturally.
Beyond persistent state, another level of multi-agency comes from asynchronous communication patterns. In current systems, agent interactions are synchronous: one agent calls another and waits for a response. The next step for multi-agent systems is agents that act and communicate asynchronously, initiating messages and reacting to incoming ones on their own schedules rather than only when called.
This asynchronous pattern creates much more complex and interesting dynamics, similar to how humans interact in social environments.
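As a rough sketch of what asynchronous agent communication could look like, here are two agents exchanging messages through queues rather than via blocking call-and-wait. The protocol and message contents are invented for illustration:

```python
import asyncio

async def agent(name, inbox, outbox, rounds):
    # Each agent reacts whenever a message arrives in its inbox,
    # instead of being invoked and waited on by another agent.
    log = []
    for _ in range(rounds):
        msg = await inbox.get()
        log.append(msg)
        await outbox.put(f"{name}:ack({msg})")
    return log

async def main():
    a_in, b_in = asyncio.Queue(), asyncio.Queue()
    await b_in.put("hello")  # seed the conversation
    log_a, log_b = await asyncio.gather(
        agent("A", a_in, b_in, rounds=1),
        agent("B", b_in, a_in, rounds=2),
    )
    return log_a, log_b
```

Neither agent calls the other directly; each simply processes its inbox as messages arrive, which is what allows the richer, less scripted dynamics described above.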
You might wonder why I’m concerned about this terminology distinction at all.
There’s nothing wrong with modular single-agent systems; they’re often the right solution for most of the problems AI is being tasked to solve today. But by conflating the two, we risk missing the interesting phenomena that can emerge from systems of multiple agents with persistent private states interacting over time.
As we continue to develop LLM-based systems, I hope we’ll be more precise in our terminology and more intentional in our design choices. There’s a fascinating space of possibilities in multi-agent systems that we’ve only begun to explore, one where the complexity of the whole exceeds the sum of its parts.