I don’t know what I would have predicted in advance, but in retrospect it’s not really surprising that LLMs trained entirely on human thought would approach the problem with human pathologies.
But it’s still an interesting finding, and it’s not obvious how you’d work around that (or whether you’d want to).
"The truly interesting part was that the agents perfectly replicated the dysfunction of real companies. Onwards." I'm going to be thinking about this all day.
Very impressive. It looks like a huge amount of work, with a lot of infrastructure that will be useful for big and important future applications.
I wonder if the code all does what it appears to mean to a human skimming it, though. Really understanding it would be about as hard as writing it from scratch: harder if the code is deceptive, easier if it isn't. With different LLMs you could get very different results, e.g. gpt-oss ignores explicit instructions all the time, and even GPT 5 recently gave me just the column headers when converting a table with many rows. Every interaction with an LLM has to be in a loop with tests that check whether it disobeyed in some obvious fashion.
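To make that concrete, here's the kind of loop I mean; just a sketch in Python, where `call_llm` is a stand-in for whatever client you actually use and a crude row count is the "did it obviously disobey" test:

```python
from typing import Callable

def convert_table(table: str, call_llm: Callable[[str], str], max_retries: int = 3) -> str:
    """Ask an LLM to convert a table to CSV, retrying if rows obviously went missing."""
    expected_rows = len([ln for ln in table.splitlines() if ln.strip()])
    prompt = "Convert every row of this table to CSV, omitting nothing:\n" + table
    answer = ""
    for _ in range(max_retries):
        answer = call_llm(prompt)
        got_rows = len([ln for ln in answer.splitlines() if ln.strip()])
        # The "test" part of the loop: a cheap mechanical check for obvious disobedience.
        if got_rows >= expected_rows:
            return answer
        # Tell the model what the check found and try again.
        prompt += (f"\n\nYour last answer had {got_rows} non-empty lines but the "
                   f"input table has {expected_rows}. Output every row this time.")
    return answer  # best effort after max_retries
```

A check like that doesn't catch subtle deception, only the gross "you gave me the headers and nothing else" failures, but without even that much you never notice.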
Even when an LLM appears to do what it's told, that may not be what you meant. Do the various agents have their context wiped between rounds of negotiation (bad)? Are they getting performance degradation from longer contexts (also bad)? Are they using long-term memory effectively? Without LTM, agents aren't really going to hold the alignments you tell them to have in any deep way, and even if they do, some roles come a lot easier than others. (See Tree of Woe's Ptolemy/RIB work.)
Thank you!
The benefit of the setup is that any deviation from expected behavior is actually quite easily visible. So it is not a context problem (the contexts are not very big right now), nor is it a forgetfulness problem (I tried runs with multiple variations).
Fun post. We're all getting our Pullman-style daemons now.
The Coasean Singularity is an important idea, thanks for spreading the word.
Yes, this matters: "Our models on the other hand had millions of years of subjective experience in seeing negotiation, but have zero experience in feeling that intense urge of wanting to negotiate to watch Prehistoric Planet with his brother." You can force machines to output results that look like they're negotiating, but they aren't actually negotiating because they don't have any feelings.
> alignment problems don’t disappear just because the agents can negotiate with each other.
I feel that current agents weren't trained to do that. Once it becomes economically feasible to build trader agents for yourself, labs will RL those AIs into being more effective traders.
What if the experiment also involved "evolution", where agents get different randomized prompts, can modify their own prompts, and ineffective agents are "fired" and replaced with more effective ones? (A rough sketch of the loop I mean is below.)
Or maybe I didn't fully understand the setup
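Concretely, something like this is what I had in mind; a minimal sketch where everything (the canned mutation list, the random stand-in for the market, scoring by profit) is made up for illustration and isn't taken from the actual experiment:

```python
import random

BASE_PROMPT = "You are a trading agent. Maximize your profit in the negotiation."

# Toy "mutations": in a real run the agents themselves (or another LLM)
# would rewrite the prompts rather than appending canned lines.
MUTATIONS = [
    " Be more aggressive when counter-offering.",
    " Prefer quick deals over holding out for the best price.",
    " Never accept the first offer you receive.",
    " Keep a running estimate of the other side's reservation price.",
]

def mutate(prompt: str) -> str:
    """Return a slightly modified copy of an agent's prompt."""
    return prompt + random.choice(MUTATIONS)

def run_market_round(prompts: list[str]) -> list[float]:
    """Stand-in for one round of the agent market.

    A real version would spin up one LLM agent per prompt, let them negotiate,
    and return each agent's realized profit. Random noise keeps this runnable.
    """
    return [random.gauss(0.0, 1.0) for _ in prompts]

def evolve(pop_size: int = 8, generations: int = 5, replace_frac: float = 0.25) -> list[str]:
    population = [BASE_PROMPT] * pop_size
    for gen in range(generations):
        profits = run_market_round(population)
        ranked = sorted(zip(profits, population), key=lambda x: x[0], reverse=True)
        n_fired = max(1, int(pop_size * replace_frac))
        # The bottom performers get "fired"; mutated copies of the top ones replace them.
        survivors = [p for _, p in ranked[:pop_size - n_fired]]
        population = survivors + [mutate(p) for p in survivors[:n_fired]]
        print(f"gen {gen}: best profit {ranked[0][0]:.2f}")
    return population

if __name__ == "__main__":
    evolve()
```

Selection here is purely on realized profit, so a real version would need a better fitness signal and a real mutation operator; this is only meant to show the shape of the loop.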
The whole point is that no agents are trained to do this, but we expect it to be a straightforward consequence of their general ability. And if not, if they have to be trained specifically to do this, then the problem persists: we won't get daemons unless we train them.
Hmm, but that's _current_ agents; self-improving AGI should be able to do that?
Someday we will be able to actually define what self-improving AGI is or will be and then we will have an answer to that 🙂
This is fantastic engagement with our work! The failure of agents to do many things out-of-the-box in markets is one of the reasons we'll need benchmarks for strategic situations and for human-AI collaboration (human as principal and AI as agent) in the market. I am also particularly interested in vending bench and similar exercises.
What these experiments surface is that lowering transaction costs doesn’t create coordination by itself. Even with capable agents and shared information, meaning doesn’t bind behavior without constraint. Without mechanisms that preserve semantic fidelity, terms like value, efficiency, or trade remain legible but fail to actually govern action.