I don’t know what I would have predicted in advance, but in retrospect it’s not really surprising that LLMs trained entirely on human thought would approach the problem with human pathologies.
But it’s still an interesting finding, and it’s not obvious how you’d work around that (or whether you’d want to).
"The truly interesting part was that the agents perfectly replicated the dysfunction of real companies. Onwards." I'm going to be thinking about this all day.
Very impressive. It looks like a huge amount of work, with a lot of infrastructure that will be useful for big and important future applications.
I wonder if the code all does what it appears to mean to a human skimming it, though. Really understanding it would be about as hard as writing it from scratch: harder if the code is deceptive, easier if it isn't. With different LLMs you could get very different results, e.g. gpt-oss ignores explicit instructions all the time, and even GPT 5 recently gave me just the column headers when converting a table with many rows. Every interaction with an LLM has to be in a loop with tests that check whether it disobeyed in some obvious fashion.
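To make that concrete, here's the kind of loop I mean; just a sketch in Python, where `call_llm` is a stand-in for whatever client you actually use and a crude row count is the "did it obviously disobey" test:

```python
from typing import Callable

def convert_table(table: str, call_llm: Callable[[str], str], max_retries: int = 3) -> str:
    """Ask an LLM to convert a table to CSV, retrying if rows obviously went missing."""
    expected_rows = len([ln for ln in table.splitlines() if ln.strip()])
    prompt = "Convert every row of this table to CSV, omitting nothing:\n" + table
    answer = ""
    for _ in range(max_retries):
        answer = call_llm(prompt)
        got_rows = len([ln for ln in answer.splitlines() if ln.strip()])
        # The "test" part of the loop: a cheap mechanical check for obvious disobedience.
        if got_rows >= expected_rows:
            return answer
        # Tell the model what the check found and try again.
        prompt += (f"\n\nYour last answer had {got_rows} non-empty lines but the "
                   f"input table has {expected_rows}. Output every row this time.")
    return answer  # best effort after max_retries
```

A check like that doesn't catch subtle deception, only the gross "you gave me the headers and nothing else" failures, but without even that much you never notice.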
Even when an LLM appears to do what it's told, that may not be what you meant. Do the various agents have their context wiped between rounds of negotiation (bad)? Are they getting performance degradation from longer contexts (also bad)? Are they using long-term memory effectively? Without LTM, agents aren't really going to hold the alignments you tell them to have in any deep way, and even if they do, some roles come a lot easier than others. (See Tree of Woe's Ptolemy/RIB work.)
Thank you!
The benefit of the setup is that any deviation from expected behavior is actually quite easily visible. So it is not a context problem (the contexts are not very big right now), nor is it a forgetfulness problem (I tried runs with multiple variations).
Fun post. We're all getting our Pullman-style daemons now.
The Coasean Singularity is an important idea, thanks for spreading the word.
Yes, this matters: "Our models on the other hand had millions of years of subjective experience in seeing negotiation, but have zero experience in feeling that intense urge of wanting to negotiate to watch Prehistoric Planet with his brother." You can force machines to output results that look like they're negotiating, but they aren't actually negotiating because they don't have any feelings.
> alignment problems don’t disappear just because the agents can negotiate with each other.
I feel that current agents weren't trained to do that. Once it becomes economically feasible to build trader agents for yourself, labs will RL those AIs into being more effective traders.
What if the experiment also involved "evolution", where agents get different randomized prompts, can modify their own prompts, and ineffective agents are "fired" and replaced with more effective ones? (A rough sketch of the loop I mean is below.)
Or maybe I didn't fully understand the setup
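Concretely, something like this is what I had in mind; a minimal sketch where everything (the canned mutation list, the random stand-in for the market, scoring by profit) is made up for illustration and isn't taken from the actual experiment:

```python
import random

BASE_PROMPT = "You are a trading agent. Maximize your profit in the negotiation."

# Toy "mutations": in a real run the agents themselves (or another LLM)
# would rewrite the prompts rather than appending canned lines.
MUTATIONS = [
    " Be more aggressive when counter-offering.",
    " Prefer quick deals over holding out for the best price.",
    " Never accept the first offer you receive.",
    " Keep a running estimate of the other side's reservation price.",
]

def mutate(prompt: str) -> str:
    """Return a slightly modified copy of an agent's prompt."""
    return prompt + random.choice(MUTATIONS)

def run_market_round(prompts: list[str]) -> list[float]:
    """Stand-in for one round of the agent market.

    A real version would spin up one LLM agent per prompt, let them negotiate,
    and return each agent's realized profit. Random noise keeps this runnable.
    """
    return [random.gauss(0.0, 1.0) for _ in prompts]

def evolve(pop_size: int = 8, generations: int = 5, replace_frac: float = 0.25) -> list[str]:
    population = [BASE_PROMPT] * pop_size
    for gen in range(generations):
        profits = run_market_round(population)
        ranked = sorted(zip(profits, population), key=lambda x: x[0], reverse=True)
        n_fired = max(1, int(pop_size * replace_frac))
        # The bottom performers get "fired"; mutated copies of the top ones replace them.
        survivors = [p for _, p in ranked[:pop_size - n_fired]]
        population = survivors + [mutate(p) for p in survivors[:n_fired]]
        print(f"gen {gen}: best profit {ranked[0][0]:.2f}")
    return population

if __name__ == "__main__":
    evolve()
```

Selection here is purely on realized profit, so a real version would need a better fitness signal and a real mutation operator; this is only meant to show the shape of the loop.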
The whole point is that no agents are trained to do this, but we expect it to be a straightforward consequence of their general ability. And if not, if they have to be trained specifically to do this, then the problem persists: we won't get daemons unless we train them.
Hmm, but that's _current_ agents; self-improving AGI should be able to do that?
Someday we will be able to actually define what self-improving AGI is or will be and then we will have an answer to that 🙂
This is fantastic engagement with our work! The failure of agents to do many things out-of-the-box in markets is one of the reasons we'll need benchmarks for strategic situations and for human-AI collaboration (human as principal and AI as agent) in the market. I am also particularly interested in vending bench and similar exercises.
What these experiments surface is that lowering transaction costs doesn’t create coordination by itself. Even with capable agents and shared information, meaning doesn’t bind behavior without constraint. Without mechanisms that preserve semantic fidelity, terms like value, efficiency, or trade remain legible but fail to actually govern action.