Thanks, Rohit. This is brilliant and particularly salient to me right now. I’m a composer and musician and am working on a project started by geologist/microbiologist Bruce Fouke and photographer Tom Murphy based on their book The Art of Yellowstone Science. We meet this weekend and I’ll forward your article to them.
I’m so glad. The project sounds very interesting!
I have to say your comments about the LLM making do with the context it's given remind me a bit of psychiatrist R.D. Laing's famous line that "a psychosis is not a disease, it's a cure." (Frank Herbert made a somewhat similar point in his first novel, Under Pressure.)
Excellent post. I use Claude Code now for 8+ hours every day. My life - my job, which was already my life - has utterly changed in just a few weeks' time. I almost haven't had time to think about it.
I think it may be the case that because any in-depth work is effectively playing an iterated game with context, not all problems will be able to be solved with perfectly engineered initial context. There is no one CLAUDE.md to rule them all.
Each interaction and token-generation cycle is randomized, and perhaps because of that, and because attention is not easily controlled, there are failure states lying in wait like strange attractors: common patterns of strange thinking that continually emerge.
It's not unlike if I had Taravangian as a coworker. "On some days, he cries and has compassion for everyone around him, while unable to have intelligent conversations, while on others he is brilliant, yet easily speaks of killing singing children."
Cognition needs very distinct capabilities, and context engineering doesn't seem anywhere close to enabling them, even with the RAG or hybrid architectures being proposed. Maybe we need to think of it in terms of how the human brain works, with whatever limited understanding science gives us. Didn't we once have neural networks promising these cognition capabilities?
A brilliant intern is... well... still an intern, even if you paid a zillion dollars to hire 'em. On-the-job training is still a big deal, it looks like.
Very good post. I keep seeing claims that enterprises will adopt AI and rework their workflows, but given all the problems you outline (MechaHitler, etc.), the idea of enterprises ceding workflows to AI agents seems fanciful for the time being. Maybe the tech improves to the point where enterprises cede control to AI agents en masse, but it just doesn't seem ready for prime time yet.
It’s more that the amount of effort required to manage the context to get the right answers isn’t trivial, and enterprises massively underestimate that part.
Context is one key to using LLMs, but it's a tool for navigating the LLM to the desired region of mindspace. Just dumping a prior LLM conversation into a new window won't get it to the same place as the original chain of interactions; often it will just break. I'd like to be able to store the state of the internal layers that I navigated the LLM to via a series of prompts, and be able to go there directly, or to nearby states via edits and offsets from that state. "Representation Engineering Mistral-7B an Acid Trip" ( https://vgel.me/posts/representation-engineering/ ) gives a practical tutorial for something bigger:
"In October 2023, a group of authors from the Center for AI Safety, among others, published Representation Engineering: A Top-Down Approach to AI Transparency. That paper looks at a few methods of doing what they call "Representation Engineering": calculating a "control vector" that can be read from or added to model activations during inference to interpret or control the model's behavior, without prompt engineering or finetuning."
(I have to say it's ironic that the Center for AI Safety published something so obviously, wonderfully, powerfully abusable -- and I fully intend to abuse it. Control vectors let you specify the states of the hidden layers of the model, using whatever method, e.g. PCA in the linked post. The safetyism people seem to be scared of capabilities in general, and in particular of getting the right answers to certain questions, so this is a huge own-goal.)
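To make the "added to model activations during inference" part concrete, here's a minimal sketch using a plain PyTorch forward hook on a HuggingFace decoder layer. Everything in it is illustrative -- the model name, layer index, coefficient, and especially the random placeholder vector, which in practice would come from something like the PCA-over-contrastive-prompts method in the linked post:

```python
# Sketch only: steer a HuggingFace causal LM by adding a precomputed "control
# vector" to one decoder layer's hidden states during inference. The model name,
# layer index, coefficient, and random placeholder vector are illustrative; a real
# vector would come from e.g. PCA over activation differences on contrastive prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

layer_idx = 15   # which decoder block to steer (tunable)
coeff = 4.0      # steering strength; flip the sign to steer the other way
control_vector = torch.randn(model.config.hidden_size)  # placeholder direction

def steer(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states.
    hidden = output[0] + coeff * control_vector.to(output[0].dtype).to(output[0].device)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(steer)

ids = tok("Describe a quiet walk through the park.", return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=60)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook to restore the unsteered model
```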
I'd like to have standard LLM starting states and contexts for various tasks, administered with a system like DSPy, linked from https://www.latent.space/p/2025-papers , an excellent reference with many links worth reading.
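As a rough sketch of what "administered with a system like DSPy" could look like (assuming DSPy 2.5+; the task, field names, and model id are made up for illustration):

```python
# Sketch only: the "starting context" for a task (instructions, fields, demos) lives
# in a declared signature that the framework administers, not in hand-pasted prompts.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # assumed model id

class TriageTicket(dspy.Signature):
    """Route a support ticket to the right queue."""
    ticket: str = dspy.InputField()
    queue: str = dspy.OutputField(desc="one of: billing, bug, feature, other")

triage = dspy.Predict(TriageTicket)
print(triage(ticket="I was charged twice this month.").queue)
```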
In particular, I'd want an implicit set of concentric contexts describing what an instance of an agent is trying to do at multiple time scales, in relation to various-sized groups of people or other agents. Agents intrinsically need to be oriented toward furthering the interests of a user and the user's affiliations; everybody's agents have to look out for different, though not always independent, interests, so they need different implicit metrics for the utility of possible goals in order to make the decisions that are right for them. These contexts will be basically the same structurally and in data, but with the "*cui bono*" changed, at least within a group affiliation.
The widest contexts, covering what an agent should want socially in serving the interests of people and affiliations, need only be figured out rarely for each group, but with huge amounts of computation for gaming out the consequences of different possible policies and courses of action. The narrowest contexts will change constantly in their tactical goals within their larger, more strategic contexts, but in predictable ways for standard tasks, making it easy to compute the best control vector for each task step.
There needs to be much better context management in LLMs, though. Especially when nearing context-length limits -- editing, summarizing, rewording -- it should be the LLM's scratch pad to use however works best, augmented with larger note-taking and file memory with online RAG. I hear OpenAI and Anthropic are doing approximately that (I'm on venice.ai for its privacy and its hosting of less Silicon-Valley-consensus-aligned models, which are behind on many features). I think LLMs absolutely need tool use for anything a human would use software to do: not just web search, calculation, the command line, interpreters and compilers, but project management, scheduling, simulation... any software that benefits from being more reliable than an LLM's best guess.
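To make the scratch-pad idea concrete, here's a minimal sketch of one way the compaction could work. The token heuristic, budget, and the generic complete() callable are assumptions, not any particular vendor's API:

```python
# Sketch only: fold old turns into a running summary once the transcript nears the
# context budget, keeping the most recent turns verbatim.
MAX_TOKENS = 8000   # assumed context budget
KEEP_RECENT = 6     # most recent turns kept verbatim

def approx_tokens(text: str) -> int:
    return len(text) // 4  # crude ~4-chars-per-token heuristic

def compact(history: list[str], scratch_pad: str, complete) -> tuple[list[str], str]:
    transcript = scratch_pad + "\n" + "\n".join(history)
    if approx_tokens(transcript) < MAX_TOKENS:
        return history, scratch_pad          # still under budget, nothing to do
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    scratch_pad = complete(
        "Update this running summary with the new turns, keeping decisions, "
        "open questions, and file names:\n\n"
        f"SUMMARY:\n{scratch_pad}\n\nNEW TURNS:\n" + "\n".join(old)
    )
    return recent, scratch_pad
```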