Thanks, Rohit. This is brilliant and particularly salient to me right now. I’m a composer and musician and am working on a project started by geologist/microbiologist Bruce Fouke and photographer Tom Murphy based on their book The Art of Yellowstone Science. We meet this weekend and I’ll forward your article to them.
I’m so glad. The project sounds very interesting!
A brilliant intern is... well... still an intern, even if you paid a zillion $ to hire 'em. On-the-job training is still a big deal, it looks like.
Very good post. I keep seeing claims that enterprises will adopt AI and rework their workflows, but given all the problems you outline (MechaHitler, etc.), the idea of enterprises ceding workflows to AI agents seems fanciful for the time being. Maybe the tech improves to the point where enterprises cede control to AI agents en masse, but it just doesn't seem ready for prime time yet.
It’s more that the amount of effort required to manage the context to get the right answers isn’t trivial, and enterprises massively underestimate that part.
Context is one key to using LLMs, but it's a tool for navigating the LLM to the desired region of mindspace. Just dumping a prior LLM conversation into a new window won't get it to the same place as the original chain of interactions; often it will just break. I'd like to be able to store the state of the internal layers that I navigated the LLM to via a series of prompts, and to be able to go there directly, or to nearby states via edits and offsets from that state. "Representation Engineering Mistral-7B an Acid Trip" ( https://vgel.me/posts/representation-engineering/ ) gives a practical tutorial for something bigger:
"In October 2023, a group of authors from the Center for AI Safety, among others, published Representation Engineering: A Top-Down Approach to AI Transparency. That paper looks at a few methods of doing what they call "Representation Engineering": calculating a "control vector" that can be read from or added to model activations during inference to interpret or control the model's behavior, without prompt engineering or finetuning."
(I have to say it's ironic that the Center for AI Safety published something so obviously, wonderfully, powerfully abusable -- and I fully intend to. Control vectors let you specify the states of the hidden layers of the model, using whatever method, e.g. the PCA in the linked post. The safetyism people seem to be scared of capabilities in general, and in particular of getting the right answers to certain questions, so this is a huge own-goal.)
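Here is a minimal sketch of that PCA control-vector idea (not the linked post's `repeng` library API, just the bare technique): take contrastive prompt pairs, PCA the differences of a layer's activations, then add the resulting vector back into that layer during generation via a forward hook. The model name, layer index, prompts, and scale are all placeholders to tune.

```python
import torch
from sklearn.decomposition import PCA
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.1"   # placeholder: any decoder-only LM
LAYER = 15                                      # decoder layer to read from and steer

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

def layer_state(prompt: str) -> torch.Tensor:
    """Hidden state of the last token after decoder layer LAYER."""
    ids = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[i + 1] is the output of decoder layer i
    return out.hidden_states[LAYER + 1][0, -1].float().cpu()

# Contrastive prompt pairs differing only in the trait to control.
positives = ["You are extremely happy. Describe your day.",
             "You are extremely happy. Describe the weather."]
negatives = ["You are extremely sad. Describe your day.",
             "You are extremely sad. Describe the weather."]

diffs = torch.stack([layer_state(p) - layer_state(n)
                     for p, n in zip(positives, negatives)])
# First principal component of the activation differences is the control vector.
control = torch.tensor(PCA(n_components=1).fit(diffs.numpy()).components_[0])

def steer(module, inputs, output, scale=4.0):
    """Forward hook: nudge this layer's output along the control vector.
    Sign and scale usually need hand-tuning."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden += scale * control.to(hidden.device, hidden.dtype)

handle = model.model.layers[LAYER].register_forward_hook(steer)
prompt = tok("Tell me about your week.", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**prompt, max_new_tokens=60)[0]))
handle.remove()
```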
I'd like to have standard LLM starting states and contexts for various tasks, administered with a system like DSPy, which is linked from https://www.latent.space/p/2025-papers , an excellent reference with many links worth reading.
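As a rough sketch of what a standardized starting context could look like in DSPy: the task, field names, and backend below are hypothetical; the point is that the standing context becomes a declared, reusable artifact rather than text pasted into each new chat.

```python
import dspy

# Any DSPy-supported backend works; this model string is just an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class TriageTicket(dspy.Signature):
    """Route an incoming support ticket given a standing task context."""
    task_context: str = dspy.InputField(desc="standing instructions / starting state for this task")
    ticket: str = dspy.InputField(desc="the new input to handle")
    route: str = dspy.OutputField(desc="which queue this ticket belongs in")

triage = dspy.ChainOfThought(TriageTicket)

# The standard context is stored once and reused (and can be optimized by DSPy).
STANDARD_CONTEXT = ("You triage tickets for an internal IT helpdesk; "
                    "the queues are: access, hardware, billing.")
result = triage(task_context=STANDARD_CONTEXT,
                ticket="My laptop won't charge since the last update.")
print(result.route)
```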
In particular, I'd like an implicit set of concentric contexts describing what an instance of an agent is trying to do at multiple time scales, in relation to various-sized groups of people or other agents. Agents intrinsically need to be oriented toward furthering the interests of a user and the user's affiliations; everybody's agents have to look out for different, though not always independent, interests, so they need different implicit metrics for the utility of possible goals to make the decisions that are right for them. These contexts will be basically the same structurally and in data, but with "*cui bono*" changed, at least within a group affiliation.
The widest contexts -- what an agent should want socially, in service of the interests of people and affiliations -- need only be figured out rarely for each group, but with huge amounts of computation for gaming out the consequences of different possible policies and courses of action. The narrowest contexts will change constantly in their tactical goals within their larger, more strategic contexts, but in predictable ways for standard tasks, making it easy to compute the best control vector for each task step.
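A hypothetical sketch of that concentric-context structure: the same frame at several time scales, differing mainly in whose interests ("cui bono") and which goals apply at each level. All names here are illustrative, not an existing API.

```python
from dataclasses import dataclass

@dataclass
class Context:
    scope: str                        # e.g. "affiliation", "project", "step"
    horizon_seconds: float            # rough time scale this context reasons over
    beneficiary: str                  # whose interests this level serves (cui bono)
    goals: list[str]                  # strategic or tactical goals at this level
    parent: "Context | None" = None   # the wider context this one sits inside

    def chain(self) -> list["Context"]:
        """Widest-to-narrowest contexts, e.g. for assembling the prompt or control vector per step."""
        layers, node = [], self
        while node is not None:
            layers.append(node)
            node = node.parent
        return list(reversed(layers))

social = Context("affiliation", 3e8, "the user's team",
                 ["keep commitments", "protect shared data"])
project = Context("project", 3e6, "the user", ["ship the Q3 report"], parent=social)
step = Context("step", 60, "the user", ["summarize yesterday's meeting notes"], parent=project)

for c in step.chain():
    print(f"{c.scope:12} horizon={c.horizon_seconds:>10.0f}s  for {c.beneficiary}: {c.goals}")
```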
There needs to be much better context management in LLMs, especially when nearing context length limits -- editing, summarizing, rewording -- the context should be the LLM's scratch pad to use however works best, augmented with larger note-taking and file memory with online RAG. I hear OpenAI and Anthropic are doing approximately that (I'm on venice.ai for its privacy and its hosting of less Silicon Valley-consensus-aligned models, which are behind on many features). I think LLMs absolutely need tool use for anything a human would use SW to do: not just web search, calculation, the command line, interpreters and compilers, but project management, scheduling, simulation ... any SW that benefits from being more reliable than an LLM's best guess.
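A hypothetical sketch of that kind of scratchpad/compaction loop: when the conversation nears the context budget, older turns get summarized by the model itself and their full text is pushed out to a note store for later RAG retrieval. `chat()` and `notes` stand in for whatever LLM API and retrieval backend you actually use; nothing here is a specific vendor's feature.

```python
CONTEXT_BUDGET_TOKENS = 8000
KEEP_RECENT_TURNS = 6

def rough_tokens(text: str) -> int:
    return len(text) // 4          # crude estimate; swap in a real tokenizer

def compact(history: list[dict], chat, notes) -> list[dict]:
    """history: [{"role": ..., "content": ...}, ...], oldest first.
    chat: callable taking a message list and returning a string.
    notes: any store with a save() method backing RAG retrieval."""
    total = sum(rough_tokens(m["content"]) for m in history)
    if total <= CONTEXT_BUDGET_TOKENS:
        return history
    old, recent = history[:-KEEP_RECENT_TURNS], history[-KEEP_RECENT_TURNS:]
    old_text = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
    notes.save(old_text)           # full detail stays retrievable later via RAG
    summary = chat([{"role": "user",
                     "content": "Summarize this conversation so far, keeping decisions, "
                                "open questions, and named entities:\n\n" + old_text}])
    return [{"role": "system",
             "content": "Scratchpad summary of earlier turns:\n" + summary}] + recent
```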