Thanks, Rohit. This is brilliant and particularly salient to me right now. I’m a composer and musician and am working on a project started by geologist/microbiologist Bruce Fouke and photographer Tom Murphy based on their book The Art of Yellowstone Science. We meet this weekend and I’ll forward your article to them.
I’m so glad. The project sounds very interesting!
I have to say your comments about the LLM making do with the context it's given remind me a bit of psychiatrist R.D. Laing's famous line that "a psychosis is not a disease, it's a cure." (Frank Herbert made a somewhat similar point in his first novel, Under Pressure.)
Excellent post. I use Claude Code now for 8+ hours every day. My life - my job, which was already my life - has utterly changed in just a few weeks' time. I almost haven't had time to think about it.
I think it may be the case that because any in-depth work is effectively playing an iterated game with context, not all problems will be able to be solved with perfectly engineered initial context. There is no one CLAUDE.md to rule them all.
Each interaction and token-generation cycle is randomized, and perhaps because of that, and because attention is not easily controlled, there are failure states lying in wait like strange attractors: common patterns of strange thinking that continually emerge.
It's not unlike if I had Taravangian as a coworker. "On some days, he cries and has compassion for everyone around him, while unable to have intelligent conversations, while on others he is brilliant, yet easily speaks of killing singing children."
Cognition needs very distinct capabilities, and context engineering doesn't seem anywhere close to enabling them, even with the RAG or hybrid architectures being proposed. Maybe we need to think of it in terms of how the human brain works, with whatever limited understanding science gives us. Didn't we once have neural networks promising these cognition capabilities?
A brilliant intern is... well... still an intern, even if you paid a zillion dollars to hire 'em. On-the-job training is still a big deal, it looks like.
Very good post. I keep seeing claims that enterprises will adopt AI and rework their workflows, but given all the problems you outline (MechaHitler, etc.), the idea of enterprises ceding workflows to AI agents seems fanciful for the time being. Maybe the tech improves to the point where enterprises cede control to AI agents en masse, but it just doesn't seem ready for prime time yet.
It’s more that the amount of effort required to manage the context to get the right answers isn’t trivial, and enterprises massively underestimate that part.
Context is one key to using LLMs, but it's a tool for navigating the LLM to the desired region of mindspace. Just dumping a prior LLM conversation into a new window won't get it to the same place as the original chain of interactions; often it will just break. I'd like to be able to store the state of the internal layers that I navigated the LLM to via a series of prompts, and be able to go there directly, or to nearby states via edits and offsets from that state. "Representation Engineering Mistral-7B an Acid Trip" ( https://vgel.me/posts/representation-engineering/ ) gives a practical tutorial for something bigger:
"In October 2023, a group of authors from the Center for AI Safety, among others, published Representation Engineering: A Top-Down Approach to AI Transparency. That paper looks at a few methods of doing what they call "Representation Engineering": calculating a "control vector" that can be read from or added to model activations during inference to interpret or control the model's behavior, without prompt engineering or finetuning."
(I have to say it's ironic that the Center for AI Safety published something so obviously, wonderfully, powerfully abusable -- and I fully intend to abuse it. Control vectors let you specify the states of the hidden layers of the model, using whatever method, e.g. PCA in the linked post. The safetyism people seem to be scared of capabilities in general, and in particular of getting the right answers to certain questions, so this is a huge own-goal.)
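To make the "added to model activations during inference" part concrete, here's a minimal sketch using a plain PyTorch forward hook on a HuggingFace decoder layer. Everything in it is illustrative -- the model name, layer index, coefficient, and especially the random placeholder vector, which in practice would come from something like the PCA-over-contrastive-prompts method in the linked post:

```python
# Sketch only: steer a HuggingFace causal LM by adding a precomputed "control
# vector" to one decoder layer's hidden states during inference. The model name,
# layer index, coefficient, and random placeholder vector are illustrative; a real
# vector would come from e.g. PCA over activation differences on contrastive prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

layer_idx = 15   # which decoder block to steer (tunable)
coeff = 4.0      # steering strength; flip the sign to steer the other way
control_vector = torch.randn(model.config.hidden_size)  # placeholder direction

def steer(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states.
    hidden = output[0] + coeff * control_vector.to(output[0].dtype).to(output[0].device)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(steer)

ids = tok("Describe a quiet walk through the park.", return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=60)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook to restore the unsteered model
```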
I'd like to have standard LLM starting states and contexts for various tasks, administered with a system like DSPy, linked from https://www.latent.space/p/2025-papers , an excellent reference with many links worth reading.
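As a rough sketch of what "administered with a system like DSPy" could look like (assuming DSPy 2.5+; the task, field names, and model id are made up for illustration):

```python
# Sketch only: the "starting context" for a task (instructions, fields, demos) lives
# in a declared signature that the framework administers, not in hand-pasted prompts.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # assumed model id

class TriageTicket(dspy.Signature):
    """Route a support ticket to the right queue."""
    ticket: str = dspy.InputField()
    queue: str = dspy.OutputField(desc="one of: billing, bug, feature, other")

triage = dspy.Predict(TriageTicket)
print(triage(ticket="I was charged twice this month.").queue)
```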
In particular, I'd want an implicit set of concentric contexts describing what an instance of an agent is trying to do at multiple time scales, in relation to various-sized groups of people or other agents. Agents intrinsically need to be oriented toward furthering the interests of a user and the user's affiliations; everybody's agents have to look out for different, though not always independent, interests, so they need different implicit metrics for the utility of possible goals in order to make the decisions that are right for them. These contexts will be basically the same structurally and in data, but with the "*cui bono*" changed, at least within a group affiliation.
The widest contexts, covering what an agent should want socially in serving the interests of people and affiliations, need only be figured out rarely for each group, but with huge amounts of computation for gaming out the consequences of different possible policies and courses of action. The narrowest contexts will change constantly in their tactical goals within their larger, more strategic contexts, but in predictable ways for standard tasks, making it easy to compute the best control vector for each task step.
There needs to be much better context management in LLMs, though. Especially when nearing context-length limits -- editing, summarizing, rewording -- it should be the LLM's scratch pad to use however works best, augmented with larger note-taking and file memory with online RAG. I hear OpenAI and Anthropic are doing approximately that (I'm on venice.ai for its privacy and its hosting of less Silicon-Valley-consensus-aligned models, which are behind on many features). I think LLMs absolutely need tool use for anything a human would use software to do: not just web search, calculation, the command line, interpreters and compilers, but project management, scheduling, simulation... any software that benefits from being more reliable than an LLM's best guess.
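To make the scratch-pad idea concrete, here's a minimal sketch of one way the compaction could work. The token heuristic, budget, and the generic complete() callable are assumptions, not any particular vendor's API:

```python
# Sketch only: fold old turns into a running summary once the transcript nears the
# context budget, keeping the most recent turns verbatim.
MAX_TOKENS = 8000   # assumed context budget
KEEP_RECENT = 6     # most recent turns kept verbatim

def approx_tokens(text: str) -> int:
    return len(text) // 4  # crude ~4-chars-per-token heuristic

def compact(history: list[str], scratch_pad: str, complete) -> tuple[list[str], str]:
    transcript = scratch_pad + "\n" + "\n".join(history)
    if approx_tokens(transcript) < MAX_TOKENS:
        return history, scratch_pad          # still under budget, nothing to do
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    scratch_pad = complete(
        "Update this running summary with the new turns, keeping decisions, "
        "open questions, and file names:\n\n"
        f"SUMMARY:\n{scratch_pad}\n\nNEW TURNS:\n" + "\n".join(old)
    )
    return recent, scratch_pad
```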