37 Comments

"What we have is closer to a slice of the library of Babel where we get to read not just the books that are already written, but also the books that are close enough to the books that are threatened that the information exists in the interstitial gaps." is a gorgeous and poetic statement of the strengths and weaknesses of LLMs. Thank you for the post!

Apr 23 · Liked by Rohit Krishnan

What a brilliant analysis! Thank you for sharing it. I sent it to an ML master’s student I know who’s looking for ML inspiration. This really rekindled my appreciation for the beauty and strangeness of AI.

Apr 24 · Liked by Rohit Krishnan

LLMs in general seem to be bad at basic logical thinking. Wolfram talks about this in his 'What is ChatGPT Doing' post.

E.g., every time a new model comes out, I ask for a proof of 'P v ~P' in the propositional-logic proof system of its choice, or sometimes in particular types of proof systems (e.g. natural deduction). The models always give a confident answer that completely fails.
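
For what it's worth, a correct answer isn't long; the standard classical derivation (assume ¬(P ∨ ¬P), derive a contradiction, conclude by reductio) is a few lines in Lean 4:

```lean
-- P ∨ ¬P: assume its negation, derive a contradiction, apply reductio.
theorem p_or_not_p (P : Prop) : P ∨ ¬P :=
  Classical.byContradiction fun h : ¬(P ∨ ¬P) =>
    -- If P held we would have P ∨ ¬P, contradicting h; so ¬P holds,
    -- which again gives P ∨ ¬P, contradicting h.
    h (Or.inr fun hp : P => h (Or.inl hp))
```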

So what I’m hearing is that current-gen LLMs have ADHD…? That tracks.

Apr 23 · Liked by Rohit Krishnan

https://open.substack.com/pub/cybilxtheais/p/matchstick-dissonance?r=2ar57s&utm_medium=ios

Been thinking about this from another dimension.

May 1 · Liked by Rohit Krishnan

LLMs are statistical predictors. Any time you have a specialized area and the model is given enough examples of (1) how to do the work, (2) how to invoke tools, and (3) how to inspect results and decide what to do next based on feedback, the LLM will do very well and can improve if more examples are added where it fails.

So, even without metacognition, etc., it can be a very valuable and reliable workhorse. We are not there yet, of course, but likely because current LLMs are generalists that do not have sufficiently dense and detailed examples of strategies to follow.
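
A toy sketch of that (1)-(2)-(3) loop; `propose_step` is a hypothetical stand-in for an LLM call, stubbed out here so the example runs:

```python
# Toy version of the loop described above: do work, invoke a tool, inspect
# the result, and decide what to do next based on the feedback.

def propose_step(history: list[str]) -> dict:
    # Stub: a real agent would ask the model for the next action here.
    if any("returned: 4" in line for line in history):
        return {"type": "answer", "text": "2 + 2 = 4"}
    return {"type": "tool", "tool": "calculator", "args": "2 + 2"}

def run_tool(name: str, args: str) -> str:
    # Toy tool: evaluate a simple arithmetic expression.
    return str(eval(args)) if name == "calculator" else "unknown tool"

def solve(task: str, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action = propose_step(history)                       # (1) do the work
        if action["type"] == "answer":
            return action["text"]
        result = run_tool(action["tool"], action["args"])    # (2) invoke a tool
        history.append(f"{action['tool']} returned: {result}")  # (3) inspect feedback
    return "gave up"

print(solve("What is 2 + 2?"))  # -> 2 + 2 = 4
```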

Apr 29 · Liked by Rohit Krishnan

This is so insightful and I could not agree more. This is a central concern of research into neurosymbolic AI. Check out this review article: https://ieeexplore.ieee.org/document/10148662, and some of the articles here: https://neurosymbolic-ai-journal.com/reviewed-accepted

Apr 29 · Liked by Rohit Krishnan

Great analysis, I'm largely in agreement!

You can find much simpler tasks that demonstrate this problem, e.g. "Hi! Please calculate the number of 1s in this list: [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]". Even more simply, they have a terrible time with parity checking (in fact, I've seen one researcher claim that parity is *maximally* hard for transformers).
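
(For reference, the ground truth for that example is a couple of lines:)

```python
# Ground truth for the list above: count the 1s, then check parity.
bits = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
        1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(sum(bits))      # 13
print(sum(bits) % 2)  # 1, i.e. odd parity
```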

I think you nail it when you point to the lack of deterministic storage (even a few variables whose values can be set, stored, and read); you don't necessarily have to invoke more abstract notions like goal drift. I think this also sufficiently explains why they can't learn Conway's Life.
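
To make concrete what that storage looks like, here's a minimal Life step in Python: every cell's next value is a pure function of explicitly stored neighbor values, exactly the kind of set/store/read being pointed to.

```python
# One step of Conway's Life on a bounded grid: read the stored neighbor
# values, apply the rule, write the new state.
def life_step(grid: list[list[int]]) -> list[list[int]]:
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            live = sum(grid[rr][cc]
                       for rr in range(max(r - 1, 0), min(r + 2, rows))
                       for cc in range(max(c - 1, 0), min(c + 2, cols))
                       if (rr, cc) != (r, c))
            nxt[r][c] = 1 if live == 3 or (live == 2 and grid[r][c] == 1) else 0
    return nxt

blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(life_step(blinker))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```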

> Also, at least with smaller models, there's competition within the weights on what gets learnt.

Large models too; we can be confident of this because they start to use superposition, which wouldn't be necessary if they weren't trying to learn more features than they have dimensions. The world is very high-dimensional :D

Apr 28 · Liked by Rohit Krishnan

Rohit, very enlightening! I am wondering if we can translate your blog post into Chinese and post it in the AI community. We will highlight your name and keep the original link at the top of the translation. Thank you.

Apr 28 · Liked by Rohit Krishnan

I wonder what happens if you have the rules in the context, pass in the neighbor states, and evaluate it cell by cell, i.e. use it just as a compute function. IMHO that should work. So you keep state and iteration external, which, as you said, is what agent systems can provide.
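
Something like the sketch below, where the grid and the generation loop live entirely outside the model; `llm_next_state` is a hypothetical stand-in for the per-cell query, and here it just applies the rules that would sit in the prompt.

```python
# Sketch of the proposal above: state and iteration are kept externally;
# the model is used only as a per-cell compute function.

def llm_next_state(cell: int, neighbors: list[int]) -> int:
    # Hypothetical stand-in for: "given these rules, this cell, and these
    # neighbor values, return the next state."
    live = sum(neighbors)
    return 1 if live == 3 or (live == 2 and cell == 1) else 0

def step(grid: list[list[int]]) -> list[list[int]]:
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [grid[rr][cc]
                         for rr in range(max(r - 1, 0), min(r + 2, rows))
                         for cc in range(max(c - 1, 0), min(c + 2, cols))
                         if (rr, cc) != (r, c)]
            nxt[r][c] = llm_next_state(grid[r][c], neighbors)  # one query per cell
    return nxt

def run(grid: list[list[int]], generations: int) -> list[list[int]]:
    for _ in range(generations):  # iteration handled outside the model
        grid = step(grid)
    return grid
```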

Did you try teaching it through code, i.e. a few different implementations of GoL?

But then we would just be using it as a transformation function with strong language skills.

Did you try adding agents that keep the grid model and can retrieve relevant parts and update the state? In GoL it’s all local anyhow.

Also, your point about relationships that LLMs have a hard time reversing was interesting. Thinking about AlphaGo etc., which are based on GNNs: perhaps that's what's missing inside the models, a relational representation of the world?

Thanks for the great and detailed post. It inspired a lot of questions.

Great article, thanks for beating so hard on the limits of LLMs, and your description of trying to get them to do something that feels so simple made your frustration really palpable :) Attention, evidently, is not all we need.

"An idea I’m partial to is multiple planning agents at different levels of hierarchies which are able to direct other specialised agents with their own sub agents and so on, all interlinked with each other, once reliability gets somewhat better." That really reminds me of Daniel Dennett's (may his memory be a blessing) model of how consciousness arises.

Apr 26 · Liked by Rohit Krishnan

Interesting read! I'm a casual LLM user, but was really surprised when several models I tried couldn't generate a short essay with grammar errors in it. I was trying to create an editing activity for college journalists, and the models really struggled to write something that was grammatically incorrect. I went through many rounds asking for specific types of grammar errors, thinking that might help, but its inability to reset seemed to make it more confused. Maybe the problem was my prompting, not the model. Has anyone else tried something like this?

For some reason this post reminded me of graduate students. This isn't fair because the distinction between us and LLMs is much more profound and qualitatively different (and I strongly suspect you are right that bats, octopodes, and pigs reason more similarly to us than LLMs do). And yet the way you described the LLM reminds me of how first year grad students are, or perhaps how certain kinds of human minds are, where they only see the literature / that which exists, and they cannot think deeply or substantially beyond it. It seems to me, or it feels to me, that they are unable to get the entire deep structure of thinking that the literature represents inside their minds. They can see what the literature is on the surface. They can see enough of the underlying connective tissue that they can plug the gaps in the surface, but no more than that; they would not be able to perceive gaps in the deeper connective tissue, for example.

Great post. I already know I will reread it.

Definitely relevant, from at least two (maybe three) levels, depending on the level of decomposition one is working with, i.e. whether cognition and culture (and the cognitive, logical, epistemic, etc. norms *and harmful constraints* that come with it) are split into two or not:

https://vm.tiktok.com/ZMMqm7y5k/

> Sufficiently scaled up statistics is indistinguishable from intelligence, within the distribution of the training data.

What is your take on intelligence vs wisdom? In this essay (https://tmfow.substack.com/p/artificial-intelligence-and-living) I provide a perspective on why this distinction is of enormous importance for AI/AGI, as well as for why LLMs (and any other approach to AI) are inherently limited. Would be curious to hear your perspective on it.
