The Claude comment regarding LLM potential (footnote 8) reminds me of the children's "Stone soup" story... In current parlance an LLM is a language model based on the transformer architecture. If you change the architecture, training paradigms, add interactive learning, causal reasoning, etc, then at some point this is no longer a transformer or a language model, no longer stone soup - it's a new architecture, and a new kind of model, moving towards a more animal-like cognitive architecture, perhaps.
Can you ride a bicycle to the moon? Yes, if you remove the wheels and add rocket engines, etc!
It's kinda fun to think back. When I was at school I had a near perfect memory - so I also figured I didn't need to study. And that worked until sometime halfway through my degree course!
LLMs are pretty bad at any sort of compression so far; your example of cellular automata is way beyond what they can do. James Bowrey uses just the 5-bit binary numbers from 0 to 16 run together and asks for a short program that will reproduce the bit-string. Most any solution they give will just quote the string; it won't use a simple counter to compress the bit string. He's a monomaniac on data compression, one of the Hutter Prize judges, but a bit stuck on lossless compression and doesn't distinguish meaning / information from noise / mere data, when there are many cases where that distinction is central, e.g. markets, where a single bit of information is equivalent to doubling your money, as Kelly showed in the '50s (the same paper better known for the Kelly Criterion). Think of how much market data goes through an HFT fund over its doubling time. Effectively they manage only a single bit of predictive compression over many GB of market data.
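To make that test concrete, here's a minimal sketch (my reconstruction of the setup as described, not Bowrey's actual harness): the 85-bit string of the numbers 0 to 16 in 5-bit binary, the quote-the-literal answer a model typically gives, and the counter answer that actually compresses.

```python
# Reconstruction (assumed details) of the compression test described above:
# concatenate the 5-bit binary forms of 0..16 into one bit-string.
bitstring = "".join(format(n, "05b") for n in range(17))
print(len(bitstring))  # 17 numbers x 5 bits = 85 bits

# The answer models tend to give: a program that just quotes the data.
quoting_program = f'print("{bitstring}")'

# The compressed answer: a counter loop that regenerates the data.
counter_program = 'print("".join(format(n, "05b") for n in range(17)))'

# Both reproduce the string, but only the counter is shorter than the data.
print(len(quoting_program), len(counter_program))
```

The point being that a correct answer exists well under the length of the raw string, and the model usually doesn't find it.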
On manifolds of solutions:
You might want to check out the Bzogramming substack, e.g. https://bzolang.blog/p/the-most-profound-problem-in-mathematics on SAT solvers that can solve many practical NP-hard problems. Many other illuminating posts there.
For a dictionary of intimidating geometric ML math that is semi-comprehensible:
https://patricknicolas.substack.com/p/demystifying-the-math-of-geometric
Great article. I wonder if the No Free Lunch theorem is relevant here. https://en.wikipedia.org/wiki/No_free_lunch_theorem
Couple of small typos:
> But what they get wrong are where the interesting failures.
> behaviourally speaking its outrage does not seem different (should be "output"?)
Thanks! Will fix.
Appreciate your nuanced view of whether they understand or reason. Too often understanding is seen as black and white, so that they “don't truly understand” because they “just predict the next token”. In reality they clearly have some amount of understanding, but the quality of their understanding is not as high as a human's. The epicycles represent understanding, but Newton represents higher quality understanding. It also seems clear that the quality of LLM understanding is increasing.
Thank you! That was the intention.
Thanks, good points. And I like Cowen's definition of understanding. A corollary I find truer by the year is that intelligent people are able to explain/predict complicated things in simple terms, while dumb people explain/predict simple things in complicated terms.
By that definition, LLMs are fabulously dumb.
If scaling current methods is the final say in AI, biology is far from being obsolete.
Discovering F=Gm1m2/r^2, the underlying generator of planetary movement, took thousands of years, and for most of that time people were pretty convinced by epicycles, which are a good approximation, similar to what they claim LLMs produce. Claiming that “LLMs cannot reason” because they replicate failures of human reasoning is not a strong claim. In fact, the fact that LLMs tend to come up with epicycles, much as Ptolemy actually did and humans believed for thousands of years, is a really good argument that they're reasoning pretty similarly to how we do.
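The epicycle point can be made concrete: epicycles are mathematically a truncated Fourier series, so stacking enough circles approximates any closed orbit without ever touching the underlying generator. A sketch, using an illustrative orbit rather than real planetary data:

```python
import numpy as np

# An illustrative closed orbit in the complex plane (x + iy): an
# off-centre ellipse standing in for a planet's apparent path.
N = 256
t = np.linspace(0, 2 * np.pi, N, endpoint=False)
orbit = np.cos(t) + 0.6j * np.sin(t) + 0.2

# Epicycles are exactly a truncated Fourier series: each coefficient
# is one circle (radius |c_k|, angular speed given by its frequency).
coeffs = np.fft.fft(orbit) / N
freqs = np.fft.fftfreq(N, d=1 / N)  # integer angular speeds

def epicycle_fit(n_circles):
    """Rebuild the orbit from the n largest circles."""
    top = np.argsort(-np.abs(coeffs))[:n_circles]
    return sum(coeffs[k] * np.exp(1j * freqs[k] * t) for k in top)

# A handful of circles already fits this orbit to machine precision,
# with zero knowledge of any inverse-square law behind it.
err = np.max(np.abs(orbit - epicycle_fit(3)))
print(err)
```

Which is exactly why epicycles predicted well for so long while explaining nothing: good fit, wrong generator.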
I explicitly state in the essay that LLMs do reason.
Mea culpa, thanks for the clarification, on rereading I see you mentioned that, but I’m still pretty confused as to what your central claim is:
1) LLMs will never be able to come up with new things
2) LLMs are incapable of understanding the underlying principles behind things
3) something else?
I'm making no such claims, unusual for an essay I know :) I'm saying:
- They primarily learn patterns.
- Learning such patterns gets them to be remarkably useful, more so than anyone would've thought before.
- Learning such patterns still causes many "silly mistakes", because they don't learn the underlying generators.
- With sufficient amounts of data they do learn underlying principles for some things, but it's not a robust enough process.
- Reasoning helps here, because they learn to reason like us, but this has the same problem: the reasoning patterns they learn do not share our underlying generator.
- As we push more data/info/patterns into the models they will get smarter about what we want them to do, even though the type of intelligence is closer to a market's intelligence than an individual being's (speculative).
Maybe I should add a summary at the bottom.
Interesting, thanks for the clarification. My first question would be whether generators even exist for many phenomena - even for your provided example of gravity, we are aware that the model is still incomplete and we don’t entirely understand the full complexity of the system. The underlying assumption here is that there exists some sort of platonic ideal structure that produces phenomena which can be modeled and generalized, which is meaningfully different from pattern following, which seems suspect.
It's not true of everything, I think, but true of enough things that something akin to this is definitely happening across all the types of questions we ask it and the answers we receive.