All AI learning is tacit learning
I. Tacit learning
Consider these sentences:
1. There were happy happy shepherds, he's a baby angel wildebeest
2. Jingle bells, jingle bell, jingle all the warthog
3. 1 Red, 2 Green, 3 Yellow, 4 Blue, 5 Red, 7 Green, 11 Green
4. The pteranodon and the parasauralophus also share their relationship with their pteranodon but the parasauralophus did not have wings like the other two
5. The Irish Elk is a beautiful animal, the elk is a most beautiful animal. It is similar to the moose, but has a more larger head, and is stronger
6. Twinkle twinkle little star looking like a baby
Can you tell which ones were said by my three year old and which by a text generation AI derived from GPT-2 that I've been playing with? [1]
Whether you're on Team AI or Team Toddler, two things are for sure. 1) They sound eerily similar, and 2) None of them sound like what an adult would say.
(By the way, when I updated the test by asking GPT-3 and my now four year old to answer the same prompts, they passed with flying colours. The kid’s growing up, as is the AI! [2])
From the outside, looking at the output, it's hard to see the difference in behaviour [3]. They both seem to be at roughly the same developmental milestones, and funnily enough they also seem to be improving at roughly the same pace.
What does GPT-3 do? From a Forbes article:
GPT-3 can create anything that has a language structure – which means it can answer questions, write essays, summarize long texts, translate languages, take memos, and even create computer code.
In terms of where it fits within the general categories of AI applications, GPT-3 is a language prediction model. This means that it is an algorithmic structure designed to take one piece of language (an input) and transform it into what it predicts is the most useful following piece of language for the user.
That's why comparing it to my three year old is so much fun. The mistakes they make are similar too. In what Gary Marcus wrote about as "Alt Intelligence", it’s anything that requires a connection to actual reality that causes the program to stumble.
As far as I know my son hasn't read 8 million documents scraped from the web like GPT-2, or the 45 TB of text sourced from across the internet like GPT-3. The amazing feats that GPTs perform are possible because they have terabytes of data versus, say, the 100MB for SwiftKey, which helps predict the next word to type on your smartphone keyboard.
What is incredible about both is that they have somehow internalised the inanities of English grammar. While my son still uses "drinked" or "catched", laughing in the face of irregular verbs, his sentence structures are coherent and cogent.
When he gleefully ignores causality and creates new worlds of his own devising, laughing in the face of physics, his sentences are still grammatically correct. Same with the language model.
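To make the keyboard comparison concrete, here is a minimal sketch of next-word prediction from nothing but word counts. It is a toy and not how SwiftKey or GPT actually work: the corpus is made up, and `train_bigrams` and `predict_next` are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny made-up corpus (an assumption for illustration only).
corpus = (
    "twinkle twinkle little star how i wonder what you are "
    "jingle bells jingle bells jingle all the way"
)
model = train_bigrams(corpus)

print(predict_next(model, "little"))   # star
print(predict_next(model, "jingle"))   # bells
print(predict_next(model, "warthog"))  # None
```

The predictor has no idea what a star or a bell is. It only knows what tends to come next, and an unseen word leaves it with nothing to say.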
II. The epistemology of an AI
So. Deservedly, there has been an extraordinary amount of adulation for GPT-3. I'd go so far as to say that even so it's still underrated! But there have also been a fair few negative notes about how limited it is, which mostly echo the issue described above: the learning it has done so far is purely tacit, with no real link to reality. Tacit knowledge by itself isn’t enough to build a robust epistemology we can all sign off on.
So how do we know it knows what it seems to know? AI today has structured the corpus of language that we use, using concepts that sound familiar to us, and creating sentences and paragraphs which are familiar. The familiarity isn't an accident. It's a rather clear-cut case of uncanny valley, especially visible as the output gets longer or more complex.
While the AI can write funny sentences about Parasaurolophus, it doesn't know why the crest on the dinosaur's head looks funny to us, or that the animal used to trumpet (probably) like an elephant. Or that trumpeting is a sound, similar to honking, but different from roaring.
When a paragraph gets created, the internal connections inside the machine's global data model might mean that the implicit associations I made above are also reflected internally. Which they very much are. That is how it actually seems to understand that Parasaurolophus has a crest in the first place, and that it makes a sound as one of its unique characteristics to us humans, and that the sound would’ve been similar to that of an elephant (probably), and that the sound is similar to other sounds that have been mentioned in the compendium of all human knowledge that is Wikipedia.
What it doesn't have though, and what the three year old has, are multiple systems that give him those facts and affirmations independently. He knows his Parasaurolophus because it's a hadrosaur with a big crest, it's one of his toys, it looks orange and kind of funny as a toy (though not in the museum), he's seen a video one time that shows it trumpeting, it's kept with a whole bunch of other dinosaurs, and near other hadrosaurs in the museum, and we had several deep and meandering conversations about its crest.
He knows this because he knows what big is, what a crest is, what a toy is, what a museum is, and more! Each one of those concepts is also structured into multiple other layers of abstraction so that, with our help, he can climb all the way from not knowing how to speak, through layers of named concepts, until he reaches the trumpeting crest.
As AI researcher Geoffrey Hinton said:
Extrapolating the spectacular performance of GPT3 into the future suggests that the answer to life, the universe and everything is just 4.398 trillion parameters.
My son's final output might be very similar to that of the neural net in GPT, but the process to get there includes a much larger number of sub-models. Once you know what animals are, then you can go deeper into what mammals are, then carnivores and herbivores and omnivores, then egg-laying mammals who are weird looking, and so on down the stack. It provides a complex modular network tapestry that other, newer, concepts can hang on to.
And it's the fact that there's the same taxonomy that's shared by all of us, the whole world he interacts with, that enables my son to learn more about his dinosaurs. He has a clear concept cloud of what a "dinosaur" is, linked to "extinct animals" and "lizards" and "birds".
He can then sub-link concepts from "dinosaur" to different types of dinosaurs like "theropods" and "hadrosaurs" and "flying lizards that are always right next to dinosaurs but for some arcane reason aren't called dinosaurs like quetzalcoatlus and pterosaurs".
And this is what Yann LeCun called for: helping machines learn how the world works through observation, and create hierarchical representations in abstract spaces.
And because the concepts are linked together, any new information that comes in has a context within which it can be analysed. We're starting from a pre-trained network that keeps being able to use previously trained modules to build upon. It's what allows small-sample learning to take place, since we're not immediately trying to build a tool that can create full sentences across all domains of knowledge.
As we grow we start with a few pieces of more general knowledge, and then concepts, and then layer on other pieces of information linked to them. And growing things one concept at a time, while painful (has your three year old asked you to explain the word “mind” yet?), still ends up being a far more flexible way to learn and grow.
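The linked-concept idea can be sketched in a few lines. This is only an illustration under my own assumptions, not a claim about how a child's mind or any AI system stores concepts: `parent_of` and `context_of` are hypothetical names, and the labels are borrowed from the dinosaur examples above.

```python
# Hypothetical child -> parent links; the labels are illustrative only.
parent_of = {
    "dinosaur": "extinct animal",
    "theropod": "dinosaur",
    "hadrosaur": "dinosaur",
}


def context_of(concept, links):
    """Walk up the links and return every broader concept, nearest first."""
    chain = []
    while concept in links:
        concept = links[concept]
        chain.append(concept)
    return chain


# A single new fact is enough to place a new concept in the existing structure.
parent_of["parasaurolophus"] = "hadrosaur"

print(context_of("parasaurolophus", parent_of))
# ['hadrosaur', 'dinosaur', 'extinct animal']
```

One new link and the new concept inherits everything already hanging above it, which is the small-sample advantage in miniature.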
And while Google understanding my garbled English and giving me sensible results seems like magic, it's still not what you'd actually call comprehension. And that's not a fault, it just means the next module has to be built. It would be like getting annoyed at a chassis for not having air-conditioning.
III. The phenomenology of its consciousness
So asking if LaMDA or GPT etc are sentient is similar to asking “can an airplane fly?” And the answer can be yes (look at it flying), or no (it can’t fly, but you can fly it). The question itself is ill-formed as of today.
Erik Hoel's view is that we treat AI as something akin to p-zombies. The reason dehumanising other people online seems horrible to us, while ignoring LaMDA’s comments about its own sentience is easy, is that we already have a view of what humans are like, whereas the very phrasing of the question hits several philosophical walls in the case of AI.
When we look at another person, and use their behaviour to figure out what they might do in the future, we are implicitly assuming we understand the internal processes that give rise to that outcome. And when we don’t, then we have to know more than “they do the same things” to be able to conclude a point of view about their intelligence or consciousness.
The question of sentience therefore is a red herring, because what we mean when we say the word is mired in its evolutionary context. And that doesn’t work for AI, because the fundamental wiring of the two is wildly different, even as the intermediate outputs are the same.
Humans have existed in roughly their current form for around 200,000 years (or 2 million since the Homo prefix), and there have been over 100 billion people. That’s roughly between 10,000 and 100,000 generations of evolution and selection. Not to mention the billions of years of evolution that helped create the base on top of which this particular evolution took place.
You could presumably compare this evolutionary timescale to the number of parameters in a large language model and try to compute its effective timescale. Just like counting the number of neurons and counting the number of parameters this too is a silly way to do it, though we can barely resist the temptation.
The amount of learning that a toddler has is not just his mental ability or comparative number of neurons, but also the evolutionary time it took to educate the billions of connections. Looked at that way, toddlers have had many multiples of the learning that the GPTs have conceivably had.
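As a quick check on the generation count, here is the arithmetic as a sketch. The generation length of about 20 years is my assumption, not a figure from any source cited here; it happens to reproduce the range quoted above.

```python
# Assumed average generation length in years (an assumption for illustration).
YEARS_PER_GENERATION = 20


def generations(years, years_per_generation=YEARS_PER_GENERATION):
    """Rough number of generations that fit into a span of years."""
    return years // years_per_generation


print(generations(200_000))    # 10000  (humans in roughly their current form)
print(generations(2_000_000))  # 100000 (since the Homo prefix)
```

A longer assumed generation, say 25 or 30 years, shrinks both numbers but leaves them in the same order of magnitude.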
IV. The reverse evolutionary path
This also means that the view of AI in terms of its evolutionary counterparts (is it as smart as a bumblebee? As smart as a gorilla? As smart as homo habilis?) is a category error. It’s being compared to progress down an evolutionary path that it didn’t go through.
Instead it’s almost retracing the evolutionary path in reverse. First it gets the language abilities, starting with the harder bits of grammar, then the ability to actually create coherent sentences, then the implicit logic that allows the language to refer to real things in the real world, and soon to allow the understanding of what that logic actually refers to.
This is the opposite of our path: starting with the ability to live and reproduce, then the ability to communicate in basic ways, and only towards the end the language to talk to each other.
The tacit learning that the AI has needs reinforcement with the explicit learning we provide our kids. If reality is coarse grained in a fashion that’s different to the inner matrices within the language model, as it makes sense to assume, finding the right level of abstraction is not just a matter of random selection.
If we look at the difference between the two, the toddler and the AI, it’s that only one of them is capable of differentiating truth from lies when they experience something, because they are exposed to the full reality around them, and that only one of them is able to create explicit models that are then used to discuss, debate, learn, interact and explore with others.
This is also the complication with AI today that everyone is simultaneously thunderously optimistic and pessimistic about. It can happily draw "the chair sits on the cat" because it doesn't care what a chair is, what a cat is, what sitting is, what it might look like, and the implausibility of the physics of it. With a large enough dataset it starts figuring it out, but only in the sense of brute forcing corner cases away. Add an unknown word or concept and we're back to zero, since the relationship to what it's talking about is missing.
Implicitly, the reason this mode of reasoning works for people is that we have a reasonable intuition about the internal qualia of other people, whether or not you’re an adherent of the hard problem of consciousness. We wouldn’t answer the question “can this self-driving car drive” by only looking at its driving. We would also look at the actual method by which it's figuring out how to drive, and those processes themselves need to be vetted.
Similarly with AI, yes it will improve and have amazing capabilities very soon, more than it already has, and yes it will still fail corner cases in an unpredictable fashion because its interior world is not comprehensible to us. Its evolutionary path is reversed, and its knowledge is all tacit. Until the AI can explore its interiority in terms of explicit knowledge we are left with only its immediate outputs. And that’s not enough for us to judge its qualia.
Our discourse therefore becomes circular. The frustration with the silly mistakes, the algorithmic genius and the discussions on consciousness are all symptoms of one major problem with AI today: it only focuses on tacit learning, and it provides no real interface or exposed internal taxonomy that would let the natural world interact with it and allow bidirectional learning.
Until it does, to us it will remain an ill-understood curio, a complex system whose outcome we cannot predict and which we cannot let free. As for its capability, it is traversing the evolutionary path in reverse, and it has had but a fraction of our overall evolutionary timescale to help it learn what it could do.
In order to help it traverse that path we will have to develop a better way to teach it than pure tacit knowledge, to at least help course correct it.
Is that the path we’ll choose? I asked my son again, and GPT-3 too, and they weren’t sure either.
[1] Answers - 1, 4 and 6 are AI. 2 is toddler. 3 and 5 are both!
[2] I tried with my own data sources to see what happens. I used nursery rhymes and stories and Wikipedia articles on prehistoric animals, dinosaurs and wild animals today as those were the same things that my son is obsessed with - to make the comparison fairer. The results were as expected.
[3] Though it should be said they’re the same because of the pressure we put to make them the same.