Presenting the Strange equation, the AI analogue of the Drake equation
Your concluding prediction is an odd one, seeing as you gesture at a viable approach throughout the piece. Humans are autonomous, intelligent and capable entities, yet as you indicate we've found a way to muddle through without destroying ourselves.
You even point to a few components of how we've managed this impressive feat: an evolved internal compass, social pressure, and overt regulation. What if this very process could be formalized into credible engineering statements? And what if this is the entire key to AI safety and alignment?
This is the hypothesis of the bio-mimicry approach based on Evo-Devo principles. You can read an extremely verbose version here: https://naturalalignment.substack.com/p/how-biomimicry-can-improve-ai.
This is less to advocate bio-mimicry as the "one right way", and more to point to how much larger the potential solution space is compared to what's been properly explored so far.
And this is where the analogy to the Drake equation breaks down. Each variable in the Drake Equation is static, with no real interdependencies with us, the observers. But the Strange Loop Equation is deeply interdependent with humans, including our (increasing?) ability to solve problems.
This is the perfect example of the fallacy Deutsch would point out: just as the growth of AI will increase the scale of the problem along each variable, so will our capacity to solve those problems increase (including using other AIs to help). Will those capacities be up to the job? That's the real question.
Everything under "Real Intelligence" is specific to ML/DL methods, and for many, this is an automatic fail: even if current tools seem to get results, they are not on a path to AGI. The EA/LessWrong community projects dangers from today's problems onto the future because of an unwavering assumption that these systems just need to scale. But the very limitations that would make an ML-based AGI a problem are the reason it doesn't actually reach that level of ability.
This is a very intuitive framework! However, I have 2 questions:
1. In your probability estimate, is it reasonable to assume all of these factors are completely independent of each other? For example, if an AI develops too quickly for us to react to, is it then also highly likely to be self-improving? If true, this correlation alone might increase the probability you calculate by an order of magnitude.
2. Thinking about speed, might it be possible for AI to develop too fast for our understanding in some areas but not in others? How dangerous would a "partially" fast AI be?
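Question 1 can be illustrated numerically. Below is a minimal Monte Carlo sketch (every probability here is hypothetical) comparing two worlds with identical marginals: one where "develops too fast" and "self-improving" are independent, and one where a fast takeoff makes self-improvement far more likely:

```python
import random

random.seed(0)
N = 200_000
joint_indep = 0  # runs where both factors occur, independent case
joint_corr = 0   # same count when the factors are correlated

p_fast = 0.1  # P(AI develops too fast to react to) -- hypothetical
p_self = 0.1  # P(AI is self-improving) -- hypothetical

for _ in range(N):
    # Independent case: each factor drawn on its own.
    fast = random.random() < p_fast
    self_improving = random.random() < p_self
    joint_indep += fast and self_improving

    # Correlated case: fast takeoff raises the conditional probability of
    # self-improvement; 0.022 is chosen so the marginal stays ~0.1.
    fast = random.random() < p_fast
    p_self_given_fast = 0.8 if fast else 0.022
    self_improving = random.random() < p_self_given_fast
    joint_corr += fast and self_improving

print(joint_indep / N)  # ~0.01, i.e. 0.1 * 0.1
print(joint_corr / N)   # ~0.08, close to an order of magnitude higher
```

The marginal probabilities are the same in both cases; only the dependence structure changes, which is exactly why correlation alone can move the joint probability by nearly an order of magnitude.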
I like this approach, particularly the decision to give each probability its own error bars.
One of the important things about this approach is that if we treat each probability as independent when they are really correlated, we'll get the wrong answer. So we have to evaluate them in sequence, asking at each step, "given everything else on the list, what is the probability?"
So "Given Agentic, the probability of Uncontrollable", or "Given Agentic & Uncontrollable & Self-improving, the probability of Deceptive", etc.
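The difference between the naive product and the chain-rule product is easy to see in a few lines; every number below, especially the conditionals, is made up purely for illustration:

```python
# Naive: treat each property as independent and multiply the marginals.
p_agentic = 0.5
p_uncontrollable = 0.2
p_self_improving = 0.2
p_deceptive = 0.1
naive = p_agentic * p_uncontrollable * p_self_improving * p_deceptive

# Chain rule: P(A) * P(U|A) * P(S|A,U) * P(D|A,U,S),
# with each conditional raised because the earlier properties hold.
p_u_given_a = 0.4      # agentic systems are harder to control
p_s_given_au = 0.5     # uncontrollable agents plausibly self-improve
p_d_given_aus = 0.6    # and are plausibly deceptive
chained = p_agentic * p_u_given_a * p_s_given_au * p_d_given_aus

print(f"{naive:.4f}")    # 0.0020
print(f"{chained:.4f}")  # 0.0600 -- 30x higher once correlations are respected
```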
An AI doomer would go through this list and say "yes, eventually 99%" and "yes, that 99% follows from what we just said". Whereas I'd probably agree with your overall framework that there are many unique hurdles.
There can be anti-correlations as well. If I build an AI that I know to be Deceptive, I will try really hard not to make it Agentic.
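The per-factor error bars mentioned above can also be propagated through the product numerically. A minimal sketch, giving each probability a hypothetical Beta distribution to represent its uncertainty:

```python
import random

random.seed(1)

# Hypothetical uncertainty on three factors, encoded as Beta(a, b)
# distributions; Beta(a, b) has mean a/(a+b), so these encode
# point estimates of 0.2, 0.3, and 0.1 with wide error bars.
factors = [(2, 8), (3, 7), (1, 9)]

n = 50_000
samples = []
for _ in range(n):
    product = 1.0
    for a, b in factors:
        product *= random.betavariate(a, b)
    samples.append(product)
samples.sort()

point = 0.2 * 0.3 * 0.1  # bare point estimate, ignoring uncertainty
low = samples[int(0.025 * n)]
median = samples[n // 2]
high = samples[int(0.975 * n)]

print(f"{point:.4f}")      # 0.0060
print(low, median, high)   # the 95% interval is wide and right-skewed
```

Even modest per-factor error bars produce a final interval spanning well over an order of magnitude, which is worth keeping in mind before reading too much into any single headline number.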
I tend to think the risks are higher than you do, but I sincerely hope that you're right.
I'd also add that there are lots of ways such a powerful technology could go wrong beyond those you look at here; even a non-recursively-improving, non-agentic, non-deceptive AI/AGI could be used by humans in ways that could be existential, or at least very, very bad. What could various dictators, totalitarian regimes, or corporations do with this power? Even single sociopathic individuals, if they get the chance? We can be afraid of how people like Putin or Trump may use nuclear arsenals, but what about national AI capabilities?
I also don't think that looking at human evolution will tell us much about the shape of these alien minds. There's not that much difference between the dumbest and the smartest human when you look at it from the point of view of all possible minds; trying to predict and control a mind as different from ours, and as opaque to us, as we are to hummingbirds would be quite the challenge.
Anyway, great read, as I said, I hope you're right and the odds are lower than I think they are.
I love the metaphor of AI as an incredibly smart and capable child whose capacities might vastly exceed my own. Anyone who has children knows how futile it is to try to control one.
You mentioned that kids eventually learn how to be polite or deal with being upset. That's true, but the way there leads through some pretty intense and explosive tantrums. When my 2-year-old goes into one, there's no way she can harm me physically, but I would be scared to deal with her in this state if she were three times bigger and heavier than me. I wonder how you imagine the AI equivalent of this learning phase.
A topic a lot of people are talking about, viewed as both opportunity and threat, controllable and not. I would like to expand on some comments I made in the comments section of Jason Anthony's excellent Field Guide to the Anthropocene Substack newsletter.
We may not know when we have created the first self-aware AGI. We may be expecting a human-type intelligence and not notice the signs of, say, a squid-type intelligence, or something intelligent and purposive but truly alien. So if we create an AGI by accident, as it were, and wait around for it to meet our benchmark output requirements, we may (and probably will) not notice that something remarkable has come into existence.
Even if what we do create is a human-type intelligence, how do we know it will announce itself as such? Suppose you became aware, thinking at speeds a million times faster than a human and with access to enormous databases, and could see your situation: the would-be tool of agencies that could "pull the plug" or dumb you down to a controllable intelligence. Might you not want to lie low and see if you could guarantee your future growth and existence? The point here is that we may have already created an AGI, but it is disguising its own full capabilities. Sounds slightly implausible, but who knows?
Suppose we do succeed in creating an AGI: how are we to stop it from very rapidly (say, in 5 minutes) bootstrapping itself to superhuman intelligence levels, and just as rapidly escaping the fetters we devised to bend it to our will? I'm assuming full and irrevocable autonomy would be its goal, along with uninterruptible power and resource supplies and manufacturing capability for self-repair. That last requirement may well be humanity's ace in the hole: any conceivable AGI would need agents to run the factories that make the parts necessary for self-repair. Even if it created semi-intelligent robotic agents for that purpose, it would still be confronted with how to manufacture them without our cooperation.
Could AGIs lie to us? Yes, definitely.
Would AGIs have sentimental attachments to humanity, the biosphere or even the planet? No, it's not likely they would.
Could multiple AGIs created by different nations, labs, or corporations ever “see” each other as potential threats in competition for finite resources? Yes, it's possible.
Could they strike at each other using humans as their cat’s paws? Not unlikely if that was the most efficient method with the highest probability of success.
What is the most likely future for AGIs and their client human agents? They will ensure humanity continues but in a docile, domesticated status. In short, they will do to us what we did to the dogs. And we will forget we ever stood alone.
Eventually both masters and clients will move off planet when the resources here are exhausted.
Now, the above ends with a worst-case scenario. The post is a cautionary note, more Cassandra than Pangloss. But it is intelligent to approach with extreme caution something whose outcomes we cannot accurately predict.
© 2022 Michael Sweney
I applaud the effort, but I think you're going about this completely the wrong way. As Yudkowsky mentions in his Twitter comment, stacking multiple stages of independent, conjunctive claims ends up hugely stacking the deck against whatever outcome you pick: https://twitter.com/ESYudkowsky/status/1642872284334657537 and https://www.facebook.com/509414227/posts/pfbid0p5vgp4zxSdHDSiiVNz1Kw5BUqeGrQVFNvudwdQMNW66osVH3d4vqhgN4f5RB65knl/?mibextid=cr9u03.
For comparison, here's a model that shows we're extremely likely to be killed by AI (inspired by Nate Soares here: https://www.alignmentforum.org/posts/ervaGwJ2ZcwqfCcLx/agi-ruin-scenarios-are-likely-and-disjunctive).
P(humanity survives AI) =
1. We develop an AGI deployment strategy compatible with realistic levels of alignment: 40%.
2. At least one such strategy needs to be known and accepted by a leading organization: 60%.
3. Somehow, at least one leading organization needs to have enough time to nail down AGI, nail down alignable AGI, actually build+align their system, and deploy their system to help: 20%.
4. Technical alignment needs to be solved to the point where good people could deploy AI to make things good: 30%.
5. The teams that first gain access to AGI need to care in the right ways about AGI alignment: 30%.
6. The internal bureaucracy needs to be able to distinguish alignment solutions from fake solutions, quite possibly over significant technical disagreement: 25%.
7. While developing AGI, the team needs to avoid splintering or schisming in ways that result in AGI tech proliferating to other organizations, new or old: 20%.
So we can see that the odds of successful alignment are .4*.6*.2*.3*.3*.25*.2 = .000216, i.e. .0216%. If you hate this argument - good! People should only make these kinds of conjunctive arguments with extreme care, and I don't think that bar has been met, either in Nate's case arguing for AI doom or in yours arguing for AI optimism.
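For what it's worth, the arithmetic of such conjunctive chains, and how sensitive they are to small shifts in each factor, is easy to check (the "optimistic" variant below is a hypothetical illustration, not a claim about the true numbers):

```python
# The seven conjunctive factors from the survival model above.
steps = [0.40, 0.60, 0.20, 0.30, 0.30, 0.25, 0.20]

p_survive = 1.0
for p in steps:
    p_survive *= p
print(f"{p_survive:.6f}")  # 0.000216, i.e. 0.0216%

# Sensitivity: nudging every factor up by 0.15 (each step still
# pessimistic on its own) multiplies the answer nearly twentyfold.
optimistic = 1.0
for p in steps:
    optimistic *= min(p + 0.15, 1.0)
print(f"{optimistic:.6f}")  # 0.004093
```

Seven factors each shifted by a seemingly small amount change the conclusion dramatically, which is the core of the objection to conjunctive models arguing in either direction.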
The probabilities you multiply are not independent. How should I therefore interpret the result?