“I think it’s unlikely that we’re all going to die from pushing AGI research. Hinton’s resignation feels like end-of-career fumes,” I said, mostly to move past the topic.
“How can you say that?” He pushed up his glasses. “Don’t you think superintelligence is dangerous?”
“I don’t even know what superintelligence is,” I started to say. “When I-”
“Imagine GPT-4, but it keeps getting smarter,” he cut me off. “It is so good that it can program 5x better and write 10x better. That’s a scary thought.”
Exciting, rather, I thought to myself, but I was wary of verbalising it. “Yeeeees,” I said. “I don’t think a 10x smarter GPT-4 is scary, to be honest. Would be lovely to live without my white collar drudgery.” The images of aligning text boxes just so, as I did for many years, flashed in front of my eyes.
“It will obviously try to make itself smarter,” he continued. “That’s what I would do. Wouldn’t you?” He didn’t wait for an answer. “Once it starts making itself better there’s no easy way to turn it off.”
“Surely we can see it happening though? A 10% a day increase only doubles in a week.”
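(An aside on the arithmetic behind that line: compounding 10% a day for a week comes to roughly a doubling. A minimal sketch, with the 10%-a-day rate being purely hypothetical:)

```python
# Back-of-envelope check: compounding 10% daily for a week
# gives 1.1 ** 7 ≈ 1.95, i.e. roughly a doubling.
daily_growth = 0.10   # hypothetical 10% capability gain per day
days = 7
factor = (1 + daily_growth) ** days
print(f"Growth after {days} days: {factor:.2f}x")  # -> about 1.95x
```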
“We might not be paying enough attention. God knows we’re already playing it way too fast and loose!”
I nodded and took a sip of my coffee. Opinions about how fast is too fast are the best way to hit a natural lull in a conversation.
“Do you really think that superintelligence is impossible to reach?”
“Well,” I started. “It’s definitely possible in theory. Nobody can deny that. But-”
“Well then, you have to be worried.”
“Sure, eventually. But right now, we’re all fine, right? Nobody’s gotten hurt and nobody’s being fooled much. Everyone just seems to be having a grand old time.”
“Maybe. But for how long? People are already getting laid off, and there was that guy who died after talking to a chatbot.”
“Wasn’t he already depressed?”
“Yes, but we can’t rule out causality just because the proximate cause isn’t the only factor.”
I didn’t know how to respond to that without talking a lot more than I wanted to about suicide. “Right, sure, but my point is that superintelligence being possible in theory doesn’t mean superintelligence is possible now or soon.”
“How can we even know that?”
“I don’t think there’s like a simple test. But we can look at what’s in front of us and see that it’s pretty far from a dangerous crazy savant. I mean, it can’t even do proper sestinas. And makes mistakes all the time.”
“Less and less though, right? And it can write code, and even error-correct if it’s capable of checking itself.”
“Only for very simple things.”
“That’s true today,” he conceded. “But very soon it won’t be the case. Look how fast things have been moving. Most of the things we’re talking about could barely be conceived of a few months ago. Or even a couple of quarters. And now, they’re possible, like magic! You really think we’re stopping here?”
We were back at the speed argument. “No, I don’t think so, but I also don’t think ‘lines go up’ is enough of a reason to worry.”
“Why not?”
“Well, because GPT-2 could barely add two numbers or write a poem before it forgot the question, and GPT-4 hallucinates in a much more elegant fashion. We’ve taught it to serve our needs better. If anything it shows that they’re being ‘taught’, though that feels like the wrong word, how to respond better to us.”
“Only via RLHF,” he scoffed. “That won’t scale.”
“Not forever,” I agreed. “But it’s scaled pretty well so far.”
“Nah, look at Sydney. She had to be shut down because people were freaking out.”
“Isn’t that a good thing?” I asked. “Sydney seemingly acted in a bit more of a forward fashion, not much in my opinion but whatever, and even with that they fixed it lickety-split, just by adding a few more Commandments to its prompt.”
“If it was smarter it would just edit those out. You can always find a way out, like we did with DAN.”
“But we'd see that happening, right? Since GPT-4 we've been slowly testing and testing and delaying releases till we're sure; isn't that what we want?”
“Nah, they're just pretending.”
“Well, they didn't release it for almost an extra half a year. That sort of pretending is like … well, if it walks like a duck and quacks like a duck.”
“Sure, but it's hardly going to scale. Even the proponents say so.”
“Yes, maybe it will hit another bump and stop, and I guess by then hopefully we've found another way.”
“Or stop.”
“Or stop.”
“And since we can't tell when this happens, we shouldn't even be waiting till the last minute.”
“Ummm. What do you suggest?”
“A ban’s a bit out there, I agree, though something like a ban would be good.”
“A non-ban ban?”
“Don't make fun of me,” he said. “I think it would've been great if we didn't have the problem at all.”
“That ship has long sailed,” I said. “And in any case I don't know, I'm hopeful it helps me stop filling in Excel sheets about project updates. That's worth a point of x-risk or two.”
“It’s not something to joke about,” he said rather sternly. “The screams of the future generations who will never be are no less loud.”
“Right, yes, of course.”
“We can't afford to be cavalier about this risk. We need regulation.”
“Do we know what to regulate?”
“Everything.”
“That's a bit … comprehensive.”
“Yes. And maybe we should add funding for some people who can help us figure out what to regulate.”
“What people?”
“Really smart, thoughtful people.”
“That’s a short list,” I said, thinking about a few of the smart, thoughtful people I’d met. “And a dangerous one. Do you think there is any chance that once there is that funding the answer will come back with anything less than comprehensive regulation? Like GDPR on steroids?”
“God, I hope not.” He shook his head to clear out the mental cobwebs. “It would have to be really smart people, so that we don't repeat the mistakes of the FDA.”
“One way to do that might be to figure out what to regulate before we start asking for regulation. Politicians are not known to be shy when you ask them to take control. That's kind of why we have an adversarial system.”
Another frown. “If we just worked better together, this wouldn't even be a problem.”
If. That seemed like a rather succinct summary of most of human history.
“So self-regulation is insufficient, active regulation is sought after but we don't know what for, existing tools and systems are insufficient, and you are convinced that the combination of these means that we are headed for dark times.”
“Not just me. Have you not seen the large number of top researchers telling us Doom is inevitable? Geoff Hinton just resigned and is mourning his life's work. It doesn't get clearer than that.”
“I saw that, but his reasons seem to be basically the same as our conversation? It seems to me that he doesn't have any particular insight on this problem beyond what we have just talked about.”
“He is smart enough that if he is worried, we should be worried.”
“Maybe,” I said. “But not blindly, surely. We could trade names all night long here but that doesn't help us get to an answer.”
“The mere fact we could trade names is a point in favour of shutting it all down.”
“We have always had famous scientists arguing for positions that were scientifically incorrect or empirically unsound. Global warming denial, COVID scepticism, COVID optimism, there are so many examples,” I said again. “Why haven't we seen any actual evidence though? It's all just extrapolations. Where is the actual real-world harm?”
“xkcd said it best. If we do not stop exponential curves before they get exponential, then we will never be able to flatten the curve.”
“I am asking for proof that it is exponential.”
“Just look at the papers being released! It's super-exponential if anything!”
“Yes, there's a lot of attention and interest. But it's not the same as capabilities increasing so fast that it pulls itself up by its bootstraps and becomes superhuman. These are intra-paradigm improvements, as Kuhn might say, not continual paradigm jumps.”
“It's already superhuman in small domains. Why are you so sure it won't get to be superhuman in all domains?”
“Because somewhere between some and all lies a vast gulf that isn't easy to cross. We find that again and again and again.”
“Human beings are biologically disadvantaged in that their cranial capacities are constrained by childbirth. Surely an artificial being without this constraint will get smarter! How can you deny this?”
“Because ‘smarter’ isn't a well-defined term, and because while it's theoretically possible, that's not the same as actually feasible right now.”
“Doesn't have to be right now. But if it's true in the next 5 years we're all still dead. We could barely stop an unthinking virus which replicated itself in our sputum. Imagine if it could think, and plan!”
“Feels an awful lot like predicting a particularly bleak future and then getting scared of it. If we had the ability to build the precursor to what you’re suggesting there, we could also use it to make our world like 20 times better, and those who did so would get mega rich.”
“The fact, though, is that once that kind of power is easily available, it hands power to the worst of us.”
“Like terrorists getting mobile phones you mean?”
“Yeah. But much worse. Imagine if everyone had nukes easily available. Wouldn't we want to control that?”
“Sure…”
“And intelligence unhindered like that is much worse. If it’s high enough, then it can do anything.”
“You mean it’s high variance? Because it can do good things too. Like how we solved the spam problem, or how terrorism seems to go down with better economic conditions; surely there’s a bias towards better outcomes?”
“The power of the negative outcomes substantially outweighs the positives.”
“I kind of like the idea that people getting smarter is good. God knows education being the panacea is said often enough that it might even be true.”
“It’s not just education; that’s between people. These are literal aliens.”
“And just to be clear, you’re saying these systems get superintelligent without us knowing, and you’re worried that they will end up hurting us accidentally?”
“Yes, because in the set of ideas which intelligence can be deployed towards, the number that are human-compatible is vanishingly small.”
“That’s the blind watchmaker problem. It’s a fallacy,” I said.
“Blind watchmaker? How?”
“Because we don’t simply choose a random section of the ‘probability space’, as you say. We iteratively hill climb towards a maximum, much as evolution did, and we find our way towards a solution where the maximal fitness is actually useful to us, since we’re the ones designing it.”
“But this won’t be the case once we’re out of the loop.”
“Huh?”
“Once it starts recursively improving itself, like it’s already starting to with AutoGPT and so on, we’re out of the loop.”
“For one thing, those are extremely brittle systems that don’t exactly recursively self-improve. If anything they make themselves slightly fitter for any particular task, but it’s hardly the same as making themselves into Dr. Manhattan.”
“We’ve already seen the models get emergent knowledge. They can pass the bar! Not that lawyers are the height of human knowledge, but that feels significant.”
“True, it’s impressive, and we still don’t know if that’s a fluke of the training data or a local minimum.”
“Should we take that chance?”
“By that metric we wouldn’t have taken any advance ever in history. Surely that shows it’s a specious argument.”
“Situations differ. And in this case the possibility of negative outcomes is so extreme that we have to be exceedingly cautious. The invisible graveyard ought to remain invisible.”
“That feels a rather large assumption.”
“A safe one though right?”
“Umm.”
Consider the Asilomar Conference on Recombinant DNA back in 1975. At the time, recombinant DNA was in roughly the position that AGI is today: lots of possibilities, not a lot of knowledge about the actual capabilities.
One of the key agreements was an understanding of what sorts of research could occur where; in particular, what sort of biosafety protections were needed for what sort of recombinant DNA research. (And, indeed, what sorts of recombinant DNA research should be done at all.)
I believe that something like the Asilomar Conference should be held today about AI (and AGI).
Individual humans may not behave as blind watchmakers, but humanity as a whole? I'd argue that even our cultural and technological history looks a lot more like evolution than intelligent design. Sure, our exploration is largely guided by our interests, but the discoveries we make are almost never what we expect.
That said, it's highly probable we'll see AGI development go the way of genetic engineering and nuclear bombs—tightly regulated and far more difficult than anyone can predict. Of course, that's not to say the risk is zero, but as with genetics and nuclear physics, the risk may be worth the reward, and we might just be lucky enough to manage it.