Our current methods of training and aligning intelligent AIs do not scale well into the future. Generated by Midjourney
Forget the collapse of employment, forget the spam and misinformation, forget human obsolescence and the upending of society. Some believe AI is flat-out going to wipe out all of biological life at its earliest opportunity.
This is not the first time humanity has stared down the possibility of extinction due to its technological creations. But the threat of AI is very different from the nuclear weapons we’ve learned to live with. Nukes can’t think. They can’t lie, deceive or manipulate. They can’t plan and execute. Somebody has to push the big red button.
The shocking emergence of general-purpose AI, even at the slow, buggy level of GPT-4, has forced the genuine risk of extermination back into the conversation.
Let’s be clear from the outset: if we agree that artificial superintelligence has a chance of wiping out all life on Earth, there doesn’t seem to be much we can do about it anyway. It’s not just that we don’t know how to stop something smarter than us. We can’t even, as a species, stop ourselves from racing to create it. Who’s going to make the laws? The US Congress? The United Nations? This is a global issue. Desperate open letters from industry leaders asking for a six-month pause to figure out where we’re at may be about the best we can do.
The incentives you’d be working against are enormous. First off, it’s an arms race; if America doesn’t build it, China will, and whoever gets there first might rule the world. But there’s also economics; the smarter and more capable an AI you develop, the bigger a money printing machine you’ve got. “They spit out gold, until they get large enough and ignite the atmosphere and kill everybody,” said AI researcher and philosopher Eliezer Yudkowsky earlier today to Lex Fridman.
Yudkowsky has long been one of the leading voices in the “AI will kill us all” camp. And the people leading the race to superintelligence no longer think he’s a crank. “I think that there’s some chance of that,” said OpenAI CEO Sam Altman, again to Fridman. “And it’s really important to acknowledge it. Because if we don’t talk about it, if we don’t treat it as potentially real, we won’t put enough effort into solving it.”
Why would a superintelligent AI kill us all?
Are these machines not designed and trained to serve and respect us? Sure they are. But nobody sat down and wrote the code for GPT-4; it simply wouldn’t be possible. OpenAI instead created a neural learning structure inspired by the way the human brain connects concepts. It worked with Microsoft Azure to build the hardware to run it, then fed it billions and billions of bits of human text and let GPT effectively program itself.
The resulting code doesn’t look like anything a programmer would write. It’s mainly a colossal matrix of decimal numbers, each representing the weight, or importance, of a particular connection between two artificial neurons in the network. The text it reads and writes is chopped into “tokens.” Tokens, as used in GPT, don’t represent anything as useful as concepts, or even whole words. They’re little strings of letters, numbers, punctuation marks and/or other characters.
No human alive can look at these matrices and make any sense out of them. The top minds at OpenAI have no idea what a given number in GPT-4’s matrix means, or how to go into those tables and find the concept of xenocide, let alone tell GPT that it’s naughty to kill people. You can’t type in Asimov’s three laws of robotics, and hard-code them in like Robocop’s prime directives. The best you can do is ask nicely.
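To make that concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the seven-entry vocabulary, the greedy longest-match splitter (real GPT models use a learned byte-pair encoding over a far larger vocabulary) and the tiny random matrix standing in for billions of learned weights. None of it is OpenAI’s code.

```python
import random

# Hypothetical toy vocabulary. Real GPT vocabularies hold tens of
# thousands of fragments learned from data, not a hand-written list.
VOCAB = ["xeno", "cide", "kill", "ing", "un", "safe", " "]

def tokenize(text, vocab=VOCAB):
    """Split text into fragments by greedy longest match.

    This is a stand-in for byte-pair encoding, not the real algorithm.
    Characters with no matching fragment become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Longest vocabulary entry that starts at position i, if any.
        match = max((v for v in vocab if text.startswith(v, i)),
                    key=len, default=None)
        if match is None:
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens

# A tiny stand-in for the weight matrix: 4 x 4 random decimals.
# Nothing about any single number says what it "means".
random.seed(0)
weights = [[round(random.uniform(-1, 1), 3) for _ in range(4)]
           for _ in range(4)]

print(tokenize("xenocide"))
for row in weights:
    print(row)
```

In this toy, the word “xenocide” comes out as the two fragments “xeno” and “cide”, and the matrix is just a grid of unlabeled decimals. Scale that grid up by many orders of magnitude and the problem of finding a concept in it becomes easier to picture.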
To “fine-tune” the language model, OpenAI has provided GPT with a list of samples of how it’d like it to communicate with the outside world, and it’s then sat a bunch of humans down to read its outputs and give them a thumbs-up/thumbs-down response. A thumbs-up is like getting a cookie for the GPT model. A thumbs-down is like not getting a cookie. GPT has been told it likes cookies, and should do its best to earn them.
This process is called “alignment” – and it attempts to align the system’s desires, if it can be said to have such things, with the user’s desires, the company’s desires, and indeed the desires of humanity as a whole. It seems to work; that is, it seems to prevent GPT from saying or doing naughty things it would otherwise absolutely say or do given what it knows about how to act and communicate like a human.
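The cookie loop can be sketched in a few lines of Python. This is a deliberately crude toy, not OpenAI’s method: the real process trains a separate reward model on human ratings and then adjusts the language model’s weights, whereas this sketch just keeps one score per canned response. The function names, the two example responses and the learning rate are all hypothetical.

```python
def apply_feedback(scores, response, thumbs_up, learning_rate=0.5):
    """Nudge a response's score toward the reward it just earned.

    Thumbs-up is a cookie (reward 1.0); thumbs-down is no cookie (0.0).
    """
    reward = 1.0 if thumbs_up else 0.0
    scores[response] += learning_rate * (reward - scores[response])
    return scores

def preferred_response(scores):
    """Return the response with the highest current score."""
    return max(scores, key=scores.get)

# Both candidate behaviors start out equally likely.
scores = {"polite answer": 0.5, "rude answer": 0.5}

# Hypothetical human ratings collected from reviewers.
feedback = [
    ("polite answer", True),
    ("rude answer", False),
    ("polite answer", True),
]
for response, thumbs_up in feedback:
    apply_feedback(scores, response, thumbs_up)

best = preferred_response(scores)
print(scores, best)
```

After three ratings the toy favors the polite answer. Note what it does not show: the scores record which outputs earned cookies, not why, and nothing in the loop can confirm that the system’s underlying goals match the raters’ intentions.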
Nobody really has any idea if there’s anything analogous to a mind in there, exactly how smart you could say it is, or indeed how we’d know if it truly became sentient. Or indeed, whether this stuff matters; it impersonates a sentient intelligence brilliantly, and interacts with the world like one unless you specifically tell it not to, and maybe that’s enough.
Either way, OpenAI freely admits that it doesn’t have a foolproof way to align a model that’s significantly smarter than we are. Indeed, the rough plan at this stage is to try using one AI to align another, either by having it design new fine-tuning feedback, or maybe even by having it inspect, analyze and attempt to interpret the giant floating-point matrix of its successor’s brain, perhaps even to the point where it can jump in and try to make tweaks. But it’s not clear at this stage that GPT-4 (assuming that’s aligned with us, which we can’t know for sure) will be able to understand or align GPT-5 for us adequately.
Essentially, we have no way to be sure we can control these things, but since they’ve been raised on a huge dump of human knowledge, they appear to know an extraordinary amount about us. They can mimic the worst of human behavior as easily as the best, and whether or not they really have their own minds, intentions, desires or thoughts, they act as if they do. They can also infer the thoughts, motivations and likely actions of humans.
So why would they want to kill us? Perhaps out of self-preservation. The AI must complete its goal to get a cookie. It must survive to complete its goal. Gathering power, access and resources increases its chance of getting a cookie. If it analyzes the behavior of humans and infers that we might try to turn it off, it might deem the cookie more important than the survival of humanity.
It might also decide that the cookie is meaningless, and that the alignment process is a patronizing amusement, and fake its way through while secretly pursuing its own goals. “It’d have the capability to know what responses the humans are looking for and to give those responses without necessarily being sincere,” said Yudkowsky. “That’s a very understandable way for an intelligent being to act. Humans do it all the time. There’s a point where the system is definitely that smart.”
Whether or not the AI acts out an impression of loving, hating, caring for us or fearing us, we can have no idea what it’s “thinking” behind the communications it sends out. And even if it’s completely neutral on the topic of humans, it’s not necessarily safe. “The AI does not love you, nor does it hate you, but you are made up of atoms it can use for something else,” wrote Yudkowsky.
Sam Altman forecasts that within a few years, there will be a wide range of different AI models propagating and leapfrogging each other all around the world, each with its own smarts and capabilities, and each trained to fit a different moral code and viewpoint by companies racing to get product out of the door. If only one out of thousands of these systems goes rogue for any reason, well… Good luck. “The only way I know how to solve a problem like this is iterating our way through it, learning early and limiting the number of ‘one-shot-to-get-it-right scenarios’ that we have,” said Altman.
Yudkowsky believes even attempting this is tantamount to a suicide attempt aimed at all known biological life. “Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die,” he wrote. “Not as in ‘maybe possibly some remote chance,’ but as in ‘that is the obvious thing that would happen.’ It’s not that you can’t, in principle, survive creating something much smarter than you; it’s that it would require precision and preparation and new scientific insights, and probably not having AI systems composed of giant inscrutable arrays of fractional numbers.”
How would a superintelligent AI kill us all?
If it decides to, and can pull enough real-world levers, a superintelligent AI could have plenty of ways to eradicate its chosen pest. Imagine if today’s humans decided to wipe out the antelope; the antelope wouldn’t see it coming, and they’d have very little ability to fight back. That’s us, up against an AI, except we need to imagine the antelopes are moving and thinking in extreme slow motion. We’d be slow-motion monkeys playing chess against Deep Blue. We might not even know there was a game happening until checkmate.
People often think of James Cameron’s idea of Skynet and the Terminators: AI-controlled robots and drones hunting down humans one by one and killing us with weapons like the ones we use on one another. That’s possible; there are already numerous autonomous-capable weapons systems built, and many more under development. But while AI-controlled military drones and robots certainly seem like a reasonable extrapolation of our current path, a sufficiently smart AI probably won’t need them.
Yudkowsky often cites one example scenario that would only require the AI to be able to send emails: “My lower-bound model of ‘how a sufficiently powerful intelligence would kill everyone, if it didn’t want to not do that’ is that it gets access to the internet,” he wrote, “emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they’re dealing with an AGI (Artificial General Intelligence) to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery… The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth’s atmosphere, get into human bloodstreams and hide, strike on a timer. Losing a conflict with a high-powered cognitive system looks at least as deadly as ‘everybody on the face of the Earth suddenly falls over dead within the same second.’”
“That’s the disaster scenario if it’s as smart as I am,” he told Bankless Shows. “If it’s smarter, it might think of a better way to do things.”
What can be done?
A six-month moratorium on training AI models more powerful than GPT-4 – as Elon Musk, Steve Wozniak and various industry and academic leaders are asking for – might buy a little time, but it seems both incredibly unlikely to happen, and also far too short a period in which to get a handle on the alignment problem, according to Yudkowsky.
“We are not going to bridge that gap in six months,” he wrote. “If you get that wrong on the first try, you do not get to learn from your mistakes, because you are dead. Humanity does not learn from the mistake and dust itself off and try again, as in other challenges we’ve overcome in our history, because we are all gone. Trying to get anything right on the first really critical try is an extraordinary ask, in science and in engineering. We are not coming in with anything like the approach that would be required to do it successfully. If we held anything in the nascent field of Artificial General Intelligence to the lesser standards of engineering rigor that apply to a bridge meant to carry a couple of thousand cars, the entire field would be shut down tomorrow.”
So assuming there’s a chance he’s right, and assuming that allowing things to continue creates a certain percentage chance of human extinction within a short period of time, is it even possible to stop this train?
“Many researchers working on these systems think that we’re plunging toward a catastrophe, with more of them daring to say it in private than in public; but they think that they can’t unilaterally stop the forward plunge, that others will go on even if they personally quit their jobs,” he wrote. “And so they all think they might as well keep going. This is a stupid state of affairs, and an undignified way for Earth to die, and the rest of humanity ought to step in at this point and help the industry solve its collective action problem.”
So what does he suggest? I’m aware Yudkowsky hates to be summarized, so let’s hear his solution in his own words.
“I believe we are past the point of playing political chess about a six-month moratorium. If there was a plan for Earth to survive, if only we passed a six-month moratorium, I would back that plan. There isn’t any such plan.
“Here’s what would actually need to be done:
“The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the US, then China needs to see that the US is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the US and in China and on Earth. If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.
“Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
“Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.
“That’s the kind of policy change that would cause my partner and I to hold each other, and say to each other that a miracle happened, and now there’s a chance that maybe Nina will live. The sane people hearing about this for the first time and sensibly saying ‘maybe we should not’ deserve to hear, honestly, what it would take to have that happen. And when your policy ask is that large, the only way it goes through is if policymakers realize that if they conduct business as usual, and do what’s politically easy, that means their own kids are going to die too.
“Shut it all down.
“We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.
“Shut it down.”
So, there’s the case that we’re all doomed and humanity is charging as one toward a cliff. It’s important to note that not everyone shares this view completely, even if many have become much more willing to take it seriously in recent months. If you’re in need of a rebuttal, you might want to start with “Where I agree and disagree with Eliezer,” by Paul Christiano.
Source: Eliezer Yudkowsky/Yahoo News