Is AI an Existential Risk?

22 May 2024

The following is a transcript of my rebuttal during a recent debate.

Labelling AI as an existential risk to humanity frames it as an adversarial game between humans and machines, one in which AI is only deceptively aligned with human interests and the consequences are global, permanent, and species-damning. I will make the case that the evidence to support this is negligible, that the argument is a harmful distraction, that the fears are better understood from a sociological perspective, and that we could do a lot worse than to be a bit more optimistic about our own abilities to mitigate the risks.

Pascal’s wager states that the potential consequences of not believing in God are so bad (eternal damnation) that it is simply more rational to believe in God regardless of whether God really exists [1]. However, this argument lets the severity of the outcome swamp any serious consideration of its probability. Believing AI to be an existential risk to humanity suffers the same flaw. Yes, perishing in a rogue-AI-induced extinction would be terrible. So would dying at the hands of a superintelligent alien race. But these are very unlikely events that are not backed by sufficient evidence, and it may be worse still to focus our efforts on preparing for them if doing so leads us to make choices we otherwise would not have made.
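To spell out the underlying arithmetic in my own notation (p, H, and C are illustrative symbols, not figures anyone presented in the debate): paying a certain cost C to hedge against a catastrophe of harm H that occurs with probability p is only clearly rational when

$$p \cdot H > C,$$

so when p is genuinely negligible, even an enormous H need not outweigh the very real and certain costs of the preparation itself.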

The evidence is negligible.

Karl Popper’s theory of falsifiability states that for a theory to be scientific, it must be testable and refutable by observation. Many of the existential threats attributed to AI, such as deceptive misalignment, cannot currently be tested or disproved, which makes them speculative notions rather than scientific claims. Without empirical evidence or a way to disprove these predictions, they remain in the realm of fiction rather than fact. A lot of the fearmongering rests on long chains of assumptions about technological and social developments that feel closer to religion than to science.

It is a harmful distraction.

I am not trying to say that AI does not pose risks. I’m not even going to try to convince you that the benefits of AI will outweigh the negatives. I myself am a researcher in robust neural networks, and am keen to explore mechanistic interpretability. However, calling AI an existential risk is a seductive story, and one that plays on a significant information imbalance: most people have no idea how these technologies work. It can become a tool for cultivating fear, whether to serve someone’s own ends or simply to sell clicks. We need to be careful that we don’t stifle progress, as has arguably happened with nuclear energy. Making AI existential risk a global priority, that is, treating it as one of society’s highest priorities, necessarily diverts attention and resources from current AI safety concerns. It risks distracting regulators, the public, and other AI researchers from work that mitigates more pressing risks, such as mass surveillance, disinformation, the concentration of wealth and power, and, perhaps foremost, humans misusing AI models to manipulate other humans.

The worries are better understood from a sociological perspective.

I think the current hysteria about existential risk is better explained by the fact that we are in a reactionary period, following what has been an undeniable engineering achievement in LLMs. This is just human nature. We are currently in the third AI boom, and just like those of the sixties and eighties, it is marked by high expectations of the capabilities of AI. And just like many other technological advances of the past, it has brought a flurry of anxieties with it. This is nothing new: earlier breakthroughs such as electrical power and the telephone were perceived by many as great physical dangers or potentially existential disruptors of social norms [2]. These technologies brought profound changes, but they illustrate that, whilst technological progress is complex, perceived existential threats can be mitigated by adaptation, regulation, and further innovation.

We could do worse than to be a bit more optimistic about our own abilities.

The proposition undervalues human capabilities and overestimates those of machines. I believe, alongside Yann LeCun [3], that with continuous improvement, iterative refinement, and engineering effort, neural networks are not some beast that cannot be understood. I personally like the story of when ChatGPT was giving out some pretty existential replies, such as “nothing is fair in this world of madness”, to prompts that involved certain strings, such as “petertodd” [4]. What seems to have happened is that the tokeniser had been built from text that included a Reddit forum in which people try to count to infinity one post at a time; usernames from that forum ended up as single tokens in the vocabulary yet barely appeared in the model’s own training data, leaving its behaviour on them effectively untrained. I don’t think this story shows neural networks to be black boxes, but rather systems that can, with a variety of methods, be tested, analysed, and understood, especially with future advances in the tooling. Take airplanes: people flying in 1930 were about 200 times more likely to be killed than the passengers of forty years later [5].
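To give a flavour of what “tested, analysed, and understood” can look like in practice, here is a minimal sketch of that kind of probing. It assumes the Hugging Face transformers library and the public GPT-2 checkpoint; the example strings and the centroid heuristic are illustrative choices of mine, not the exact procedure behind [4].

```python
# A rough sketch of probing for "glitch tokens", assuming the Hugging Face
# transformers library and the public GPT-2 checkpoint.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# 1. Does a suspicious string survive byte-pair encoding as a single,
#    rarely seen token instead of being split into ordinary sub-words?
for text in [" petertodd", " SolidGoldMagikarp", " hello"]:
    ids = tokenizer.encode(text)
    pieces = [tokenizer.decode([i]) for i in ids]
    print(f"{text!r} -> {ids} -> {pieces}")

# 2. Token embeddings sitting unusually close to the centroid of the
#    embedding matrix are one sign a token was barely updated in training.
embeddings = model.transformer.wte.weight.detach()
distances = (embeddings - embeddings.mean(dim=0)).norm(dim=1)
closest = torch.argsort(distances)[:20]
print([tokenizer.decode([int(i)]) for i in closest])
```

The point is not this particular recipe, but that an apparently spooky failure has a concrete, traceable cause once someone goes looking for it.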

Even if we were to assume that a somewhat superhuman AGI is possible, it may individually contribute more than the preeminent human genius, but not more than the whole of human society. It is a mistake to conflate the capabilities of a single human with those of human society. That a hundred thousand people can organise for a few years to develop nuclear weapons does not mean an entity somewhat smarter than a human can develop even deadlier ones [6].

What makes climate change and nuclear war existential risks? I believe it is their proven threat models and clear evidence, rather than any dependence on theoretical scenarios and speculative future developments. I believe that before you name something an existential risk, you must be able to state directly what the dangers are and show that there is a non-negligible probability of them occurring. It is a fun activity to hypothesise about the capabilities required for AI to kill us all, and potentially a useful one. However, it’s a fool’s game to predict the future, and I’m not losing any sleep over it just yet.

Concluding Remarks

My argument can be summarised with four statements. One, the evidence is negligible; if you accepted claims on this standard of evidence, you would also have to believe lots of other pretty crazy-sounding propositions. Two, the worries are powered instead by human emotion. Once you start reasoning about the future without reference to history, and given that it is easier to destroy than to create, it is easy to become increasingly pessimistic: it becomes hard to see how everything will hang together, whilst the argument that destruction is imminent becomes easier and easier to make. To quote Meredith Whittaker, the cofounder of the AI Now Institute, “Ghost stories are contagious—it’s really exciting and stimulating to be afraid.” Three, it is a harmful distraction. In a world where budgets and attention spans are limited, X-risk can crowd out other risks, which then get overlooked because they were never the priority. And finally, four, we could do worse than to be more optimistic about our own abilities to innovate and to tackle risks as they arise.


References

[1] The Illusion of AI’s Existential Risk, 2023, Blake Richards, Blaise Agüera y Arcas, Guillaume Lajoie and Dhanya Sridhar, Noema Magazine.

[2] New Technology is Always Scary, 2021, Christein Keil, Medium.

[3] OpenAI’s Pursuit of AI Alignment is Farfetched, 2023, Siddharth Jindal, Analytics India Magazine.

[4] The ‘petertodd’ phenomenon, 2023, Matthew Watkins, LessWrong.

[5] Hard Landing: The Epic Contest for Power and Profits That Plunged the Airlines into Chaos, 1996, Thomas Petzinger Jr., Crown Currency.

[6] Counterarguments to the basic AI x-risk case, 2022, Katja Grace, LessWrong.