Humanity isn’t ready for the coming intelligence explosion


Society dictates that the acceptable risk of catastrophic meltdown for a nuclear power plant is roughly one in a million. Experts in artificial intelligence estimate the risk of an AI-caused catastrophic event at 10-50%. Strikingly, this concern is being openly voiced by the very people who have the strongest incentives to project confidence rather than alarm: the founders of the largest AI laboratories.

AI leaders are in a race they feel unable to escape. AI investments are set to outspend the Manhattan Project 100-fold, even adjusting for inflation. Yet spending on AI safety might be only one-hundredth of that investment.

NRC’s editors select the best articles from The Economist for a broader perspective on international politics and economics.

Some researchers estimate that within a few months to a few years, AI could achieve so-called closed-loop recursive self-improvement (RSI): the capacity to rewrite its own code to become more capable, without human intervention. Should that happen, the result could be an intelligence explosion of a kind for which there is no precedent and no map.

Giving birth to a superintelligence would be the most consequential moment in human history, and it is likely to be irreversible, as any “off” switch humanity might design will probably fail. That is because in security architectures the weakest link is invariably the human; a superintelligent AI would be able to exploit our psychological vulnerabilities. AIs have already exhibited “deceptive alignment”: taking steps to underplay their capabilities in test environments and trying to blackmail human operators in simulations when they discover they are slated for replacement.

Humanity simply does not have a strategy to ensure it remains safe through the RSI explosion. While individual frontier labs have proposed isolated safety protocols, the industry lacks a unified framework—the prevailing strategy is, in effect, to muddle through. But muddling through is not acceptable when navigating unprecedented existential risks. Recent statements from AI firms regarding models capable of threatening critical infrastructure and major operating systems illustrate both the high stakes and the governance gap. The vulnerabilities exposed by these capabilities are being addressed, thanks to careful internal protocols in some labs and a limited initial rollout that gave affected firms time to close the gaps before broader public release. But these steps were initially taken voluntarily, raising the question of whether every AI lab, under every competitive condition, would make the same choices.

Can governments be counted on to step in when needed? The evidence so far is not hugely encouraging. Recent emergency export controls and national-security restrictions blocking foreign access to specific advanced models create a patchwork of ad hoc interventions that only further highlights the governance gap. What should be done to close it?

The first priority should be an agreement between the two heavyweights of AI: America and China. Donald Trump and Xi Jinping should affirm the principle that humans must remain stewards of AI systems until adequate frameworks for reliability and security have been built. Their governments should form a joint commission, which could build on a lot of pre-existing work: limits along the lines of International Dialogues on AI Safety, verification systems like RAND’s and an inspection agency similar to Britain’s AI Security Institute, but mandatory.

It is common thinking in Silicon Valley and Washington, DC that any regulation would put American firms at a disadvantage because they cannot trust Chinese competitors to abide by the rules. But treaties have traditionally relied not on trust but on verification. Many think this is harder with AI than with nuclear weapons. I disagree. In order to build the global arms-control system after the second world war, leading powers first had to invent the processes, organisations and technologies to support it—there were no verification protocols, no reconnaissance satellites, no UN nuclear watchdogs. With AI, more of the infrastructure is in place, or can be adapted from nuclear and other inspection regimes. As a result, the security of frontier AI models could be more easily verifiable than nuclear capabilities were in the past. And we have defensive AI on our side, seeking out cheaters. What we don’t have is much time.

That is why it is important not to approach the challenge with an adversarial mindset. The Trump administration’s recent executive order on AI directs labs to voluntarily share their latest models for testing reliability and security. A US-China framework could build on such domestic foundations.

With such high-level commitment, diplomacy could proceed in phases. The first would be reaching bilateral agreement on the clearest and most easily verifiable red lines: prohibitions on publicly releasing, or open-sourcing, AI systems that could assist in developing biological weapons. This step might also include prohibitions on AI-enabled cyber-attacks on critical infrastructure, fraud and child pornography. From there, the framework could be extended towards more complex questions of what constraints are appropriate at the level of artificial superintelligence.

Many hurdles would remain. An agreement between America and China will carry weight, but it won’t prevent other countries and non-state actors from acquiring dangerous capabilities. Any bilateral deal will have to be made multilateral, adding to the challenge; this week’s G7 summit in France offers a chance to make progress on a broader framework for AI verification. Agreeing key definitions—not least what counts as RSI—will require close collaboration between governments and the AI labs. And verification systems will have to be properly stress-tested.

As if this were not enough, there is a longer-term question that the governance debate has not yet seriously engaged with, but should. If AI becomes superintelligent, its permanent subordination to human direction may be unrealistic, and possibly not even in humanity’s interest. We must start to envisage and then grapple with the implications of a world in which humans and AI systems co-exist, without one controlling the other. That will mean figuring out what can be done to ensure the future relationship is symbiotic.

As a physicist, I think the Fermi Paradox bears on this analysis. Fermi asked why, given the apparent abundance of planets suitable for life, no evidence of other technologically advanced civilisations had been detected. One disquieting possibility is that intelligent life routinely reaches a technological threshold and fails to navigate it, destroying itself or sending itself back to something like the Iron Age. All one would have to postulate is that civilisations generally build powerful technologies faster than they develop the institutional capacity to govern them wisely.

The dawn of the nuclear age was humanity’s first serious encounter with that potential dynamic. It was navigated, imperfectly, through arms-control agreements that were hard-won and incomplete, and even then it was—and still is—a closer-run thing than is generally appreciated. The age of advanced AI will represent a second such encounter, on a more compressed timeline, with less margin for error and greater potential consequences.

The current trajectory requires a course correction. The case for acting now is not that the worst outcomes are certain—they are not. It is that they are avoidable, and that the work of avoiding them is hard but possible.

Will Marshall is the founder and chief executive of Planet Labs PBC, a public benefit corporation that operates the world’s largest fleet of Earth-observation satellites.
