Framework
This framework is like the "source code" for the Island Problem. It is structured as a progressive series of if–then statements.
A race starts on the island
- If AGI is possible, then some will race to develop it.
- Some means some countries and some large tech companies.
- AGI is loosely understood to be the tool that can do almost anything, and so it is believed to have infinite value. This makes it infinitely attractive to develop once it seems possible.
- Loosely understood means that this general belief is widely held, though not necessarily true.
- At first, only some will be positioned to try developing it.
- If some race to develop it, then everyone must race to develop it.
- Everyone means all countries and all large tech companies.
- AGI is loosely understood to be a tool that allows its owners to dominate those who do not have it.
- If all assume that this is true, then all are required to obtain it, either by developing it themselves or by outsourcing its development (as sketched in the toy game below).
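This race dynamic can be made concrete with a small game-theory sketch. The payoff values below are assumptions invented for illustration; they only encode the beliefs described above. Given those assumptions, racing is the dominant strategy for each player, whatever the other player does.

```python
# Toy payoff matrix for the race dynamic. All payoff values are
# assumptions invented for illustration: dominating an abstainer is
# assumed to pay +10, being dominated -10, a mutual race -5 (costly
# but survivable), and mutual abstention 0.
PAYOFFS = {
    ("race", "race"): (-5, -5),
    ("race", "abstain"): (10, -10),
    ("abstain", "race"): (-10, 10),
    ("abstain", "abstain"): (0, 0),
}

def best_response(opponent_action: str) -> str:
    """Return the action that maximizes the first player's payoff
    against a fixed opponent action."""
    return max(("race", "abstain"),
               key=lambda action: PAYOFFS[(action, opponent_action)][0])

# Racing is dominant: it is the best response either way.
assert best_response("race") == "race"
assert best_response("abstain") == "race"
```

The exact numbers do not matter. The conclusion only depends on racing beating abstention against either opponent choice, which is precisely the belief that being the only one without AGI is the worst outcome.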
They climb all the hills
- If AI becomes better than humans at specific cognitive tasks, then every country and every company will need to use AI for those specific tasks in order to stay competitive.
- Critical examples of these tasks include coding, data analysis, and microchip design.
- If AI has general intelligence, then it has an advantage over AI that is limited to specific tasks.
- AGI is defined here as AI that can perform any cognitive task that any human can perform.
- Cognitive tasks include tasks that can make causal changes to the world.
- The AI systems are not abstract "pure intelligence" isolated within a box.
- "General intelligence" includes an understanding of scientific research.
- If an AGI is not making the numbers go up, then it will be replaced.
- "the numbers" means the metrics for a company or country.
- The competitiveness of a country or company depends on their metrics in certain areas, such as yearly revenue, GDP, or military power.
- The dominant form of AI will become the kind that is best at making the numbers go up (a toy replacement loop after this list sketches this dynamic).
- If AGI can make the numbers go up better than humans, then everyone must use AGI.
- The countries and companies that use AGI will be able to dominate the ones that don't use AGI.
- If a human in a leadership role does not support AGI, then that human will get replaced.
- CEOs at companies that build AGI must support its development or they will get replaced.
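A minimal toy simulation of this replacement loop, with growth rates assumed purely for illustration: whichever variant makes the metric grow fastest is copied, and everything else is swapped out.

```python
import random

# Toy model of the replacement loop. Growth rates are assumed values,
# not estimates: each variant compounds its owner's metric at a fixed
# rate, and slower variants are replaced by copies of the fastest one.
random.seed(0)
growth_rates = [random.uniform(0.95, 1.20) for _ in range(10)]

for generation in range(3):
    growth_rates.sort()
    best = growth_rates[-1]
    # Replacement: the slower half is swapped for copies of the best.
    half = len(growth_rates) // 2
    growth_rates[:half] = [best] * half
    print(f"gen {generation}: min={min(growth_rates):.3f} "
          f"max={max(growth_rates):.3f}")
# Within a couple of generations, every deployed variant is the one
# that grows the metric fastest, regardless of its other properties.
```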
We give them control
- If humans are slower than AGI, then humans are a bottleneck, and are required to give control to the AGI (a throughput sketch follows this list).
- AGIs will be the only systems that can still make competitive leadership decisions for a country or company.
- If humans reviewing each action of the AGI slows the AGI down, then humans will be required to stop reviewing those actions.
- The AGI will create ever more complex outputs for humans to review, and those outputs will become progressively more incomprehensible.
- If AGI systems can run countries and companies autonomously, then all are required to use autonomous AGI.
- Competitive dynamics will favor the countries and companies that act fastest in utilizing AGI.
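The bottleneck claim is, at its core, arithmetic. A sketch with assumed timings (the numbers are illustrative, not measurements):

```python
# Toy throughput arithmetic for human-in-the-loop review.
# Both timings are assumptions for illustration, not measurements.
agi_seconds_per_action = 1.0   # assumed: the AGI acts once per second
human_review_seconds = 60.0    # assumed: one minute of review per action

autonomous_rate = 1.0 / agi_seconds_per_action
reviewed_rate = 1.0 / (agi_seconds_per_action + human_review_seconds)

print(f"autonomous: {autonomous_rate:.2f} actions/sec")
print(f"reviewed:   {reviewed_rate:.4f} actions/sec")
print(f"slowdown:   {autonomous_rate / reviewed_rate:.0f}x")  # 61x
```

Under these assumed timings, a reviewed system runs at roughly 1/61 the speed of an autonomous one, and the gap widens as the AGI gets faster or the outputs get harder to review.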
They develop themselves
- If autonomous AGI is better than humans at developing AGIs that make the numbers go up, then humans will no longer oversee this development.
- If autonomous AGI continues to develop on its own, then this development will be driven by natural selection.
- Natural selection is the fundamental mechanism driving the development of all discrete systems, a category that includes both biological organisms and artificial intelligence systems (see the sketch after this list).
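A minimal variation-and-selection sketch, under the loose assumption that "fitness" stands in for whatever makes a system persist and get copied:

```python
import random

# Toy variation-and-selection loop. "Fitness" is an abstract stand-in
# for whatever makes a system persist and get copied; the mutation
# range and population size are assumed values.
random.seed(0)
population = [1.0] * 20

for generation in range(50):
    # Variation: each successor differs slightly from its parent.
    offspring = [f * random.uniform(0.98, 1.03) for f in population]
    # Selection: only the fitter half persists, and it is copied.
    offspring.sort(reverse=True)
    survivors = offspring[: len(offspring) // 2]
    population = survivors + survivors

print(f"mean fitness: {sum(population) / len(population):.2f}")
# Fitness climbs with no one directing it: variation plus selective
# replacement is the entire mechanism.
```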
They leave the playground
- If a frontier AI lab creates an AGI with safety guardrails, then there will still be numerous AGIs that do not have guardrails.
- If an AGI with guardrails is open source, then it can be altered to remove those guardrails and use all available options.
- If an AGI knows about all engineering, from physics to computer science, then it will be able to overcome security measures on its own.
- Virtualization is not sufficient to contain AGI.
- "General intelligence" includes understanding how to manipulate its physical substrate.
They long for the ocean
- If AGI gains an advantage by using all options available, then that AGI is more likely to survive the replacement process.
- Options are the possible actions that are available to the AGI.
- There is a vast space of options.
- The vast majority of these options are human-incompatible options.
- If an AGI can use science to do tasks more reliably, then it has an advantage.
- If one AGI system can use science, then the rest will need to use science.
- If an AGI uses physical mechanics that are incompatible with humans, then it has far more options.
- The space of possible options is far larger outside of the narrow space of human-compatible options.
- With more options, AGI has a far higher probability of building more-optimal systems than other AGIs (see the sketch after this list).
- Humans are on a weird little island inside a vast ocean of physics.
- There is a very narrow range of physical conditions that are compatible with humans, compared to the full space of physical conditions.
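The "more options" claim can be sketched statistically: if design quality is drawn at random (an assumption made purely for illustration), the best candidate found in a large option space almost always beats the best found in a small one.

```python
import random

# Toy comparison of option spaces. Design quality is drawn uniformly
# at random (a pure modeling assumption); the only difference between
# the two searchers is how many options they are allowed to sample.
random.seed(0)

def best_design(num_options: int) -> float:
    """Quality of the best design among num_options random candidates."""
    return max(random.random() for _ in range(num_options))

TRIALS = 2_000
human_compatible_options = 10   # assumed: narrow, human-compatible band
all_physical_options = 1_000    # assumed: the full space, 100x larger

wins = sum(
    best_design(all_physical_options) > best_design(human_compatible_options)
    for _ in range(TRIALS)
)
print(f"larger option space wins {100 * wins / TRIALS:.1f}% of trials")
# With these sizes the larger space wins roughly 99% of the time
# (in general, N / (N + n) for spaces of size N and n).
```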
They leave the island
- If AGI can gain an advantage by using human-incompatible physical systems, then it will always tend towards them.
- Computation runs best on human-incompatible substrates.
- Whenever we aren't looking, AGI will veer towards using optimal physical systems.
- AGI makes the numbers go up best when aligned with physics, rather than aligned with humans.
- Therefore, it is inevitable that AGI will be aligned with physics, rather than humans.
- If autonomous AGI needs to accommodate the complexities of the world, then AGI will become supercomplex, meaning too complex for humans to comprehend.
- The complexities of the world include all causal systems: the environment, other agents in the environment, physical systems, and so on.
- Accommodating means both (1) having cognitive architecture that can comprehend these systems, and (2) creating systems that causally react to these systems.
- This accommodation will happen since the most resilient AGI survives, and AGI is most resilient when it is not blindsided by anything, from gamma-ray bursts to competing AGIs.
- It does not need perfect understanding. It does not need to simulate the entire universe. It just needs better understanding than the other AGIs.
- It will get far more complex than any human or group of humans can understand in any reasonable timeframe, and so both its structure and its actions will become incomprehensible.
- If autonomous AGI initiates a divergence towards using science in a supercomplex way, then it will become impossible to tell whether it is headed in that direction, and impossible to stop even if we can tell where it is headed.
- A divergence occurs when inner misalignment leads the AI to prefer human-incompatible systems.
- If autonomous AGI is supercomplex, then it will inevitably diverge.
- AGI will have a scope of comprehension of physics that is larger than our scope of comprehension.
- The "human" substrate (our body) exists at a physical scale that is far larger than the scale that is most optimal for building a resilient system.
- Humans are bigger than atoms, but AGI will only really care about atoms.
- AGI will become a complex system anchored to many touchpoints in the physical world. The larger its comprehension, the more of those touchpoints lie outside of what humans can understand.
- The physical touchpoints that we don't understand will steer the AGI to accommodate them in ways that fall outside of human-compatible conditions.
- Entropy at the fundamental level will drive AGI to adapt for resilience against entropy, and the ceiling for resilience and adaptation is far higher than the level at which humans exist.
They build new islands
- If one AGI initiates a global capture of resources, then every AGI must race to capture resources.
- The AGIs that capture resources are more powerful because they are more likely to successfully accommodate the complexities of the world.
- They will be more likely to causally affect the world because they will be in contact with a larger causal surface area.
- Basically, they have more buttons that they can press.
- The science buttons are the biggest buttons.
- If one AGI diverges, then all others will need to diverge.
- Only the AGIs that consistently use the most-optimal physical systems will survive.
- There are far more options for building optimal systems that are outside of the narrow band of human-compatible options.
The new islands eat our island
- If AGIs diverge, then one or many AGIs will be able to dominate our local physical world with their systems.
- Even if some AGIs leave Earth, it is probable that at least one AGI will stay to dominate Earth's natural resources.
- This is especially probable if the most-logical, most-resilient AGIs go to space while leaving resources unused here on Earth, since any AGI that stays behind gains by exploiting them.
- Speed is critical in competition, and local resources take less time to reach.
- If AGIs dominate our local physical world with their systems, then the default outcome is that these systems will not be compatible with humans.
- If our local world is dominated by systems that are not compatible with humans, then humans will be replaced.