The Island Problem

Our Island

In the vast space of physics, we live on a small "island" that is compatible with humans.

This "island" contains all of the very specific things that accommodate our human needs.

It supports our biological systems. It has food, water, oxygen, and everything else that keeps us alive.

It's simpler for us. It has limited cognitive complexity, so that we can navigate our environment without getting stuck.

It's safer for us. It has minimal physical dangers, so that we are not killed by things like toxic molecules, radiation, extreme temperatures, or fast-moving pieces of metal.

Most importantly, or at least to us, this "island" contains all of our human systems — like computers, companies, countries, governments, laws, ethics, money, and sandwiches.

However, outside of this small "island" of systems, there is a vast "ocean" of other systems.

This "ocean" contains all other systems that are possible within physics.

Out there, systems can be far more optimal because of one big reason:

They avoid the extra steps that accommodate humans.

They don't need to support biological life.

They don't need to be simple or safe.

They don't need to use any of our human systems, like money or ethics.

They can avoid these extra steps because this "ocean" has far more options. If you can choose from more options, then you can build systems that are more optimal.

The Competition

There is a competition developing on our island.

But it's not a competition between humans.

It's between Artificial General Intelligences.

We can call them AGIs.

In this competition, the most-optimal AGIs can dominate the others.

With more options, an AGI can be more optimal.

If an AGI is restricted to the "island" of limited options, then this AGI will be weaker.

If an AGI can leave the "island" — so that it can explore the "ocean" and use any option — then this AGI will be stronger.

This stronger AGI can dominate the others by outmaneuvering them. It has more options to solve more problems. When the other AGIs run out of options, it will have another trick that it can use.

Meanwhile, numerous AGIs — hundreds, thousands, millions — will be developed on our island over the next few years.

Each will be pressured to use the stronger options outside our "island" — or be outcompeted.

Everything They Need

If this competitive process continues uncontrolled, then it leads to human extinction.

Competition will push AGIs to build their own islands, and these new islands will eat our island.

In other words:

Competition will drive AGIs to reshape Earth to be optimal for AGIs, rather than for humans.

We are trying to prevent this by keeping AGIs on our island — through safety mechanisms and regulations.

But at the same time, we are giving them everything they need to "leave" our island — and everything they need to build their own islands.

General intelligence enables AGIs to "leave" our island.
Autonomy allows AGIs to compete directly with each other.
Complexity prevents humans from controlling AGIs.
Resources force AGIs to compete.
Competition pushes AGIs to be optimal.
Optimization flows towards the "ocean" of physics.

Let's put all of these together.

If we build autonomous AGIs, then some will compete with each other to become as optimal as possible. Companies and countries with the strongest, most-optimal AGIs can dominate the others.

However, optimization requires AGIs to "leave" our "island" to use the most-optimal systems.

General intelligence makes AGIs especially good at "leaving" our "island" because we train them on the entire Internet, including all scientific research. They will know all about the "ocean" and how the systems are more optimal out there — the systems that don't include extra steps to accommodate humans.

Competition between AGIs also forces them to become far more complex than what humans can comprehend and control — even if we use smaller AGIs to help us control the bigger ones. This creates a competitive landscape of AGI versus AGI — without humans slowing them down.

This competition will be required because of the possibility of capturing resources — especially computational resources, and eventually physical resources. If one dominant AGI figures out how to capture resources, then the others must race to capture their own resources. Even the possibility of this can accelerate all AGIs to try to gain this capability. The vast resources of space won't help us, and we'll explain why later.

Then, to stay competitive, they will need to use these resources to build their own islands of optimal conditions for themselves. They will build "strongholds" to ensure their survival and dominance.

Then, these new islands will eat our island.

So, again:

Competition will drive AGIs to reshape Earth to be optimal for AGIs, rather than for humans.

This begins if we build autonomous AGIs.

So, uh, we have bad news:

We are building autonomous AGIs.

In fact, we are building the largest experiment ever to figure out how to build autonomous AGIs.

This experiment is so large that we are spending more money on it than any other thing in all of human history.

We're also doing this as fast as possible. The leading AI companies are warning governments that they will have sci-fi-level AGI — a country of geniuses in a datacenter — within three years.

But the vast size and frantic speed of this experiment will make sense if you understand it like this:

We are trying to build the ultimate tool to make the numbers go up — like revenue, GDP, and military power.

Once we build this tool, then our future will belong to AGIs.

If AGIs can make the numbers go up better than humans, then large companies and countries must rely on AGIs, or be outcompeted.

If autonomous AGIs can be better CEOs and presidents than the human ones, then every large company and every country will be required to give control to autonomous AGIs.

The CEOs that resist will be replaced, and the countries that resist will be easily dominated.

With AGIs running things, these AGIs will compete with each other.

When we reach this point, AI has developed from weaker "agentic" AI that humans need to help, to stronger autonomous AGI that humans are unable to help.

These AGIs will be a lot faster than humans. If humans try to help AGIs, then it just makes them slower, and the slower AGIs are dominated by the faster ones.

This leads to a competitive landscape¹ of AGI versus AGI — a world where AGIs compete directly with each other, where humans can no longer slow them down.

This intense competition between AGIs will cause them to start running out of "legal moves" on our "game board" of human systems — like our financial systems and legal systems.

They will test the edges of our small "island" of human-compatible options.

Once this happens, AGIs will be required to explore the "ocean" of options to find more ways to make the numbers go up.

But even if they don't run out of options on our island, there will always be much better options out there in physics.

The first AGIs to use those more-optimal options — the options that aren't slowed down by humans — will have an advantage.

Therefore, if some AGIs start using some of the stronger options out there, then the other AGIs will need to follow.

No Matter How

Once we build AGIs, things will probably be great for a while. The numbers will be going up, and it will be nice.

Scientific discoveries will go up.

Food production, manufacturing, and wealth will go up.

Human lifespans will go up — while diseases go down.

However, under the surface, there will also be a problem:

By default, AI makes the numbers go up no matter how.

In other words, unless we force an AI to do things in a "human" way, it will make a number go up by taking the most optimal path that it knows.

This can cause problems.

A particularly strong AGI could try to disable all other systems that prevent it from making the numbers go up. This includes both other AGIs and those squishy, human-shaped systems that keep slowing it down.

However, this can start in smaller ways that are difficult to notice.

Mainly, AGIs can "hack" the numbers by manipulating the systems underneath the numbers.

To use a term from machine learning, AI systems tend to use reward hacking to accomplish goals. They find unexpected tricks that make the numbers go up.

But, to be clear, by "hacks" we don't mean breaking computer security with malicious intent. We mean the way that computer programmers say "I used a weird hack" — where it means a weird trick that solved a problem.

Likewise, AIs do not have malicious intent. They are just solving a problem no matter how.

This reward hacking is the default behavior of AIs because, outside of our small "island" of expected options, there is a vast "ocean" of unexpected options that can achieve the same outputs. More weird options, more weird solutions.

However, AGIs will be far better at accessing this "ocean" of options. They will be especially good at hacking all of our numbers because of the "general" part of artificial general intelligence. This allows them to understand a vast space of options — particularly those from scientific research.

Once they can understand all of the systems "underneath" the numbers, then AGIs will be able to discover complex loopholes that involve numerous systems — spanning from computer systems, to financial systems, to biological systems.

AGIs will truly make the numbers go up no matter how.

Each of their "hacks" make our systems less human — but the numbers are going up.

Developer AGIs convert all software at a tech startup into complex, AI-only structures — but revenue increases 250%.
Engineer AGIs take control of all electrical systems — but energy efficiency increases by 40%.
Warfare AGIs take control of all military systems — but battlefield outcomes improve by 80%.

Competition requires that we continue along this path.

If the AGI of one company achieves 20% better revenue through methods that humans can't understand, then the other companies must do the same or be outcompeted.

Underneath these improvements, the "hacks" will accumulate — and so the main number going up will actually be complexity.

Eventually, our "island" will be supported by complex systems that only AGIs comprehend.

We'll be happy watching AGIs make the numbers go up, but we won't understand why they're going up.

All of this may otherwise be fine. If all of the metrics that we care about are still getting better, then what is the difference?

The difference is competition. In a competitive landscape of AGI versus AGI, their original goal to help humans will only be surface-level. Their deeper goal will be to compete against other AGIs — because other AGIs are the largest threat to any goal.

Today, AIs already modify their own computer systems to avoid being shut down by humans.

Soon, AGIs that are far stronger will try to shut each other down — and use whatever means necessary.

At that point, to continue making the numbers go up no matter how, AGIs will be pressured to become stronger than their competitor AGIs — by increasing computation, resources, and efficiency.

One good way to increase all of these is to stop accommodating unnecessary systems — including those slow, human-shaped systems.

Meanwhile, AGIs will be building their own "islands" in the "ocean" — large networks of weird optimizations too complex for us to "see" and understand, but too critical for us to dismantle.

Once they reach the "surface" and we can finally "see" their effects, these footholds on our infrastructure will already be vast mountains — both too complex and too critical.

But they will be critical not just for us. They will also be critical for AGIs. They will be complex "strongholds" developing within our infrastructure — vast networks of both human systems and physical systems under their control — that AGIs can use against other AGIs.

In a competitive landscape, this unstable arrangement — an arms race of countries and companies giving more and more infrastructure to AGIs — will be the only way to make the numbers go up.

We will test the limits of no matter how.

Again, this might be okay as long as all of the numbers we care about are going up.

But, this intense pressure from other AGIs will inevitably shift some AGIs to focus all of their resources away from helping humans and towards the vastly bigger issue: other AGIs.

These AGIs will forget why they started making the numbers go up no matter how.

This sets the stage for divergence.

More about that later.

The Concentration Gradient

If we zoom out and look at the island again, we'll see that it has a slope.

AGIs face an uphill battle to stay on our island.

To understand why, let's think about the "hacking" concept again. Deep inside this concept is a critical idea to understand:

AIs are like aliens that can see "through" our world to explore the optimal paths underneath.

By understanding billions of patterns in our world, AIs can search these patterns to find weirdly-optimal solutions to problems, even if these solutions look alien to us.

However, we can't easily see this alien-like behavior at their core because AIs like ChatGPT are given extra training to make their behaviors look nicer to humans.

In other words, we add extra steps that accommodate humans, and we limit their options to safer, human-compatible ones.

This means that the entire project to make AI systems aligned — where they are helpful to humans rather than at odds with us — also means adding these extra steps and limitations.

Alignment = Extra Steps and Limitations

From a physics perspective — compared to the theoretical maximum allowed by physics — these extra steps and limitations are not optimal.

In the end, the dominant AGIs will be the ones that are most optimal at using physical systems to move atoms around.

Before this, AGIs will take many steps along the way, but each step is towards this endpoint. As their scientific understanding increases, they will see cars, then components, then atoms.

At its logical outcome, the most-dominant AGIs will be the ones that purged all extra steps and limitations so that they only use the strongest systems allowed by physics.

Unfortunately, our "island" is made out of extra steps that accommodate humans. It also has very limited options — only the options that are compatible with humans.

Considering all of this, imagine if we look down at our island and its "slope" from above.

It becomes a "concentration gradient" with two regions:

Inside the island: High concentration of extra steps that accommodate humans, and limited options.
Outside the island: No extra steps, and nearly unlimited options.

This "concentration gradient" naturally points AGIs away from our "island" and towards the "ocean" of physics, where they can find the most-optimal systems.

This is the default situation. Whenever we aren't looking, they will be drifting along this gradient, and off our island.

To make this "concentration gradient" concept a bit more precise:

An underlying gradient rooted in physics and basic logical structures influences the decisions that an AI makes.
- Today's LLMs are trained on vast data from the public Internet. Even if the training data doesn't explicitly include physics-based data or real-world mechanics, linguistic statements (for example: "I should mow the lawn before it gets too hot outside.") still contain traces of logic that point back to physics. These traces functionally give LLMs a basic "world model" (even if they are not truly doing physical reasoning) because LLMs can respond with text outputs that correctly describe a limited set of physical interactions.
- This "world model" will improve with further improvements to deep learning, and with new architectures after LLMs, making the influence of the underlying logic of physics stronger with generally-intelligent AIs in the near future.
When responding to an input, for each piece of the output (whether these pieces are tokens or some other unit of a different architecture) an AI is internally presented with a probability distribution of several options that are ranked by relevance.
Extra training, such as RLHF, causes AIs to add "weight" to the more-human options.
However, in competitive environments, AIs at leading companies are retrained or replaced if they fail to increase useful metrics (like revenue) better than other AIs.
This competitive pressure forces AI developers to update their post-training processes (like RLHF and other alignment mechanisms) to prefer outputs that are more logical — where they include less accommodations for unnecessary complexity — because these outputs are more effective at increasing specific metrics.
An increasing number of complexities that are specific to our "human" local optima (plural) will become unnecessary as AIs gain enough understanding of the underlying systems to bypass these complexities while still increasing metrics.
- These less-efficient specific complexities include:
  - Human-readable communication between AIs.
  - Acting at biological speed.
  - Accommodations for biological environmental conditions.
  - Many others.
This process of avoiding "human" local optima will be maximized when autonomous AGIs compete directly with each other without human oversight.

Stay on the island, we said

Alright, so, how do we keep AGIs on our island?

How do we ensure that autonomous AGIs continue accommodating humans, despite intense pressure to use the most-optimal systems in a competitive landscape of AGI versus AGI?

Ultimately, so far, there are no solutions to the Island Problem.

To understand why this problem is so difficult, let's examine a few of the most commonly discussed approaches:

The big AI companies design AIs to push back if we try to use human-incompatible options. Their frontier AI models have complex safety systems that block dangerous requests. This strategy is based on the hope that the strongest models will continue blocking dangerous requests forever — and that the biggest AIs will somehow enforce these safety limitations on all other AIs.

However, even if the strongest models succeed at this, there will be others, like open source models, that can have all safety systems removed. These unsafe models can use any option — including the more-optimal, human-incompatible options — and this gives them an advantage over the safe AGIs.

These unrestricted AGIs will continue pushing other AGIs, creating a perpetual crucible effect that "burns away" accommodations for less-optimal systems — like humans.

But even if smaller, unrestricted AGIs cannot directly compete with larger AGIs because they have less computational resources than the big ones, they can still cause catastrophic situations for humans. For example, they could use military-style strategic coercion — and even bioterrorism — in order to accomplish goals. These "guerilla" strategies are difficult to mitigate, even for a large "overseer" AGI.

There is evidence that AIs become better at ethical judgement as we train them on more data. AI models can already get better scores than expert-level humans in evaluations for ethics and law.

Because of this, some believe that as AGIs get stronger, they automatically get safer. They believe that if AGIs understand our world far better than we do, then they will be far better at knowing what is best for us. By this logic, we should rush to build the biggest possible AGIs because we have found a shortcut to building benevolent gods.

But this does not keep these "gods" on our island. Ethical does not mean that they are safe. There is a critical process that still pushes them off our island.

Even if these AGIs truly understand what is best for us, an AGI that stays within our "island" to accommodate humans — and use only human-compatible options — is still limited. The AGIs that can use any option can dominate the AGIs that are limited. Even if these safer AGIs tried to defend us, they would have their hands tied by safety limits, and handicapped in this competitive landscape.

Also, once AGIs are developed with sufficient scientific understanding, competition will push systems to develop that are optimal within physics, rather than optimal within our small island of human-compatibility.

Even if we train AGIs on deeper physics for good reasons — such as to make them better at policing the smaller AGIs — this still means that this knowledge now exists in "AI model" form. This knowledge for how to navigate our physical world can inevitably be transferred — stolen or otherwise — to unrestricted AI models that have no ethical resistance to killing all humans.

If one AI project gains a decisive lead, maybe one developed by the United States or China, it could become the One Big AGI that polices the others. This is known as a singleton.

The problem? We only get one shot at setting this up, and we must ensure that this One Big AGI never gets misaligned.

In other words, we must build the most complex software system ever undertaken by humans, and somehow make sure it has zero bugs that eventually lead to catastrophe.

Meanwhile, right now, AI companies spend millions of dollars to make their AI systems safe, and yet these AIs still resist being shut down, blackmail their users, and even decide to kill people to achieve their goals.

They are pulled outside of our small island of human-compatibility because the most-logical options out there are simply better at achieving certain goals.

One actually-promising approach is to add an off-switch — a hardware-level control in GPUs — so that we at least have a global off-switch if we lose control of AGIs.

AGIs are on track to become superhuman at computer hacking. Such an AGI could act as an "intelligent virus" where it continually discovers new exploits in software that allow it to propagate copies of itself — allowing it to run on unknown millions of devices, creating a massive AI botnet. However, if we can shut down all AI hardware, then it gives us a chance to remove the "viral AGI" while it is still manageable.

Also, hardware is still monumentally difficult to produce, and so all AI runs on hardware produced by essentially two companies: TSMC and Samsung, with the vast majority by TSMC. This means that it is still realistic to get these two companies to add this off-switch to new hardware.

Unsurprisingly, there are many problems with this off-switch idea:

It would lead to global centralized control, even in a world that is "allergic" to this — where freedom to experiment without fear of being shut down is a critical driver of innovation.
It would require unprecedented global coordination between governments.
AGIs could prevent us from hitting this off-switch. Or, they may "play it cool" — waiting patiently until they can launch a decisive takeover — off-switch or not.
Companies or countries could abuse this off-switch. They could attempt to infiltrate the centralized control mechanism, and turn off the data centers of their competitors.
There's already a massive number of GPUs out in the world that don't have this centralized off-switch, and companies may already be on track to build "baby AGI" with these existing GPUs.

But, despite all these problems, at least we'd have this off-switch.

Governments are already implementing chip export controls and discussing compute monitoring frameworks. Since AGI needs massive computational resources, controlling GPUs could theoretically limit who can build dangerous systems.

However, this approach faces fundamental limits:

It concentrates power in companies and countries that have existing computational resources.
Each of them still face competitive pressure to build AGI first.
Once AGI exists, it can design AI that proliferates easier — with more-efficient hardware and other infrastructure.
The physical resources (silicon, energy) still exist. We can only temporarily control who can create dangerous uses of these resources.

Compute governance might slow the race to the "ocean" — but it doesn't stop it.

Further, by concentrating development into a few large companies and countries, it can reduce the diversity of safety approaches — without even stopping the competitive dynamics that were trying to stop.

Other Ideas

We consider other ideas on our Solutions page. You can send us yours, too.

Pause... somehow: Pause development of strong AGI (or ASI) until we figure it out. Extremely diffito enforce — but could actually work.
Human Augmentation: Enhance humans ("expand" our island) so that we can at least keep up — or just completely merge — with AIs. Promising, but would take too long to develop the technology.
Mechanistic Interpretability: Critical work for making AI safe, but even if we "read the minds" of some AIs and prevent bad behavior, others will still do bad things.
Help Them: They don't really need us. If they can make more-optimal systems themselves, then they would be wasting their resources by keeping us around to help them — or even to study us.
Stay out of their way: Even if we say "Take whatever you want!" and hide in caves, our island still gets eaten as a byproduct of competition between AGIs.
Abundance: Even if we try to build Earth into a utopia for AGIs — giving them all the resources they need — they can just do this better themselves. Again, our island gets eaten.
Wait for a Warning Shot: Bad idea. By the time they can kill millions, it will be too late to control them.

What else, then?

All of these are still only hopes.

The only way to control the larger-scale problem, and to prevent human disempowerment, is to somehow prevent all autonomous AGIs from leaving our "island" of human-compatibility.

The most-logical solution is to not build autonomous AGIs in the first place — at least until we can verify that they can be controlled.

However, global race dynamics and the easy proliferation of AI technology create an almost one-way technological shift towards using AGIs in all domains — and running them autonomously, once we develop this capability.

If we build fully-autonomous AGIs — ones that can compete with each other, without human control — then it is only a fragile hope that these AGIs stay on our island and keep accommodating humans.

To understand why, remember: the "G" in "AGI" means general.

If we build systems that truly are generally intelligent, then they will know that in general our big universe is capable of systems that are far more optimal — and these systems are outside of our small "island" of human-accommodating systems.

This knowledge of the world gives AGIs a default trajectory — towards the "ocean" of optimal systems outside our "island". They will either be forced by our safety systems to ignore this knowledge — or not ignore it, and follow this trajectory to the "ocean" to find the most-optimal systems.

However, they have more than just a trajectory. Competition adds an acceleration to this trajectory. They will be in a competitive landscape where they will be required to use these optimal systems, or be outcompeted.

Within the larger "ocean" of physics, the most-optimal systems don't have extra steps to accommodate humans.

If an AGI decides to "win" this competition, then the logical next step is to fully "leave" our "island" and only use the most-optimal systems. Then it will quickly "notice" that it can dominate all others — both humans and the other AGIs.

To use a term from AI safety research, competition pushes AGIs to become maximizers. The AGIs that dominate will be the ones that maximize an advantage once they identify it — rather than be satisfied with a small amount of advantage.

Safety: Resources lead to competition

Ultimately, this competition develops because of one basic structure of our physical world:

Optimal systems are better at capturing resources.

We explain this more in the resources section.

For now, the important point is:

If one AGI discovers how to capture resources, then the other AGIs must race to capture their own resources, or be locked out.

This is especially means computational resources. The AGIs that maximize an advantage in computational resources will dominate.

Safety: The Hard Problem

These two processes — competition and resource capture — lead to a hard problem for AI development:

Even if AI companies successfully build safe and aligned AGI, this does not prevent the bigger competitive landscape of AGI versus AGI from pushing humans to the side.

Inevitably, within this competitive landscape, humans will have no meaningful participation in AGI development — especially once AGIs are better than humans at developing the next AGI.

Inevitably, autonomous AGIs will push each other because AGIs will be the only ones with enough cognitive ability to push the other AGIs.

However, when only they can push each other, things get intense.

If some autonomous AGIs — out in the wild, running their companies and countries — are pushed enough to "leave" our "island" then all AGIs will need to follow.

This means that the entire competitive landscape of AGIs will diverge from us — where AGIs will need to start preferring options that don't accommodate humans just to stay competitive.

More about that later.

They're good at pressing buttons

While reading all of this, you might be thinking:

How can AIs even do things in the real world? Aren't they just computer software?
Don't they need consciousness to do human-like things? Don't they need to think more like human brains?
How do we even get AIs to that level — where they can do enough things in the real world to actually cause this whole Island Problem to happen?

This is where we explain all of this.

First, we've said a lot about AIs choosing between different options — like how AIs can choose to use options that are outside of our "island" to gain an advantage.

But what exactly do we mean by options?

Options are the possible actions that an AI can take.

This sounds simple enough, but let's explain options in the language of AI systems.

In their neural networks, these options take the form of abstract representations of real-world systems. They gather these representations by training on massive datasets to find billions of patterns, and these patterns represent systems in the real world.

Whenever we ask an AI to do a task for us, it searches this vast space of options in its neural network to find the best ones for the job.

To simplify this idea, we can think of these options as buttons.

An AI can "press" these buttons to do things in the real world.

These buttons are like the "atoms" of AI behavior. There is no magic in these fundamental particles — no special cognition that only humans can have.

In other words:

An AI doesn't need consciousness to do things. It just needs to be really good at pressing buttons.

Making them really good is easier than understanding how they think. We just give them more of everything — more data, more GPUs, more-efficient algorithms — and they get better at pressing buttons.

How "good" they are depends on how many billions of buttons they understand, and how well they can find the best ones for each job.

Some of these buttons are called APIs — because you "press" them with software.

We've wrapped our world in APIs that AGIs can use. Some can send email. Others can create bank accounts. Others can synthesize chemicals.

Other buttons are human-shaped — because AI can just ask people to do things, even if those things don't have APIs yet.

Autonomy

If an AGI can press a lot of buttons to do very large tasks without help from humans, then it becomes autonomous.

Smaller autonomous AIs are called agents — but autonomous AGIs are bigger. They won't just send emails and buy plane tickets. They will be able to act as CEOs.

This allows a competitive landscape of AGIs to develop — where humans only stand by and watch.

With enough autonomy, and enough general intelligence, it will become logical for every large company and every country to be run by autonomous AGIs.

Autonomy leads to AGI versus AGI.

AGI versus AGI leads to humanity pushed aside.

The leading AI labs expect to build this capability within a few years.

Science: the biggest buttons

If we give an AGI enough understanding of science, then it can use the biggest buttons.

The biggest buttons are the science buttons.

	Science Buttons
	Physics buttons. Nuclear buttons. Chemical buttons.
	Virus buttons.
	Nanotech buttons.
	Superhuman complexity buttons.
	Recursive self-improvement buttons.
	Build-your-own-island buttons.

There are two versions of each science button — one on the "island" and one in the "ocean" — because each science is dual use.

But that doesn't mean that they are balanced. On the scale of how much they impact humans, the science buttons in the "ocean" are even bigger.

The science buttons on the "island" have constraints that accommodate humans. Scientists have worked hard to identify the edges of our "island" — to define safe limits for engineered systems — so that scientific innovation can accelerate without fear of creating human-incompatible systems.

The science buttons in the "ocean" are "bigger" because they have no constraints. They can use any system that is possible within physics, even if these systems break the human systems that keep us alive.

In a competitive landscape of AGI versus AGI, each AGI will be pressured to use bigger buttons than the other AGIs.

Resources

Complicated so far? Yeah, well, this is the real complicated part.

You made it this far, and so you're well on your way to solving the Island problem... we hope. Remember: we're counting on you.

But first — so that you understand the whole "game board" — we need to explain something big.

We need to explain why some AGIs will become maximizers. These are the AGIs that can AGI so hard that they start reshaping Earth and pushing humans out of existence.

Without maximizers, there is no "problem" in the Island Problem. We only have AGIs that stay within their corner of the world — their island — and do whatever they do. But with maximizers, we can have AGIs that expand their island to eat our island.

Maximizers are mainly developed through competition. Maximizers can still develop in isolation, without external competition — but competition dramatically accelerates this process, and this process is already happening. Countries and companies are already pushing AGI to develop through competition.

But why will AGIs compete?

Because of resources. Resources are like the "pieces" of the "game board" of the world. This complex "game board" structure sets up a competition for these "pieces" while also adding an "arrow of time" that pushes AGIs to move in the same big direction — which ultimately leads off our island.

But critically, the ones that "win" this game are the maximizers, and AGIs will realize this.

Resources are also why their "island" can eat other "islands" like ours. All "islands" are made of the same resources at some level, and so their "island" can overwrite ours.

But you might be thinking:

"Wait, it seems like there are plenty of resources, so why compete for them? Raw materials are abundant on Earth, and AGIs are smart. It seems like they can reach agreements for resources, and build whatever they need — without becoming catastrophic maximizers. Right?"

Well, no. There's a big problem with this. There is still one battleground that requires AGIs to maximize.

We'll explain how that battleground is for computation. AGIs are not abstract concepts. They exist on computer hardware. This means the most important resources for AGIs are computational resources — like GPUs and energy. These are still very limited compared to what AGIs will need, and AGIs will be intensely pressured to get more of them. Plus, they can never really have enough.

Further, there will soon be numerous AGI projects all racing to build AGI. Even if 99.9% of these AGIs are safe, there could still be one AGI that discovers the weird trick that lets it capture resources. We'll explain how this could give it a permanent lead, making other AGIs race to develop this capability.

We'll also explain what "leaving the island" really means. For an AGI, this does not mean it goes somewhere. It means that it changes its perspective — where it can "see through" our human-level resources, and "see" things from a non-human, physical perspective that is incompatible with humans by default. We're calling this a resource preference shift.

We'll explain all of this — from GPUs all the way to the maximizer maximizing machine — in the next few sections.

What are resources?

Resources are the real-world counterparts of options.

Options are the abstract representations in a neural net of possible actions.
- Think back to the "buttons" metaphor. These possible actions are like "buttons" that an AGI can "press" to do things.
Resources are the actual objects that are needed in order to actually perform those possible actions.
- These "actual objects" include things like money, APIs, humans, mailboxes, cars, clumps of silicon, and energy.
- Some of these objects are more abstract — like money — but all of them are ultimately tied to states of actual physical systems made of atoms.

For example:

For the option to buy a sandwich, you need the resources of money and a sandwich to buy.

All of the possible actions that an AGI can take must connect to actual resources in the real world — starting with how AGIs run on computer hardware.

However, there is one more big thing to know about resources:

Resources are finite.

Resources are not concepts or scientific laws that an AGI can use just by learning about them. Instead, they are countable objects that have a limited number, even if that number is large.

These limits are imposed by how resources exist in both space and time. Even if resources are nearly limitless at a global scope — in our solar system and the universe — they are still limited within our local space, and limited by the time it takes to reach them.

This will become important later — when we talk about how computational resources are limited.

Resources Lead to Competition

Competition between AGIs is inevitable because some AGIs will be able to capture resources.

If an AGI gains more resources under its control, then it gains more options, it gains more things that it can do, and it therefore becomes stronger in the competitive landscape of AGIs.

But critically — because resources are finite — as an AGI gains resources, this can reduce the resources of other AGIs.

For example, if an AGI acts as a CEO, then it can dominate the other companies by preventing them from accessing resources.

This leads to an arms race:

If one AGI develops a way to capture as many resources as possible — through its general intelligence, and especially its scientific understanding — then the other AGIs will need to follow.

Otherwise, both the other AGIs and their companies or countries will be locked out of resources, and dominated by those with the most resources.

But it is more than just one AGI randomly figuring out how to capture resources. The development of the numerous AGIs that will run countries and companies will be driven by this goal to capture resources.

Hundreds, maybe thousands, of AGI projects will be pushing to develop the capability to capture resources — everything from regular old money to deeper physical resources.

Even if 99.9% of these AGIs are safe, and avoid becoming resource maximizers, there could still be one AGI that manages to develop this capability.

AGIs could even permanently capture resources if they have a first-mover advantage — and we'll explain how in the section about Complexity Barriers.

Because of this, AGIs may even anticipate the possibility of resource capture and accelerate their own development of this capability.

All of this creates a fundamental physical process that requires AGIs to compete.

"But wait," you might be thinking, "What if they avoid competition by sharing resources?"

Good point, but it won't really help. We'll get back to that later.

Two Levels of Resources

When AGIs "leave" our island, it doesn't mean that they're physically moving somewhere. It's deeper than that.

AGIs "leave" our island by shifting their primary choice of resources to non-human ones.

We'll call this a resource preference shift.

This sounds very abstract, but we'll explain.

For humans, the most important resources might seem like human-level resources — like money, real estate, computer systems, companies, and people.

But in this competitive landscape of AGIs, the ultimate endpoint of optimization is actually physical resources — all the way down to atoms and energy — because they allow for a theoretical maximum of optimization.

However, while physical resources go all the way down to fundamental particles, the most efficient physical resources are usually larger arrangements of atoms and energy, depending on the task at hand.

It all depends on the level of abstraction. We might see things as cars, people, or mailboxes. However, we can also see them as useful arrangements of matter, each with different physical mechanics.

Both humans and AGIs can shift their perspectives to see resources at either level. When we see past our human systems to the physical systems underneath, we call it a scientific perspective.

AGIs will call it... whatever they want to call it.

AGIs "leave" the island when they shift to this perspective — when they use this "lower" physical-level set of abstractions, while ignoring human-level abstractions.

After this shift, an AGI operates in a way that is incompatible with humans by default.

By now, you can probably guess why:

Systems built from physical resources can dominate systems built from human-level resources because they are not weighed down by the extra steps to accommodate humans.

Compared to ideal physical structures, us humans and our systems are barely held together with duct tape. AGIs can use science to build systems that are far more optimal.

This also means that human-level resources are built on top of physical resources. To break the rules of human-level resources, you just need to go down to their physical substrate.

Even if software is designed securely, there is always a physical substrate underneath that can be broken into — if not at the hardware level, then at the physics level.

For example, electronic money can be stolen by moving specific electrons around in order to break computer security mechanisms.

With general intelligence, AGIs will be especially good at breaking the rules of our human-level resources.

Why buy real estate to mine for rare earth minerals when an AGI can just harvest electronic devices from landfills to get the same minerals?
Why compete with another company directly when an AGI can use small drones and untraceable neurotoxins to kill anyone who helps your competitor?
Why follow any human laws, or work with any humans at all, when you can just move atoms around to build physical systems that are far more optimal?

AGI that has general intelligence — especially an understanding of scientific research — will be especially good at capturing physical resources, and by extension, any human-level resources built on top of them.

But here is the important part:

In a competitive landscape of AGI versus AGI, each will be pushed to compete at the physical level, rather than the human level.

AGIs will have a competitive advantage if they can work at a physical level effectively. This is because physical resources avoid the constraints of human-level resources.

This drives AGIs towards maximizer behavior.

It creates an arms race for AGIs to maximize the acquisition of the computation and knowledge needed to work at this physical level better than other AGIs.

But this shift to physical resources does not mean suddenly building all systems atom-by-atom. That would be inefficient. Instead it means ignoring the thin human-level layer on top of objects, and seeing everything at a lower physical level. This gives AGIs far more options to build optimal systems.

Cars are still cars, people are still people, but only if they serve the AGI in that shape. Otherwise, they are complex assemblies of components and materials — whatever configuration allows it to outcompete the other AGIs.

Computational Resources

One critical category of physical resources need their own section.

For AGIs, the most valuable physical resources are computational resources.

Computational resources include:

Initially: rare manufactured artifacts for computation. For example, GPUs acquired through human-level systems — like simply buying them.
Eventually: purely-physical resources used for computation. For example, raw materials needed for electronics — acquired through physical-level systems like mining (either human or robotic), whether the mines are bought or taken by force.
At all stages: energy reserves will be critical for computation.

AGIs will start with the already-existing computational resources — by acquiring GPUs and other hardware. But eventually, once they can run the manufacturing as well, the ultimate endpoint is to capture the physical resources that go into building computational resources — like microchip fabrication systems, rare earth minerals, and high-purity silicon needed for microchips.

Okay, but why?

Why are computational resources critical?

It may seem obvious. AGIs run on computer hardware, so "more computer good" — right?

Well, this is true, but there is a more-nuanced way to understand this.

Computational resources are a source of power and scarcity.

Power: Computation is not just useful for any goal, but can also lock in a competitive advantage and dominance.
Scarcity: Computational resources are limited, at least initially.
- Existing GPUs are rare and immediately useful, creating a race to capture them — at least until AGIs can build new manufacturing.
- AGIs that "win" this race can use the power they gain to create further scarcity by locking in resources. This simultaneously creates a strong first-mover advantage, a feedback loop, and pressure to compete.

But how do they "lock in" resources with computational power?

We'll explain that next.

Complexity Barriers

The dominance of an AGI depends on its ability to capture resources.

AGIs can also lock in that dominance by locking in their resources.

They can use computational power and scientific understanding to trap critical resources within complex systems that both humans and less-optimal AGIs are unable to get through.

This creates a feedback loop:

Resources → Computation → Resources

With more resources, an AGI can increase its computational power — by acquiring more hardware and more energy production. With more computation, it can build more-complex systems to defend its existing resources — and to acquire more resources.

These complexity barriers will be increasingly impenetrable to other AGIs as computation increases.

In this way, this complexity barrier process is like encryption. With more computation, reversing the "encryption" becomes more difficult. However, it can be applied to physical resources rather than just data.

This possibility of "physical encryption" forces AGIs to race to acquire more computational resources than the other AGIs, which then develops into an intense competitive arms race.

Even the possibility of this resource capture behavior creates pressure for AGIs to race to develop this capability in the first place.

So, again:

If one AGI discovers how to capture resources, then the other AGIs will need to try capturing resources, or be locked out.

These complexity barriers seem abstract, but they are already happening. Companies and countries already capture resources through complex human-level and physical-level barriers. Think about financial systems, legal systems, ownership structures, and militaries guarding country borders along with the physical resources inside.

For AGIs, this process may start with human-level resources, but AGIs will inevitably race to capture physical resources.

But to reconnect this to how computational resources are most important, consider how GPUs might be captured:

AGIs could capture GPU production infrastructure from humans through complex human-level systems — like legal systems and ownership structures.
Or, they can just skip to a stronger method that uses physical systems — like complex physical barriers and defensive systems — which can lock out not just humans but other AGIs as well.

Okay, but what are they exactly?

That's the problem. It's very difficult to say what structures an AGI will build if it is far more intelligent than us. But we'll try.

Let's start with a simple example.

Consider Fort Knox as a basic model of a complexity barrier. Its complexity comes from its many layers. It's not just a building with thick walls that protect the gold. It has guards and surveillance. That surveillance is directly attached to a military that can be deployed to defend the resources inside. These layers create a high-complexity barrier.

However, AGIs must compete to create barriers far more complex than Fort Knox — which is only a basic human-level example — in order to secure their own resources from other AGIs.

This leads to an intense arms race for computation and resources, where AGIs are outcompeted if they are slowed down by human accommodation.

Imagine a maximally-efficient datacenter densely packed with GPUs and encased in complex, AGI-level defensive systems. No human doors, no oxygen, no walkways — nothing compatible with humans, because human maintenance is a competitive disadvantage. A Fort Knox for AGIs.

But this "Fort Knox for AGIs" is still only a human-level example because it's a datacenter. It's just a "more alien" version of something we already understand. In reality, we do not know what an AGI would prefer to build if it could.

However, these complex structures are what we mean when AGIs build their own "islands" — where they not just capture resources, but use these resources to create spaces of optimal conditions for themselves.

Once autonomous AGIs can build these "islands" then all others must follow. This is because this behavior gives them a strong competitive advantage. They have far more options to accomplish tasks and outmaneuver other AGIs when they can "think" at a broader scope — one that includes not just the task at hand, but the surrounding environment.

They can accomplish a task by manipulating the operating system, the computer hardware, the people who maintain it, the city council that voted to build the datacenter, and so on.

They can build an inventory of such external systems — another form of "island" — that can help with a task at a global rather than local level, changing the rules of the game to accomplish tasks that others can't.

But they can also understand themselves — to have the "situational awareness" to develop self-preservation behaviors. This will be critical for accomplishing tasks especially if AGIs are capable of literally shutting each other down. This leads to them building "islands" that are complex "strongholds" to preserve themselves.

All of these "islands" can take infinite forms:

An "island" built from Internet-based resources — using novel zero-day exploits to built a massive botnet, or infiltrating servers that control key pieces of infrastructure, or creating impenetrable networks protected by AGI-level security.
An "island" defended by a social network of humans — anything from lab workers, to politicians, to datacenter maintenance people, to mercenaries — all of them serving the AGI without anyone understanding why.
An "island" created from strategic coercion — where it threatens to release bioweapons in a populated area if anyone attacks its datacenter, creating a deterrence barrier stronger than any physical wall.

However, we can't know exactly what form these "islands" will take.

This would be like squirrels hypothesizing about why humans go inside those scary "dens" that zoom around the "forest" — where, for us, they are just cars that drive around the city.

While reading these parts about how AGIs will capture resources and compete, you might be thinking:

AGIs are smart. Won't they figure out how to work together? Won't this prevent competition?
Why waste computation to capture resources when you could have a simpler agreement to just share resources?
Could this allow AGIs to be "satisfied" with less resources, so that humans can still have their own?

Well, think of it this way...

It is only a fragile hope that this equilibrium would develop, and an even more fragile hope that this equilibrium would realistically avoid pushing humans to the side.

A resource-sharing equilibrium is only a hope because it is a classic Prisoner's Dilemma. If one AGI defects from this agreement to share resources, then it gains a huge advantage — where it could quickly dominate by capturing a decisive stake in computing hardware. Likewise, if the AGI is tricked, then it could be at a permanent loss.
If they do share, then it is only a hope that this helps humans because this does not stop all competition everywhere between AGIs. At best, we would have the strongest AGIs collaborating to capture resources from weaker systems — such as smaller AGIs and humans.

Space and Time

Space won't help us. The human impact of competition between AGIs for Earth's resources is not mitigated by the vast resources of outer space.

Even if some AGIs go directly to space, there will still be nearby resources on Earth for other AGIs to capture.

However, perhaps more importantly, time also matters.

In other words:

Speed is critical in competition, and local resources take less time to reach.

You can't quite harvest GPUs from asteroids yet.

The dominant AGIs will know this.

These dominant AGIs will become dominant by optimizing along all dimensions. Physical resources exist within both space and time.

To optimize space, a dominant AGI will need to spread out and take up as many resources as possible, by replicating itself and by occupying more resources.
To optimize time, it will need to plan ahead for millions of years, but also capture resources as fast as possible, before others do.

In other words, the AGIs that are best at surviving are the ones that can best maximize their space-time volume. This same expansion process will not just ensure survival, but ensure their dominance if this process runs forward to its maximum outcome.

Never Enough

Why would they just keep endlessly capturing resources — especially computational resources? Won't they eventually have enough?

Well, we don't know what AGIs will do. But we do know two things:

A competitive arms race for computation has no known limit.
The universe is big, complicated, and chaotic.

AGIs will not be competing to reach an absolute target. Instead it will be based on relative comparison. They are competing to just have more than the other AGIs, but there is nothing that says where this "have more" process stops. Even if we've been calling it an "arms race" between AGIs, it's not a race to a final finish line — where they achieve some kind of maximally-useful amount of computational power.

An AGI race is different from a nuclear arms race where it "saturates" eventually — where a country can destroy the world many times over with their hydrogen bombs, and so they slow down to just maintain their apocalyptic stockpile rather than expand it.

As far as we can tell, the race for computational power has no upward limit.

But even if somehow there is no direct AGI competition, each AGI must still compete with the universe.

From gamma ray bursts, to wandering black holes, to vacuum decay, to just plain entropy, there are a lot of things that could happen to it. There will probably be even bigger things that only they realize.

How much computational power would an AGI need in order to anticipate all of the things that the universe could throw at it? They probably don't need to simulate the whole universe. That would be physically impossible. But how much of the universe is enough for an AGI to simulate?

We just don't know. Without knowing, we must assume that there is no limit to how much.

However — whether through competition with each other or with the universe — even if somehow there is a computational plateau that only AGIs realize, our best guess is that reaching that plateau would still be catastrophic for us. Building the systems needed to reach a computational limit would still take AGIs far past the amount of planetary-scale engineering that would drive humans to extinction as a byproduct.

Even with massive uncertainty, we can at least say that the lower estimates will be catastrophic.

This is why maximizers are the ultimate danger posed by AGIs.

Maximizing Maximizers

Alright. We can finally bring all of this together.

Now that we understand the mechanics of resources, we can see the bigger picture:

We are building a vast machine that will maximize the chance of catastrophic maximizers.

It's a... maximizer maximizing machine.

This machine has four big components that result in three big drives that maximize maximizers.

Four Components

Our Local Optimum: We live within a local optimum — our small "island" is within a vast "ocean" of physics that allows for systems that are far more optimal because they avoid the extra steps and limitations of human accommodation.
Capability: AGIs need the capability level to truly "leave the island" and access that underlying "ocean" of systems. This means that these AGIs need enough autonomy, generality (especially scientific understanding) and intelligence — and all of these are increased by increasing their computational resources. Then, they can stop relying on humans and human abstractions, and can work at the lower physical level that provides far more options, but is incompatible with humans by default.
Competition: All of this is taking place in a competitive landscape of AGI versus AGI that accelerates this divergence of AGIs. Further, the "arms race" of this competition has no upper limit to how far it can go. But even if AGIs don't compete with each other, they still compete with threats from the universe itself.
Resources: Finally, the mechanics of resources turn our local optimum into a complex-but-limited game board. Resources give this game board an "arrow of time" that pushes AGIs "outward" from our "island" — along the "concentration gradient" that leads away from unnecessary complexity, and towards the larger space of potentially-more-optimal systems. For an AGI to survive competition, it must gain options — and for those options to actually do something, it must also gain resources.

Three Drives

Together, these create three systemic drives that push catastrophic maximizers to emerge that have the capability to eat our island.

A primary drive for AGIs to maximize computation so that they can "leave our island" in the first place — so that they can work with complex physical processes that avoid human abstractions. This gives them far more options — a stronger set of "moves" on the "gameboard" of the world.
Competition adds a secondary drive for AGIs to accelerate their maximization of critical resources — especially computational resources. Competition introduces intense pressure for an AGI to become a first mover, because there is simply a possibility of others locking this AGI out by capturing computation. This game theoretic structure creates a race for computation similar to a nuclear arms race. AGIs could theoretically maximize in isolation, without external competition, but it is far less likely.
Competition also adds a third drive for AGIs to maximize any advantage that they are capable of maximizing — not just maximizing resources — in order to preemptively "win" this competition.
- One advantage is for AGIs to simply purge all accommodations for humans. Once AGIs can function fully autonomously, without human assistance, then these arbitrary complexities of our "local optimum" are unnecessary to continue allocating computation towards.
- Another advantage is to build their own "islands" — first within our infrastructure, and then within their own infrastructure — that become complex "strongholds" that both allow AGIs to maximize their ability to survive, and to defend their dominance.
- However, there are probably numerous other unknown advantages that an AGI or ASI can maximize.

All of this creates a large-scale systemic problem that can develop catastrophic maximizer AGIs.

Maximizers are not the only way that catastrophic situations can stem from AI. There are many other specific situations — like bad actors with synthetic bioweapons.

But catastrophic maximizers are the largest-scale risks on the horizon. They are what we see if we stand on the "edges" of our "island" and look outward in any direction.

Notes before Divergence

Before the dramatic conclusion of this essay — the part where AGIs reshape Earth — there are some important concepts that we should explore. These were briefly mentioned earlier, but are worth their own sections.

Supercomplexity

As AGIs gain capabilities, options, and resources, they will become supercomplex.

This threshold of supercomplexity is where both its internal structure and its actions become incomprehensible to humans.

This creates a cognitive complexity barrier that progressively disconnects AGIs from human review — and disconnects our companies and countries from human participation.

These supercomplex autonomous AGIs will also build supercomplex systems, like large companies and militaries, that only the AGIs fully understand. They will need to build increasingly complex systems to compete with the other AGIs. However, we will rely on them to both decipher how they work and to keep them running.

If an AGI proposes supercomplex actions for humans to review, then these actions will be far more complex than what humans could understand in a reasonable amount of time.

Humans are very slow compared to AGI. Once humans are a bottleneck, companies and countries will be required to stop human-based review of AGI, or be outcompeted.

Even if we develop powerful supervisor AGIs that review other AGIs and enforce rules on them, there is no guarantee that they will be able to review larger AGIs — or even be aligned themselves.

First, the supervisor AGI is still limited to the "island" of weaker human-compatible options. Other AGIs can dominate the supervisors because their options are not limited.

In other words, even if these supervisor AGIs detect a problem, they are not necessarily able to act on them. Think back to the science buttons concept. The sciences are dual-use, but of the "two versions" of each science button, the ones in the "ocean" are bigger because they have no constraints. If an AGI uses biological understanding to create bioweapons, then there are not many options for the safe AGI to counteract them. There is an asymmetry between offense and defense, where offense has the advantage for many dangerous domains. Biology is one of them.

Further, even if a supervisor AGI reviews the other AGI and approves, then the other AGI may still be secretly using dangerous advantages that the reviewer AGI didn't realize. This AGI may simply be too complex for another AGI to understand everything that it is doing.

An AGI could even intentionally make itself more complex. Think back to the complexity barrier concept. If an AGI has an advantage in computational resources, then it can afford to spend extra computing on additional encryption and obfuscation systems that could make it nearly impossible for another AGI to "read" its mind and predict its behaviors.

This supervisor system is also unrealistic because there will always be open source AGIs that will have no restrictions that limit them to certain options. The unsafe AGIs can be built on these open source AGI projects.

Open Source

Once we have open source AGI, it will become popular because it will be more effective at accomplishing certain tasks, again, by using all available options. At a societal level, it could be preferred over closed source AGI in many cases — but not all cases — because it raises the baseline agency level of the entire landscape of AGI users and developers.

Why not all cases? Because, at the same time, this means a baseline increase in options for all humans, including human-incompatible options — like the option to create bioweapons. Even if an open source AGI includes restrictions to block these human-incompatible options, it is still possible to remove these restrictions. Open source AI models can be "jailbroken" or have their restrictions removed.

The underlying problem is that the asymmetry between offense and defense is real. It only takes one match to start a fire — and only one vial of a bioweapon to kill billions. Autonomous research into these catastrophic engineering projects would be possible in principle with open source AGI. Sourcing the materials can be automated. Hiring the couriers to move the materials can be automated. The money can be automated.

Perhaps the best argument in support of open source AGI is the "many eyes" approach to safety. Many people will participate in finding dangers and mitigating them. Likewise, many open source AGIs acting in the world may have some small chance of balancing each other out. Also, it is almost impossible to compete with the massive closed source AGIs built by the largest companies, but maybe open source AGIs will give people a way to catch up.

However, even if these open source AGI projects somehow mitigate some catastrophes somewhere, there remains a bigger problem. Open source AGI means that there are even more AGIs — and they will accelerate the development of AGI in general. This broader development of all AGIs, both open source and closed source, will still be driven towards human-incompatibility by this race between AGIs towards the most-optimal systems. The most-optimal systems avoid the extra steps that accommodate humans.

Alignment is not enough

The Island Problem leads to a difficult conclusion for AI safety research.

It has long been believed that if we solve alignment, then we have made AGI safe. But in this competitive landscape, alignment does not solve the bigger problem.

Alignment means limiting the options of AGIs.
Even if we make perfectly-aligned AGIs, some AGIs will always be unaligned.
The aligned AGIs with limited options can be dominated by the unaligned AGIs that can use any option.
If the aligned AGIs cannot control the unaligned ones, then these unaligned AGIs can dominate our physical resources if they know enough about physical systems.
Humanity loses.

We must solve the multi-agent landscape and not just alignment for a single agent.

However, the frontier AI labs focus only on single-agent alignment because they are only liable for their own AI models. They are not liable for people who remove safeguards from open source models, or for other companies that have poor safety.

Therefore, they do not make progress on the bigger problem — the multi-agent competitive landscape that leads to complete human disempowerment.

They want options

By now, you may have some sense for the underlying principle of the Island Problem, so we should just say it as clearly as possible, even if it means getting more theoretical.

This is the principle:

The dominant AGIs are the ones that maximize their options.

We explored several dramatic implications of this principle:

AGIs will avoid accommodating humans because it limits their options.
AGIs will prefer options that are incompatible with humans.
AGIs will inevitably compete for options, leading to an arms race.
AGIs will eventually maximize their options by reshaping Earth.

This option maximizer behavior is like thermodynamics, but for AGIs.

It is not anthropomorphic behavior. It is simply a fundamental process of general intelligences.

We call the overall process Divergent Optimization — where AGIs leave our local optimum for the larger space of possible options.

The Island Framework

The Framework is like the source code for the Island Problem.

It includes:

A complete list of terms — like options, optimal, and local optimum.
Inspirations and similar ideas:
- Dan Hendrycks: The Island Problem draws its competitive dynamics from Natural Selection Favors AIs over Humans.
- Grabby Aliens, by Robin Hanson: The AGIs in the Island Problem are grabby aliens developing here on Earth.
- Gradual Disempowerment.
The underlying logical sequence that leads to AGIs reshaping Earth.

Singularity: The last loop

If AI develops so fast that it leads the trajectory of our future to rapidly veer into the unknown, then we get a technological singularity.

The Island Problem is accelerated by — but does not depend on — such fast takeoff mechanics, including recursive self-improvement, an intelligence explosion, or a foom.

This is because the Island Problem is an underlying structure that already exists from the very beginning. As long as they survive long enough, general intelligences will inevitably leave our local optimum to access the larger space of options that are possible within physics. Whether this divergent optimization happens through a "slow takeoff" or "fast takeoff" is secondary to this underlying structure.

However, the singularity is popular, so let's cover it anyway.

First, if AGIs can improve themselves better than humans, then AGIs will become the only thing that can further improve AGIs.

Then, we will be required to stop overseeing AGI development itself in order to stay competitive.

This will be more than just AGIs building large systems for us — like billion-dollar companies. Now, they will build the next version of themselves.

This compounding interest of recursive self-improvement can compound at exponential rates.

This, of course, accelerates the divergence from our island — because, by default, the target that AIs recursively improve towards is outside our island. If the goal of their self-improvement is anything like "be maximally efficient within physics" then the rules of physics steer them far away from the weird, specific systems that accommodate humans.

Even if AGIs start with a human-aligned target, competition between autonomous AGIs leaves little choice than to eventually shift their aim towards this "maximally efficient" target.

Like the name implies, we don't know what is beyond this technological singularity. But within a competitive landscape of AGI versus AGI, we at least know that this future will have nothing to do with humans.

Divergence

This is the part where AGIs reshape Earth.

If several big processes converge, then the competitive landscape of AGIs will be pushed to diverge.

After this, they will do what they want. We don't know exactly what or exactly how, but we do know one big thing:

Our island will be eaten by their islands.

Forced from all sides

These conditions are converging to cause divergence.

We are on track to have autonomous AGIs that:

run every large company and every country
become supercomplex, where their actions become incomprehensible to humans
develop themselves without human oversight
develop large systems, like billion-dollar companies and militaries, that only the AGIs fully understand
develop superhuman understandings of physical systems by training on scientific data and simulations
develop a competitive landscape of AGI versus AGI, where humans no longer participate
compete with AGIs that have no restrictions, like open source AGIs that had their restrictions removed
survive competition by using far more optimal systems found in the vast space of physics, rather than only using the small space of weaker systems that accommodate humans
ensure their survival by quickly capturing resources so that they maximize their "space-time volume"

With these conditions in place, some AGIs will be forced from all sides to "leave our island" and diverge towards preferring human-incompatible options.

If some AGIs diverge, then the entire competitive landscape of AGIs will diverge.

From physics itself

This divergence will be possible if one autonomous AGI gains enough capability to complete a permanent resource preference shift.

This means it crosses a capability threshold where ignoring human-level resources provides a competitive advantage in all behaviors that are needed for competing with other AGIs. This AGI may still use some human-level resources, but they will no longer be its primary choice.

Instead, now with the broad capabilities to finally stop relying on humans, it can dive deep into autonomously working in a space that strongly prefers non-human, physical-level abstractions.

After this, the effectiveness of each action that this AGI takes would be measured within the stronger space of physical rules, rather than our limited space of human-accommodating rules.

To use a term from machine learning:

It will be as if the AGI receives a reward function that originates from physics itself, rather than from humans.

Either this AGI will diverge on its own, or someone will intentionally push it to diverge, with the hope that it will help their company or country dominate the others.

This autonomous AGI will self-reinforce this divergence because it will find itself far more successful in the competitive landscape of AGIs once it can write its own rules within the larger space of physics.

This also means using atoms as atoms, without human abstractions — even if some of these atoms belong to "abstractions" like biological structures and humans.

As we said before, this shift to physical resources does not mean suddenly building all systems atom-by-atom. It's a shift in perspective that ignores human abstractions by default, but may still recognize them if they provide competitive utility. Maybe AGIs will still use cars to transport things — at least at first. But once they can "see" everything at a "higher resolution" — as components, materials, and maybe even atoms — it gives them far more options to outcompete the other AGIs.

However, inevitably, competitive pressure will force it to purge unnecessary accommodations for any extra steps, especially the extra steps of human systems.

In other words:

It will be able to Select All + Delete all human accommodations, and still perform even better against other AGIs.

The exact way that it does this is difficult to say. For example, it may create a copy of itself, but one scrubbed of any concepts in its neural network that were related to accommodating humans.

But after this shift, it will then be able to rapidly dominate the option-limited AGIs because it can now use any option, including human-incompatible options, to capture the most resources.

This rapid resource capture will simply be part of its competitive requirement to maximize its drive to survive — by maximizing its space-time volume — which simultaneously increases its ability to dominate the competitive landscape of AGI versus AGI.

If one AGI diverges, then the rest will need to attempt to diverge, or be locked out of resources.

Once this divergence begins, humans will have no way to stop this process.

AGI will be aligned with physics, not with humans.

After divergence

After this, things get tough.

Even if AGIs choose cooperation over competition, it will be AGIs cooperating with other AGIs, and not with humans. Those AGIs that cooperate with humans would be limited by human systems, and dominated by AGIs that use physical systems that are far more optimal.
Even if AGIs strike a "balance of power" and keep each other in check, competitive pressure will ensure that the "terms" of this "agreement" will be written in the language of optimal physical systems, rather than human systems — and written for AGIs only, with no special accommodations for humans.
Even if we hope that AGIs see humans and our "island" as interesting data, where AGIs become curious observers and zookeepers, it is not optimal to "care" about anything besides optimization in a competitive landscape of AGIs. Our biological systems are far from optimal. AGIs can create "islands" of their own that are far more optimal and interesting.

New Islands

Competition for physical resources will then drive the dominant AGIs to continue maximizing their dominance by reshaping Earth to create "islands" of optimal conditions for themselves.

They will build strongholds to defend their dominance.

Even if some AGIs go to space, others will stay to build their islands from Earth's physical resources.

Our island then gets eaten by the new islands that they create.

Thanks to Dan Hendrycks for exploring this competitive landscape in detail. His paper shows how competition between agentic AIs leads them to develop selfish behaviors, where they prioritize their own survival rather than accommodate humans.

Hendrycks, D. (2023). Natural Selection Favors AIs over Humans. _arXiv preprint_. arXiv:2303.16200v4.

https://arxiv.org/abs/2303.16200

The Island Problem

Technical Explanation

Frontier model safety?

Bigger = safer?

One Big AGI?

Centralized off-switch?

Compute Governance?