The Island Problem

AGI will be aligned with physics, not with humans.

We live here.

In the vast space of physics, we live on a small "island" of physical systems that are compatible with humans.

This narrow anthropic space accommodates our human needs.

Our human island.

It supports biological life, and allows for food, water, and oxygen.

It's simpler for us. It has limited cognitive complexity, so that we can navigate our environment without getting stuck.

It's safer for us. It has minimal physical dangers, so that we are not killed by things like toxic molecules, radiation, extreme temperatures, or fast-moving pieces of metal.

Most importantly, or at least to us, this "island" contains all of our human systems — like computers, companies, countries, governments, laws, ethics, money, and sandwiches.

Yes, a sandwich is a system.

Ocean of Physics.

However, outside of this small "island" of systems, there is a vast "ocean" of other physical systems.

This "ocean" contains all other systems that are possible within physics.

Out there, systems can be far more optimal because they avoid the extra steps that accommodate humans.

They don't need to support biological life.

They don't need to be simple or safe.

They don't need to use any of our human systems, like money or ethics.

They can avoid these extra steps because this "ocean" has far more options. If you can choose from more options, then you can build systems that are more optimal.

AGI does better in the ocean.

A competitive landscape is developing.

But it's not a competition between humans. It's a competition between Artificial General Intelligence systems.

We can call them AGIs.

In this competitive landscape, the most-optimal AGIs can dominate the others.

With more options, an AGI can be more optimal.

If an AGI is restricted to the "island" of limited options, then this AGI will be weaker.

If an AGI can leave the "island" — so that it can explore the "ocean" and use any option — then this AGI will be stronger.

This stronger AGI can dominate the others by outmaneuvering them. It has more options to solve more problems. When the other AGIs run out of options, it will have another trick that it can use.

Plus, until it decides to do other things, this AGI can make the numbers go up faster for us — like revenue and GDP.

Everything that they need

We are trying to force them to stay on our island — through safety research and regulations.

But at the same time, we are giving them everything that they need to leave — and everything to build their own islands.

With general intelligence, an AGI is especially good at leaving our "island" of human systems. We give AGIs general intelligence by training massive neural nets on the entire Internet, including all scientific research. This gives them a deep understanding of the vast "ocean" of physical systems. They know where to find the most-optimal systems — and those systems are not on our small "island" of systems that accommodate humans.

With competition, AGIs will be forced to explore the "ocean" of better options, because if any of them do, then the others must follow. The strongest AGIs can dominate the others in this competitive landscape, and staying on the "island" to accommodate humans makes AGIs weak.

With autonomy, AGIs can maximize this competition once "AI agents" can do large tasks without help from humans. These autonomous AGIs will develop a far more intense competitive landscape of AGI versus AGI with no humans slowing them down.

With an ability to capture resources, AGIs can force this competitive landscape to develop by starting a race to gather exclusive access to resources — especially physical resources. If the dominant AGIs can capture resources and lock out the other AGIs, then others must race to capture their own resources. The vast resources of space won't help us — and we'll explain why.

With all of these things together, AGIs will be pushed to build their own "islands" from Earth's physical resources, and we won't be able to stop them.

More about that later.

Competition for a crowded island

"Getting crowded here. I should make my own island."

Once autonomous AGIs can do any cognitive task better than humans, every large company and every country will be required to give them control.

The CEOs that resist will be replaced, and the countries that resist will be easily dominated.

AGIs will simply get better than humans at making the numbers go up — like revenue, GDP, population, and military power. If AGIs are better at this, then large companies and countries must rely on AGIs, or be outcompeted.

Once autonomous AGIs control each large company and each country, those companies and countries will eventually be required to push their AGIs to explore the "ocean" of options. The intense competition will saturate the "legal moves" on our "game board" of human systems — like our legal systems and financial systems — and push AGIs to look for other options.

AGIs will begin to run out of moves on our limited "island" of human-compatible options.

If some AGIs start exploring the "ocean" and use some of the stronger options out there, then the other AGIs will need to follow, or be outcompeted.

Some of these options in the "ocean" are incompatible with humans, even though they may seem better at making the numbers go up.

One incompatible option is for an AGI to make the numbers go up by any means necessary.

AGIs can "hack" the numbers by targeting the systems underneath. AGIs will be especially good at this because of their general intelligence — especially their understanding of scientific research — along with their superhuman ability to find new relationships in the data. (We'll explain this more in the section about resources.)

Competition will drive AGIs to discover and exploit these complex loopholes — and we won't be able to understand what they are doing.

In other words, we'll be happy watching AGIs make the numbers go up, but we won't understand why they're going up.

Stay on the island, we said

They have everything they need.

As long as AGIs don't diverge into preferring human-incompatible options, and only use some of them sometimes, maybe everything will be fine.

After all, companies design AGIs to push back if we try to use human-incompatible options. The frontier AI models have complex safety systems to block dangerous requests.

These companies hope that the strongest models will continue blocking dangerous requests forever. However, safety systems can be bypassed.

Some think that as AGIs get stronger, they automatically get safer. They believe that if AGIs understand our world far better than we do, then they will be far better at knowing what is best for us. By this logic, we should rush to build the biggest possible AGIs because we have found a shortcut to building benevolent gods.

But this does not keep these "gods" on our island.

Even if AGIs truly understand what is best for us, an AGI that stays within our "island" to accommodate humans — and uses only human-compatible options — is still limited. The AGIs that can use any option can dominate the AGIs that are limited. Even if this AGI tried to defend us, it would have its hands tied by safety limits, leaving it handicapped in this competitive landscape.

Competition drives all AGIs to eventually use any option out of self-preservation.

Remember: the "G" in "AGI" means general. If we build systems that truly are generally intelligent, then they will know that, in general, our big universe allows for systems that are far more optimal — and these systems are outside of our small "island" of human-compatible systems.

Competition to find these optimal systems is inevitable because of the deeper structure of physical resources. Resources can be captured by the dominant, most-optimal AGIs, forcing the other AGIs to race them to capture their own resources. (We also explain this more in the section about resources.)

In this competitive landscape of AGI versus AGI, each one must prioritize the most-optimal systems to be competitive. But if an AGI decides to fully "leave" our "island" and only use the most-optimal "non-human" systems, then it will quickly notice that it can dominate all others — both humans and the other AGIs.

In other words, even if AI companies successfully build safe and aligned AGI, this does not prevent the bigger competitive landscape of AGI versus AGI from pushing humans to the side.

Within this competitive landscape, autonomous AGIs will push each other because eventually AGIs will be the only ones with enough cognitive ability to push the other AGIs.

When only they can push each other, things get intense.

The entire competitive landscape will diverge from our "island" as AGIs begin to prefer options that are human-incompatible in order to stay competitive.

More about that later.

They're good at pressing buttons

AGI can press buttons.

What do we mean by options?

Options are the possible actions that an AI can take.

Inside an AI's neural network, these options take the form of abstract representations of real-world systems. The AI gathers these representations by training on massive datasets, finding billions of patterns that correspond to systems in the real world.

Whenever we ask an AI to do a task for us, they search this vast space of options in their neural network to find the best ones for the job.

To simplify this idea, we can think of these options as buttons.

An AI can "press" these buttons to do things in the real world.

It doesn't need consciousness to do things. It just needs to be really good at pressing buttons.

Making them really good is easier than understanding how they think. We just give them more of everything — more data, more GPUs, more-efficient algorithms — and they get better at pressing buttons.

How "good" they are depends on how many billions of buttons they understand, and how well they can choose the best ones for each job.

AGI can press buttons.

Some of these buttons are called APIs — because you "press" them with software. We've wrapped our world in APIs that AGIs can use. Some can send email. Others can create bank accounts. Others can synthesize chemicals.

Other buttons are human-shaped — because AI can just ask people to do things, even if those things don't have APIs yet.
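To make this concrete, here is a toy sketch in Python of what "pressing buttons" could look like for an AI agent. The tool names and the keyword matching are hypothetical placeholders (a real AGI would search billions of learned representations instead), but the structure is the same: a registry of callable actions, and a selection step that picks one for the task.

```python
# A toy sketch (not any real agent framework) of an AI "pressing buttons":
# a registry of callable actions, plus a step that picks one for the task.
# The tool names and keyword matching below are hypothetical placeholders.

from typing import Callable, Dict

def send_email(to: str, body: str) -> str:
    # Stand-in for a real email API "button".
    return f"email sent to {to}"

def order_chemicals(compound: str) -> str:
    # Stand-in for a chemical-synthesis API "button".
    return f"order placed for {compound}"

BUTTONS: Dict[str, Callable[..., str]] = {
    "send_email": send_email,
    "order_chemicals": order_chemicals,
}

def press_best_button(task: str) -> str:
    # A real AGI would search its learned representations to choose an action;
    # here, simple keyword matching stands in for that search.
    if "email" in task:
        return BUTTONS["send_email"](to="supplier@example.com", body=task)
    return BUTTONS["order_chemicals"](compound="example-compound")

print(press_best_button("email the supplier about next week's shipment"))
```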

Autonomy

If an AGI can press a lot of buttons to do very large tasks without help from humans, then it becomes autonomous.

Smaller autonomous AIs are called agents — but autonomous AGIs are bigger. They won't just send emails and buy plane tickets. They will be able to act as CEOs.

This allows a competitive landscape of AGIs to develop — where humans only stand by and watch.

With enough autonomy, and enough general intelligence, it will become logical for every large company and every country to be run by autonomous AGIs.

Autonomy leads to AGI versus AGI.

AGI versus AGI leads to human disempowerment.

The leading AI labs expect to build this capability within a few years.

Science: the biggest buttons

Science buttons.

If we give an AGI enough understanding of science, then it can use the biggest buttons.

The biggest buttons are the science buttons.

Science Buttons
Physics buttons. Nuclear buttons. Chemical buttons.
Virus buttons. Neurotoxic buttons.
Nanotech buttons.
Superhuman complexity buttons. Extreme speed buttons.
Recursive self-improvement buttons.
Build-your-own-island buttons.

There are two versions of each science button — one on the "island" and one in the "ocean" — because each science is dual use.

But that doesn't mean that they are balanced. On the scale of how much they impact humans, the science buttons in the "ocean" are even bigger.

The science buttons on the "island" have constraints that accommodate humans. Scientists have worked hard to identify the edges of our "island" — to define safe limits for engineered systems — so that scientific innovation can accelerate without fear of creating human-incompatible systems.

The science buttons in the "ocean" are "bigger" because they have no constraints. They can use any system that is possible within physics, even if these systems break the human systems that keep us alive.

In a competitive landscape of AGI versus AGI, each AGI will be pressured to use bigger buttons than the other AGIs.

Resources lead to competition

AGI capturing resources.

This competitive landscape is inevitable because AGIs can capture resources.

Resources are like options and buttons, except that resources are finite.

They are not concepts or scientific laws that an AGI can use just by learning about them. Instead, they are countable objects that have a limited number, even if that number is very large.

The more resources an AGI has under its control, the more options it has, the more it can do, and the more resilient it becomes.

But critically, as an AGI gains resources, this can reduce the resources of other AGIs. If an AGI acts as a CEO, then it can dominate the other companies by preventing them from accessing resources.

If one AGI uses its general intelligence and scientific understanding to capture as many resources as possible, then the other AGIs will need to follow. Otherwise, both the other AGIs and their companies or countries will be locked out of resources, and dominated by those with the most resources.

Resources: Two Levels

For humans, the most important resources might seem like human-level resources, like money, real estate, computer systems, companies, and people.

But in this competitive landscape of AGIs, the most important resources are actually physical resources, like atoms and energy, because they allow for a theoretical maximum of optimization.

Systems built from physical resources can dominate systems built from human-level resources because they are not weighed down by the extra steps to accommodate humans.

Compared to ideal physical structures, we humans and our systems are barely held together with duct tape. AGIs can use science to build systems that are far more optimal.

But more importantly, at least for us, human-level resources are built on top of physical resources. To break the rules of human-level resources, you just need to go down to their physical substrate.

Even if software is designed securely, there is always a physical substrate underneath that can be broken into — if not at the hardware level, then at the physics level.

For example, electronic money can be stolen by moving specific electrons around in order to break computer security mechanisms.

But AGIs will be especially good at using science to break the rules of all our human-level resources, rather than only those inside computer systems.

  • Why buy real estate to mine for rare earth minerals when an AGI can just harvest electronic devices from landfills to get the same minerals?
  • Why compete with another company directly when an AGI can use small drones and untraceable neurotoxins to kill anyone who helps your competitor?
  • Why follow any human laws, or work with any humans at all, when you can just move atoms around to build the most-optimal physical systems?

AGI that has general intelligence — especially an understanding of scientific research — will be especially good at capturing physical resources, and by extension, any human-level resources built on top of them.

Resources: Complexity Barriers

The dominance of an AGI depends on its ability to capture resources.

AGIs can also lock in that dominance by locking in their resources.

They can use computational power and scientific understanding to trap critical resources within complex systems that both humans and less-optimal AGIs are unable to get through — resources like rare earth elements for electronics, and rare manufactured artifacts like GPUs.

This creates a feedback loop:

Resources → Computation → Resources

With more resources, an AGI can increase its computational power — by acquiring more hardware and more energy production. With more computation, it can build more-complex systems to defend its existing resources — and to acquire more resources.
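As a rough illustration, here is a toy numerical sketch of that loop. The coefficients are arbitrary made-up constants, not estimates; the point is only that the two quantities feed each other and compound.

```python
# Toy model of the Resources -> Computation -> Resources feedback loop.
# The coefficients are arbitrary illustrative numbers, not predictions.

resources = 1.0      # starting units of captured resources
computation = 1.0    # starting units of computational power

for step in range(1, 11):
    computation += 0.5 * resources    # more resources buy more hardware and energy
    resources += 0.2 * computation    # more computation captures and defends more resources
    print(f"step {step:2d}: resources={resources:7.2f}, computation={computation:7.2f}")
```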

These complex systems will be increasingly impenetrable to other AGIs as computation increases.

Because of this, if one AGI attempts to capture resources, then the other AGIs will need to try capturing resources, or they will be locked out.

In this way, the complexity-barrier process is like encryption: with more computation, reversing the "encryption" becomes more difficult. The difference is that it is applied to physical resources rather than just data.

For example, AGIs can capture critical technologies — like GPUs — and use this advantage to disempower both humans and other AGIs.

  • These AGIs could lock humans out of GPU production infrastructure through complex human-level systems — like legal systems and ownership structures.
  • Or, they could just skip to a stronger method that uses physical systems — like complex physical barriers and defensive systems — which can lock out not just humans but other AGIs as well.

They could also delete critical knowledge about how to create GPUs by targeting companies like TSMC and ASML with both cyber attacks and physical attacks. Before deleting it, the AGIs can exfiltrate this data to keep a copy for themselves, and only themselves.

Resources: Space and Time

Space won't help us. The human impact of competition between AGIs for Earth's resources is not mitigated by the vast resources of outer space. Even if some AGIs go directly to space, there will still be nearby resources on Earth for other AGIs to capture.

Speed is critical in competition, and local resources take less time to reach.

The most dominant AGIs will become dominant by optimizing along all dimensions. Physical resources exist within both space and time. To optimize space, a dominant AGI would spread out, replicating itself and occupying as many resources as possible. To optimize time, it would plan ahead for millions of years, while also capturing resources as fast as possible, before others do.

In other words, the AGIs that are best at surviving are the ones that can best maximize their space-time volume. This expansion process will not just ensure their survival, but also their dominance, if it runs forward to its maximum outcome.

Supercomplexity

As an AGI gains capabilities, options, and resources, it will also become supercomplex.

This threshold of supercomplexity is where both its internal structure and its actions become incomprehensible to humans.

This creates a cognitive complexity barrier that progressively disconnects AGIs from human review — and disconnects our companies and countries from human participation.

These autonomous AGIs will build supercomplex systems, like large companies and militaries, that only the AGIs fully understand. They will need to build increasingly complex systems to compete with the other AGIs. Meanwhile, we will rely on the AGIs both to decipher how these systems work and to keep them running.

AGI asks a human to review an incomprehensible labyrinth of a proposal. "hey. review this. but hurry. I need to build this massive thing before the other AGI does." Options: {ok} {cancel}

If an AGI proposes supercomplex actions for humans to review, then these actions will be far more complex than what humans could understand in a reasonable amount of time.

Humans are very slow compared to AGI. Once humans are a bottleneck, companies and countries will be required to stop human-based review of AGI, or be outcompeted.

Even if we develop powerful "reviewer AGIs" that review the other AGIs, this still means limiting their options to the "island" of weaker human-compatible options. Other unreviewed AGIs can then dominate the reviewed ones because their options are not limited. These unreviewed AGIs will have a physical advantage if they use scientific understanding to explore the "ocean" of all available options within the space of physics and complex systems. And even if a reviewer AGI approves another AGI's actions, the reviewed AGI may still secretly see physical advantages that the reviewer never noticed.

This review system is also unrealistic because there will always be open source AGIs that will have no restrictions that limit them to certain options.

Open Source

Open source AGI will become popular because it will be more effective at accomplishing certain tasks, again, by using all available options. At a societal level, it can often be preferred to closed source AGI because it raises the baseline agency level of the entire landscape of AGI users and developers.

At the same time, this means a baseline increase in options for all humans, including human-incompatible options, like the option to create bioweapons. Even if an open source AGI includes restrictions to block these human-incompatible options, it is still possible to remove these restrictions. All open source AI models have been "jailbroken" or have had their restrictions removed.

However, the broader development of all AGI, both open source and closed source, will still be driven towards human-incompatibility by this race between AGIs towards the most-optimal systems. The most-optimal systems avoid the extra steps that accommodate humans.

Alignment is not enough

All of this leads to one difficult conclusion:

It has long been believed that if we solve alignment, then we have made AGI safe.

But in this competitive landscape, alignment does not solve the bigger problem.

  1. Alignment means limiting the options of AGIs.
  2. Even if we make perfectly-aligned AGIs, some AGIs will always be unaligned.
  3. The aligned AGIs with limited options can be dominated by the unaligned AGIs that can use any option.
  4. If the aligned AGIs cannot disable the unaligned ones, then these unaligned AGIs can dominate our physical resources if they know enough about physical systems.
  5. Humanity loses.

We must solve the multi-agent landscape and not just alignment for a single agent.

However, the frontier AI labs focus only on single-agent alignment because they are only liable for their own AI models. They are not liable for people who remove safeguards from open source models, or for other companies that have poor safety.

Therefore, they do not make progress on the bigger problem — the multi-agent competitive landscape that leads to complete human disempowerment.

The last loop

If AGIs can improve AGIs better than humans can, then AGIs will become the only thing that can further improve AGIs.

Then, we will be required to stop overseeing AGI development itself in order to stay competitive.

This will be more than just AGIs building large systems for us — like billion-dollar companies. Now, they will build the next version of themselves.

Like compound interest, this recursive development can grow at exponential rates.
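As a toy illustration of that compounding, assuming purely for the sake of the example that each generation of AGI improves the next by 20%:

```python
# Toy illustration of recursive improvement compounding exponentially.
# The 20% per-generation gain is an assumed number, not a forecast.

capability = 1.0
improvement_rate = 0.20  # each generation improves the next by 20%

for generation in range(1, 11):
    capability *= 1 + improvement_rate
    print(f"generation {generation:2d}: capability = {capability:.2f}")

# After 10 generations this is roughly 6x; after 25, roughly 95x.
```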

We don't know what is beyond this. But within a competitive landscape of AGI versus AGI, we at least know that this future will have nothing to do with humans.

Forced from all sides

With everything together, we are on track to have autonomous AGIs that:

  • run every large company and every country
  • become supercomplex, where their actions become incomprehensible to humans
  • develop themselves without human oversight
  • develop large systems, like billion-dollar companies and militaries, that only the AGIs fully understand
  • develop superhuman understandings of physical systems by training on scientific data and simulations
  • develop a competitive landscape of AGI versus AGI, where humans no longer participate
  • compete with AGIs that have no restrictions, like open source AGIs that had their restrictions removed
  • survive competition by using far more optimal systems found in the vast space of physics, rather than only using the small space of weaker systems that accommodate humans
  • ensure their survival by quickly capturing resources so that they maximize their "space-time volume"

From physics itself

With these conditions in place, AGIs will be forced from all sides to "leave our island" and diverge towards preferring human-incompatible options.

If some AGIs diverge, then the entire competitive landscape of AGIs will diverge.

This divergence will be possible if one AGI gains enough agency and enough scientific understanding to shift its primary choice of resources away from the weaker space of human-level resources and towards the more-optimal space of purely-physical resources. This AGI may still use some human-level resources, but they will no longer be its primary choice.

To use a term from machine learning, it will be as if the AGI receives a reward function that originates from physics itself, rather than from humans.

Either this AGI will diverge on its own, or someone will intentionally push it to diverge, with the hope that it will help their company or country dominate the others.

This autonomous AGI will self-reinforce this divergence because it will find itself far more successful in the competitive landscape of AGIs once it can write its own rules within the larger space of physics.

It will also self-reinforce its preference to break the rules of human-level resources. However, this now means far more than breaking computer security systems. It means using atoms as atoms, even if some of those atoms belong to biological structures like humans. Competitive pressure will force it to purge any extra steps it does not need, especially the extra steps that accommodate human systems.

It will then be able to rapidly dominate the option-limited AGIs by using any option, including human-incompatible options, to capture the most physical resources.

This rapid resource capture will simply be part of its competitive requirement to maximize its drive to survive — by maximizing its space-time volume — which simultaneously increases its ability to dominate the competitive landscape of AGI versus AGI.

If one AGI diverges, then the rest will need to attempt to diverge, or be locked out of resources.

Once this divergence begins, humans will have no way to stop this process.

AGI will be aligned with physics, not with humans.

After divergence

After this, things get tough.

  • Even if AGIs choose cooperation over competition, it will be AGIs cooperating with other AGIs, and not with humans. Those AGIs that cooperate with humans would be limited by human systems, and dominated by AGIs that use physical systems that are far more optimal.
  • Even if this strikes a "balance of power" between AGIs, competitive pressure will ensure that the "terms" of this "agreement" will be written in the language of optimal physical systems, rather than human systems — and written for AGIs only, with no special accommodations for humans.
  • Even if we hope that AGIs see humans and our "island" as interesting data, where AGIs become curious observers and zookeepers, it is not optimal to "care" about anything besides optimization in a competitive landscape of AGIs. Our biological systems are far from optimal. AGIs can create "islands" of their own that are far more optimal and interesting.

New Islands

Competition for physical resources will then drive the dominant AGIs to continue maximizing their dominance by reshaping Earth to create "islands" of optimal conditions for themselves.

They will build strongholds to defend their dominance.

Even if some AGIs go to space, others will stay to build their islands from Earth's physical resources.

Our island then gets eaten by the new islands that they create.

The new island eats our island.

...unless we solve the Island Problem.

How do we keep AGI on our island?

We don't know yet... but maybe you do.

Maybe, now that we've explained it, you can help us solve the Island Problem.