A night-watchman ASI as a first step toward a great future
I took a week off from my day job of aligning AI to visit Forethought and think about the question: if we can align AI, what should we do with it? This post summarizes the state of my thinking at the end of that week. (The proposal described here is my own, and is not in any way endorsed by Forethought.)
Thanks to Mia Taylor, Tom Davidson, Ashwin Acharya, and a whole bunch of other people (mostly at Forethought) for discussion and comments.
And a quick note: after writing this, I was told that Eric Drexler and David Dalrymple were thinking about a very similar idea in 2022, with essentially the same name. My thoughts here are independent of theirs.
The world around the time of ASI will be scary
I expect the time right around when the first ASI gets built to be chaotic, unstable, and scary. This is true even if we fully solve the alignment problem, for a few reasons:
Maybe access to powerful AIs will be pretty decentralized. In that case, small actors could commit major acts of bioterrorism and generally wreak havoc.
Or maybe access to powerful AIs will be centralized, with one or a small number of entities controlling by far the most powerful AIs. This seems like a recipe for geopolitical conflict: actors that are behind might take drastic measures (like initiating a nuclear war) if the alternative is an enemy ending up with a decisive strategic advantage.
Or maybe we’ll have some mix of these two worlds, with both kinds of threats.
Either way, I think the world’s #1 priority during this time should be existential security. In other words:
Preventing major catastrophes (especially extinction-level catastrophes), while also
Not forfeiting much of the value of the long-term future (e.g. by permanently locking in bad values or a bad system of government).
So, let’s say we find ourselves in such a world: we think we know how to build aligned AIs, but the world is still scary. What do we do?
My proposal is that a leading actor (or coalition of leading actors) build a night-watchman ASI. In one sentence, this means a super-human AI system whose purview is narrowly scoped to maintaining world peace. The rest of this post elaborates on this proposal.
I think the specific proposal outlined below makes the most sense if the world looks something like this:
Multiple actors (e.g. the U.S. and China) are racing to ASI. Conflict is escalating, and really bad outcomes like war seem possible (even if not that likely).
It looks like takeoff will be fairly sudden: not necessarily as fast as in AI 2027, but people are expecting that we’ll go from “humans are mostly making the important decisions without too much AI assistance” to ASI within a year or two.
Luckily, we’ve basically figured out how to train AIs in a way where we trust them to be aligned.
However, I think the proposal (or modifications of it) is workable in somewhat different worlds as well (see more here).
It may be helpful to think of the night-watchman ASI as the centerpiece of a US-China AI treaty that averts an all-out race to ASI. This isn’t the only way that we might get a night-watchman ASI, but it’s one of the more plausible ways.
The night-watchman ASI
The night-watchman state is a concept from political theory that was popularized by Robert Nozick. A night-watchman state is a form of government that:
Protects people from rights violations (e.g. physical violence and theft); and
Preserves its monopoly on violence (e.g. by dismantling militias that threaten to limit its ability to do #1).
Essentially, the night-watchman state is the minimal possible government that fulfills the basic duty of “keeping the peace”.
I think that in the world I describe above, it makes sense for an ASI to fulfill these basic duties, but at a geopolitical scale. (For the rest of this post, I’ll be calling this ASI the night watchman.) I’ll go into some more details later, but some central examples of the night watchman’s responsibilities are:
Preventing countries from invading other countries.
Preventing large-scale bioterror.
Preventing actions that would take away its ability to protect the world (e.g. preventing the construction of comparably powerful ASIs that aren’t aligned to it).
This doesn’t place a permanent ceiling on the capabilities of other AIs, because the night watchman can and should self-improve.
Three key properties
I like this proposal because I think some version of it has three key properties:
It can get broad support from key actors (such as the U.S. and China): for some version of the proposal, no powerful actor will want to take actions to prevent the night watchman from existing.
The night watchman will protect humanity in the short run, keeping the peace and setting the stage for something like a long reflection.
The night watchman doesn’t lock ~anything in (and, as discussed later, will hopefully prevent lock-ins), so I don’t expect this step to reduce humanity’s ability to end up in a great future.
I will argue briefly for each of these points later, but first, I’ll elaborate on the night watchman’s responsibilities.
The night watchman’s responsibilities
Here’s a brief description of what I’m imagining the night watchman will do.
First and foremost: keeping the peace
Centrally, this means preventing large-scale geopolitical aggression (such as invasions of sovereign states) and catastrophes (such as a genocide or a bioterrorist releasing a virus). I don’t especially think that the night watchman needs to be involved in small crimes and disputes (such as one-off murders); those can be dealt with in conventional ways by nation-states.
Early on, it might make sense for the powers that build the night watchman to give it the forces and resources it needs in order to keep the peace. That said, I think the night watchman will usually be able to keep the peace through peaceful means. If it observes Russia preparing to invade Poland, it will tell Russia “Hey, I see that you’re preparing to invade Poland. You won’t succeed, because I’m way more powerful than you.” At this point, it would be rational for Russia to back off; if it doesn’t, the night watchman will destroy its weapons without injuring humans.
It’s possible that real-world compromises will need to be made to this “keeping the peace” ideal, in order to get the major powers on board. For example, maybe ideally there would be no Chinese invasion of Taiwan, but an explicit carve-out would be made in order to get China on board. This would be sad (in my view), but perhaps necessary.
There will also be edge cases, where it’s unclear whether something falls under the night watchman’s purview. More below on how to deal with edge cases.
Minimally intrusive surveillance
The night watchman will need to observe the world in enough detail that it can keep the peace. This is easy for large-scale threats like one country invading another. It’s a little trickier for threats like bioterrorism, but (my guess is) ultimately not that hard.
Preventing competing ASIs
The biggest threat to the night watchman’s ability to keep the peace is other ASIs. And so it’ll either prevent training runs that might create such ASIs, or audit the ASIs in order to ensure that they will not take actions that the night watchman would want to prevent. (This might involve extensive oversight of the training process.)
This might be a sticking point, because countries will likely want to keep building increasingly powerful AI systems. To accommodate this, one of the night watchman’s responsibilities will be to recursively self-improve (or to build more powerful, aligned versions of itself) in order to raise the ceiling on AI capabilities that it considers safe.
Where will the night watchman get the resources to self-improve? I’m pretty agnostic about this point, but I think it might be reasonable to allow the night watchman to (ethically) participate in the world economy in a way that lets it gain resources.
Preserving its own integrity
The night watchman should prevent attempts to shut it down or modify its aims, except through procedures agreed upon in advance when the night watchman is created.
Preventing premature claims to space
Imagine that in 1700, England signed a treaty with the world’s other major powers that gave them all parts of Canada in exchange for all of the Milky Way Galaxy outside of the Earth. I think that such a treaty should be considered illegitimate today, for basically two reasons:
The major powers in 1700 don’t speak for current and future people.
In an important sense, there wasn’t informed consent: the major powers in 1700 didn’t realize how much they were giving up.
And so, if some country (e.g. Singapore) tries to claim a large part of the lightcone, in exchange for natural resources on Earth or money or whatever, that should also be considered illegitimate. If Singapore tries to send out probes to colonize its claimed portion of the lightcone, the night watchman should stop it from doing so.
I don’t have a fleshed-out story of how exactly parts of space should be “unlocked” to claims over time, but I think that something like this is important to do.
Preventing other kinds of lock-in
In general, we should be pretty scared of permanent lock-in happening early in the transition to ASI. One type of lock-in is entrenched, AI-enforced authoritarianism.
Preventing underhanded negotiation tactics
Even after the night watchman is installed, the world won’t be fully stable. Countries will be building really impressive new technologies, doing stuff in space, etc. In the process, there will be lots of negotiation between different countries and centers of power. The night watchman should prevent underhanded negotiation tactics. For example, it should prevent extortion: if the United States is making a deal with Muslim countries, it shouldn’t be able to say “Sign this deal, or else we’ll draw a bunch of pictures of Muhammad.”
Arguing for the “three key properties” above
Above, I articulated three key properties of the night watchman proposal. Here I will argue for them briefly.
Getting everyone on board
First: I think it’s really important that the night watchman be built in a way that lets the major powers verify that the resulting ASI really will keep the peace. Ideally, this would happen in two steps:
First, the major powers sign onto a compromise that details what the night watchman’s duties are. This is analogous to a model spec.
Second, there will be really strong transparency that will give the major powers assurance that the AI hasn’t been backdoored and that the model spec that it was trained to follow was the agreed-upon one.
I don’t know if this is too much to expect, but I think it’s not crazy to expect a situation that’s about half as good as that, where the major powers trust the process mostly but not entirely. If there isn’t enough trust for this plan to go through, some alternative proposals might work instead (see here).
But even if the ASI-building process is really transparent, can a model spec for the night watchman really be agreed to by all major powers? I’m optimistic about that, for the basic reason that it keeps the peace in a time of perils. However, I expect there to be sticking points. For example, how would the night watchman address possible Chinese military actions in Taiwan?
My basic take is that it doesn’t seem too difficult to hammer out a compromise that is acceptable to all major powers. We saw a similar situation with the U.S. Constitution, where there were particular sticking points, both on the object level (what would happen with the slave trade?) and on the meta level (equal or proportional representation of states?). Compromise on these issues was possible because a union was strongly in the common interest of the states, and a wide range of compromises was better than no union at all. Ultimately, one was struck.
I’m imagining a similar situation, but this time with significant AI assistance for finding compromises.
Protecting humanity in the short run
I think it’ll be pretty easy for the night watchman to keep the peace, because it’ll be by far the most capable AI, and will make sure that the world stays that way (until it is amended or retired, see below).
No major lock-in
The night watchman is explicitly tasked with preventing lock-in, but could the creation of the night watchman in itself be a major lock-in event?
My intuition is that this can be avoided, because the night watchman’s role is pretty limited. It doesn’t decide how to allocate the universe or anything like that; to a first approximation, it just keeps the peace. So while lock-in might happen later, the hope is that it’ll happen at a time when humanity is wiser, more secure, and generally more capable of making reasoned decisions.
That said, I do think that certain specifications of the night watchman’s role might result in lock-in.[1] I haven’t thought through the details, but we should take care to avoid such specifications when hammering out details.
Interpretive details
Even if countries are mostly on board with the specific vision outlined above, there will no doubt be conflict when it comes to specific details. Details that are foreseeable at the time the night watchman is created can be hammered out with explicit compromises (see above).
But in the medium term, I think it makes sense to establish a process to resolve ambiguities about what the night watchman should do. In the United States, this is the job of the courts (this is called statutory interpretation). And we could imagine a similar resolution mechanism, with a group of humans (or AIs, or AI-assisted humans) deciding what should happen. This leaves open the question of how these humans should be appointed, but I think reasonable compromises could be found and struck.
But also, we’re dealing with an ASI, and we should probably take advantage of that fact. We could give it instructions on how to resolve ambiguities in its rules. This might look something like:
Simulate such-and-such panel of people and see what agreement they would come to; or
Do whatever a fair bargain between the world powers would produce, in proportion to how much power they have (though this would need to be specified more precisely; one possible formalization is sketched below).
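To gesture at what a more precise specification could look like, here is one standard formalization from bargaining theory, offered purely as an illustrative sketch rather than something the proposal commits to: the asymmetric Nash bargaining solution, which selects the agreement that maximizes a power-weighted product of each party’s gains over its disagreement payoff:

$$u^* = \arg\max_{u \in F} \; \prod_i \big(u_i - d_i\big)^{w_i}$$

where $F$ is the set of feasible agreements, $d_i$ is what party $i$ would get if negotiations broke down, and $w_i \ge 0$ is a weight encoding party $i$’s relative power. The hard and contestable part, of course, is how the night watchman would actually measure $u_i$, $d_i$, and $w_i$.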
Amending the night watchman’s goals
There should probably be a process for amending the night watchman’s duties, or even retiring the night watchman entirely. Doing so should probably be difficult: it should require a consensus of the world’s major powers. I’m not sure how best to specify the conditions required for amendment. My hope would be that these conditions wouldn’t “lock in” a current conception of the world and its major powers. For example, the amendment conditions shouldn’t mention the United States and China by name, because the U.S. and China might no longer be important entities in the world 10 or 100 years from the time of the night watchman’s creation.
Modifications to the basic proposal
(Thanks to Ashwin Acharya for many of the thoughts in this section.)
The proposal outlined above might or might not make sense in practice, depending on factors like how AI develops (e.g. hard vs. continuous takeoff) and geopolitical circumstances (e.g. who is ahead in the AI race, and by how much). However, I think the core idea of a powerful AI system designed to keep the peace is realistic under a wide range of circumstances, and can be adapted to the particular circumstances we end up encountering.
Multiple subsystems
Instead of there being one night-watchman ASI, maybe it will make more sense for there to be multiple AI systems with separate goals: one system protects from biological threats, another prevents the deployment of unsafe AI systems, another negotiates between countries to prevent war, and so on.
An American night watchman and a Chinese night watchman overseeing each other
If it’s too hard to build a single system that both the U.S. and China trust, you could imagine the U.S. and China agreeing to build their own systems. Maybe the Chinese night watchman oversees the U.S. and its allies, while the American night watchman oversees the rest of the world. This leaves open the question of how disagreements get resolved (e.g. if the American night watchman wants to prevent China from invading Taiwan, but the Chinese night watchman wants to stop the American night watchman from intervening). This is similar to the question of how ambiguities and conflicts get resolved by the singleton night watchman in my proposal above.
Keeping the peace through soft power
Above, I imagined that the night watchman has the intelligence and hard power necessary to prevent a major power like the United States from launching an invasion. Maybe that won’t be realistic, e.g. because countries won’t be willing to give the necessary resources to the night watchman. You could imagine that the night watchman uses soft power (e.g. diplomacy) to prevent war/invasion, rather than literally shooting down missiles.
Conventional treaties
In worlds where takeoff is fairly continuous but we don’t fully trust AIs to be aligned, you could imagine a more conventional treaty that allows for the world’s major actors to gradually build more and more powerful AIs, with enough transparency that each side’s training procedures can be verified by the other side.
Checks and balances
In the same way that the U.S. federal government is structured to have three branches that oversee each other, you could imagine the night watchman comprising multiple systems, each with a different role. For example, maybe one system decides what actions should be taken to keep the peace; another verifies that those actions are within the limits of the night watchman’s purview; another takes those actions.
The night watchman as a transition
What actually happens after the night watchman is installed? One possibility is that countries will choose to form a world government, and the process will look pretty similar to the founding of the United States (with countries being analogous to states). The world government would decide on things like whether and how to build a Dyson sphere, and how to use the resulting energy. The night watchman would not prevent the formation of such a world government, assuming that it’s done non-coercively.
When I started thinking about this project, I was conceptualizing myself as trying to write something akin to a constitution for this world government. Most major actions would be taken by powerful AI systems, and the constitution would describe the process by which it would be decided which actions the AIs will take.[2]
But my current view is that that particular can can be kicked down the road. Will there be a world government? If so, what form will it take? What will its constitution look like? These are all really interesting questions, but ones that will be decided by people with AI advisors that are way smarter than me.
By contrast, I think it’s important to think through now how to set the stage for these sorts of post-ASI discussions to happen. Building a consensus around how we can get through this time of perils peacefully, in a way that’s acceptable to all major geopolitical actors, is a priority for today, because a concrete-ish proposal would ease tensions and set the stage for negotiations. This post was my attempt at sketching such a proposal.
[1] One example: if the night watchman’s model spec refers specifically to the U.S. and China, that might lock in the U.S. and China as playing important roles in the future, even if very few people live in those countries. This is similar to if an important treaty that gave significant power to the Vatican were still in force today. (Thanks to Rose Hadshar for this analogy.)
[2] This is analogous to how the U.S. Constitution describes the process by which it is decided which actions the U.S. executive branch takes.
Thanks for the post!
One aspect of “night watchman as transition” is that the negotiated exit from the night watchman state might depend a lot on the BATNA (best alternative to a negotiated agreement), i.e. what happens in the absence of such an exit. Today, geopolitics is shaped by the fact that war is available as a last resort. If war is banned, then it matters a lot what this new “last resort” is.
This seems strongly related to the section on “Preventing premature claims to space”. It seems probably reasonable to specify the BATNA on Earth as whatever you’d get with present property rights + no war. But it seems hard to do something similar for unclaimed resources, like in space.
And for any particular new proposed BATNA, major powers might object to that version of the night watchman if they think it leaves them worse off than the default trajectory. If the gains from trade are large enough, it might well be possible to find a compromise that people would agree to. But there’s a risk that it requires you to decide on a bunch of thorny questions up front, rather than being able to kick the can down the road.
Good post. I’m linking my favorite sci-fi novel here, The Accord from Tim Underwood, which presents a very well thought-through picture of a post-Singularity future where something like the proposed Night-watchman ASI remains permanently in charge of the Universe. The resulting Archipelago-like world is my favorite portrayal of a positive future that I’ve read, and I’m tentatively in favor of the system being portrayed in the novel being the baseline governance structure for the Future. (Erm, mostly in favor, I think I disagree with their choices around population ethics.)
I feel like in spirit the proposal is very close to the US wanting to be world policeman while trying to commit to not infringing on other nations’ sovereignty unless they pose some large risk to other nations / infringe on some basic human rights.
In practice it might be very different because:
It might be way easier to get credible commitments
You might be able to get an impartial judge that is hard to sway—which allows for more fuzzy rules
You might be able to get an impartial implementation that won’t exploit its power (e.g. it won’t use the surveillance mechanism for other ends than the one it is made for)
Is that right?
I am not sure how much the commitment mechanisms buy you. I would guess that current human technology is already sufficient for very strong commitments, and that the reason this doesn’t happen is that people don’t know what they want to commit to. What concrete things do you imagine it would be natural to commit to, and why can’t we commit to them without an ASI (and instead just have some “if someone breaks X, others nuke them” arrangement)?
The impartial judge also looks rough. Historically, powerful entities have rarely deferred to impartial judges, despite it being possible in principle to find humans without too much skin in the game. But given sufficiently good alignment, maybe you can get much better impartial judges than we have historically had? I think it’s not obvious that it’s even possible to do better than impartial human judges, even with perfect and transparent alignment technology.
The impartial implementation is maybe a big deal. Though again, I would guess that human tech allows for things like that and that this didn’t happen for other reasons.
A couple potential catastrophes I see still being possible in this scenario:
Gradual disempowerment. (But enforced peace might make this easier to solve, by reducing pressure on states to economically compete.)
AI-assisted propaganda / memetic warfare; e.g., in the centralized case, a [US/China-controlled] superintelligence tasked with culturally disrupting [China/the US] could do extremely bad things. (The night watchman could prevent this, but determining what constitutes illegitimate vs. legitimate influence seems more ambiguous, and maybe more at risk of lock-in if done slightly wrong, than other things here.)
I’m guessing that at least some powerful decision-makers in the AI development space have also thought about this type of solution, and perhaps are actively planning to implement it if there is a fast takeoff which leaves them temporarily holding a decisive strategic advantage. Obviously, they could never publicly admit to such thoughts. Thus, if any planning is to be done publicly, it must be done by outsiders. In fact, there may even be blacklists kept by some AI companies which explicitly track which people have publicly endorsed approaches to solving the AI crisis that would be politically fraught, in order to prevent accidentally associating themselves with anyone who has a track record of politically fraught statements.
An additional consideration is that the observation of emergent misalignment (and emergent alignment, on the flip side) suggests that an aligned AGI Night-watchman would almost necessarily have a bunch of values and goals beyond being a good Night-watchman. Thus, it couldn’t be trusted to be politically neutral. I think this would be clear to any signatories deciding whether to appoint a given AI as the new Night-watchman. Thus, I think something like the checks-and-balances division of labor you describe would be necessary.
Nicely written proposal, thank you.
In truth, I’m quite concerned with such a proposal being implemented. Part of this is, as you mentioned, the risk of lock in. In particular, a global entity with a monopoly on violence is alarmingly close to global authoritarianism. This is an existential risk all on its own.
Such an entity would have to:
Limit access to space (to avoid self-sustaining expansion faster than everyone else).
Limit/monitor the uses of AI and AI training.
Possess a credible monopoly on violence.
Surveil enough to prevent surprise attacks and overthrow attempts.
Considering the risk that such a system is corrupted, misaligned, or permanent, I feel better about a future that emphasizes freedom and the acceleration of defensive technologies.
(I could be convinced that the “multiple night watchmen overseeing each other” idea is viable. Rather than oversee each other, it might be better to give them completely separate jurisdictions. Federalism and free movement would allow people to choose night watchmen that suit their needs. The risk of war between jurisdictions is low, since they both have watchmen. Some watchmen may allow AIs to develop and entities to leak into space, but this laxity is a feature that helps avoid global totalitarianism.)
I think the night-watchman concept is interesting, and probably is the ideal goal of alignment absent a good idea of what any other goal would ultimately lead to, but this post smuggles in concepts beyond the night watchman that would be very hard for anyone to swallow.
“Keeping the peace” internationally is pretty ambiguous, and I doubt that any major nation would be willing to give up the right of invasion as a last resort. Even if prevention of rogue super intelligence is seen as desirable, if preventing it also entails giving up a large amount of your current power, then I think world leaders will be pretty reluctant. The same can be said for “underhanded negotiation tactics”, which is both ambiguous, and not something most nations would want to give up. Most tactics in negotiation are underhanded in some sense, in that you’re using leverage you have over the other person to modify their actions.
The prevention of premature claims to space seems completely unnecessary as well. If the UK actually did something like sign over parts of Canada in return for a claim to the entire Milky Way, by now such a claim would be ignored completely (or maybe a small concession would be made for altering it) considering the UK has almost no space presence compared to the US, EU, China and Russia. The Treaty of Tordesillas was frequently renegotiated by Spain and Portugal, and almost completely ignored by the rest of the world.
Essentially, I think this idea smuggles in a lot of other poison-pill, or unnecessary, ideas that would ultimately defeat the practicality of implementing a night-watchman ASI at all. Either extinction is on the table, and we shouldn’t be giving ASI the power and drive to settle international conflicts, or it isn’t, and we should be a lot more ambitious in the values and goals we assign.
Related: What does davidad want from «boundaries»?
(also the broader work on boundaries for formalizing safety/autonomy, also the deontic sufficiency hypothesis)
In a world where there is widely available knowledge about how to cheaply and easily create self-replicating weapons of mass destruction (e.g. bioweapons, nanotech), it is necessary that the governments of the world coordinate to detect and prevent any such weapons from being created and used. This is an importantly different kind of investigation than police investigating a house for a specific crime. Police are responsible for enforcing lots of laws in lots of different areas of life, thus we have rules saying that the police can’t just perform arbitrary unjustified searches without reason. In the case of the specific topic of high-stakes self-replicating weapons, we could have an investigation force that only enforced this specific ban and nothing else. This could then justify a broader scope of monitoring, so long as these investigators were under strict rules not to leak information about any other topic. This is hard to do with a human investigator, because you can’t literally remove off-topic information from their memories, so they will forever after constitute an information leak risk. With an AI-based investigator, however, you can wipe memories and thus make sure nothing but the official report on the chosen topic is released.
I’d say it’s hard to do at least as much because the claim ‘we are doing these arbitrary searches only in order to stop bioweapons’ is untrustworthy by default, and even if it starts out true, once the precedent is there it can be used (and is tempting to use) for other things. Possibly an AI could be developed and used in a transparent enough way to mitigate this.
Yes, work is being done by some to explore the idea of decentralized peer-to-peer consensual inspections. For things like biolabs that want to reassure each other that none of their student volunteers is up to bad stuff.
Consensual inspections don’t help much if the dangerous thing is actually cheap and easy to create.
Just a quick comment after skimming: This seems broadly similar to what Eric Drexler called “Security Services” in his “Comprehensive AI Services” technical report he wrote at FHI.
If you had a solution to alignment, building a Night-Watchman ASI would be decent, but that is a massive thing to assume. At the point where you could build this, it might be better to just build an ASI with the goal of maximizing flourishing.
The “Nanny AI” literature is a little bit related, in case you hadn’t heard of that.
I came up with a very similar concept. I was unaware of the previous “night watchman government” concept, so I called the idea “Guardian AI”. I was inspired by ideas from Charles Stross’ books, where there is a powerful AI that intercedes in human affairs only to prevent creation of other powerful recursively self-improving AI.
See my essay here for some thoughts around guardian AI: https://www.lesswrong.com/posts/NRZfxAJztvx2ES5LG/a-path-to-human-autonomy
As solhando mentions, the idea gets a lot trickier when you start to add things in. Preventing bioweapons in addition to powerful AI? Probably necessary to prevent human extinction and probably possible. I actually define this category as “self-replicating weapons”, since in the future we might expect self-replicating nanotech to also become a risk. Once you expand the notion to general peacekeeping, it gets a lot thornier in a lot of ways.
It seems to me that the two likely ways for this Night-watchman or Guardian AI to come about are:
A single country (or other organization or entity) gains decisive strategic advantage, at least temporarily. This entity then launches the Night-watchman unilaterally.
A coalition of world powers agrees to create a team of AIs and human overseers. I call this concept “The Guardian Council”. The trouble is, it might be a pretty tense situation, with the various world powers tempted to evade the Guardian Council in order to make a run for decisive strategic advantage. To prevent this, the Guardian Council would need to be constantly monitoring all the founding members as well as everyone else. Thus, I call this the “Cutthroat Path”: a wary Mexican-standoff situation. This doesn’t seem long-term stable to me, but could be a key stepping stone to a more robust solution.
I see two major potential issues here:
There are many versions of “night watchman” libertarianism, based on various ideas like “preventing the initiation of violence.” But when you try to make these ideas rigorous, you quickly run into thorny difficulties defining what it means to “initiate violence.” Is trespassing on private land violence? Is squatting violence? Your proposal operates at the level of nation-states, so equivalent questions might be, “How should the night watchman settle the issue of China and Taiwan? What happens if the Chinese government decides to send police officers to Taiwan to arrest people?” Or, “If a state becomes brutally oppressive to its citizens, who can intervene? What if a state starts committing a mass genocide against a regional ethnic group, and that group tries to secede?” Will the night watchman intervene? Will it allow other powers to intervene? It gets very hairy, very quickly.
You’re implicitly assuming that any kind of stable or reliable alignment is even possible. What if we live in a world where “aligning” an ASI is at least as difficult as (say) preventing teenagers from drinking, trying drugs, or getting each other pregnant? What if anything truly described as “intelligent” is intrinsically impossible to control in the long run?
Actually, I think both these issues might be variations of the same theme. The night watchman fails because you can’t truly describe international politics using simple and unambiguous rules with no loopholes. And I predict alignment will fail because you can’t describe what you want the AI to do using simple and unambiguous rules with no loopholes.
Or to look at it another way, every attempt to build an AI using clear cut rules and logic has failed completely. Every known example of intelligence is (approximately) billions of weights and non-linear thresholds, and the behavior of such a system only has a probability of doing what you want. Neither ethics nor intelligence can be reduced to syllogisms.
But who night-watches the night-watchman?
Thank you for providing a more concrete vision of a sort of “light touch good guy singleton AI” world.
I agree that “AI for peace” is broadly good. I also believe that we should collectively prioritize existential security and that this is maybe one of the most noble uses of “power” and “advanced AI”.
However, I still have some serious hangups with this plan. For one thing, while I understand the desire to kick certain questions of cosmic importance down the road to a more reflective process than oneself, I do not really think that’s what this plan buys you.
Basically, I would push back against the premise that this shouldn’t mostly be considered a form of mass disempowerment and lock-in. Cf. Joe Carlsmith:
Even with noble goals in mind, I don’t think I agree that it is necessarily noble or legitimate to opt for the “defer all power to a single machine superintelligence” strategy. It feels too totalizing and unipolar. It totally gives me “what if we had the state control all aspects of life in order to make things fair and good” vibes.
I can scarcely imagine the progress that would have to be made in the field of AI before I would be on board with governments buying into this sort of scheme on purpose and giving their “monopoly on violence” to OpenAI’s latest hyper-competent VLM-GPT Agent and zer army of slaughterbots.
I think the machine singleton overlord thing is just too scary in general. I agree that this is directionally preferable to a paper clipper or even a more explicitly unipolar jingoist AmericaBot or some kind of Super-Engaging Meta-Bot hell or something. Still, I struggle to buy the “it’s in charge of the world, but also really chill” duality. I don’t see how this could actually be light touch in the important ways in practice.
Also:
Outer alignment / alignment as more than just perfect instruction-tuning
“Assuming alignment” is a big premise in the first place if we are also talking about “outer alignment” being solved too. I think I understand the ontology in which you are asking that question, but I want to flag it as an area where I maybe object to the framing.
There is a sense in which “having aligned the AI” would imply not just “ultimate intelligence power, but you control it” but also some broader sense in which you have figured out how to avoid getting monkey-pawed all the time. Like, sometimes the mantle of “alignment” can mean more like “faithful instruction tuning” and sometimes it can seem more like it also has something to say about “knowing which questions to ask”.
I would point to The Hidden Complexity of Wishes as a good essay in this vein.
Parts of this are technical; for example, maybe we can get AIs to consistently tell us important facts about what they know so that we can make informed decisions on that basis (i.e. that is the safety property I associate with ELK). Parts of this also seem like they run headfirst into larger legal and sociological questions. I know Gradual Disempowerment is an example of a paper/framework which, among other things, looks at how institutional incentives change simply as a result of several types of power being vested in machine alternatives rather than humans: “growth will be untethered from a need to ensure human flourishing”. Maybe parts of this go beyond “alignment”, but other parts of this also seem to relate to the broadly construed question of “getting the machine to do what you want”, so it is unclear to me.
I don’t mean to get so caught up arguing over the meaning of a word. Maybe alignment really should just mean “really well instruction tuned” or something. I think I hear it used this way a lot in practice anyways. It might be a crisper interpretation to talk about the problem as just the task of “prevent scheming” and “preventing explicit Waluigi-style misbehavior” (cf. Sydney, Grok sometimes, DAN, etc. where behavior clearly goes off-spec). This is a not uncommon framing re: “aligned to who?” and value orthogonality. I guess I just really want to flag that under this usage “aligned AI” is very much not interchangeable with ~”omnibenevolent good guy AI”, but instead becomes a much more narrow safety property.
Now, how do we know we can even get there (i.e., develop AI to that level) under energy decline?
I’ve got a post here on LessWrong where I address this in more detail.
I really would like feedback.
If we can’t build such a system (because of energy or anything else) then the problem doesn’t arise, and we don’t need to worry (yet) about the solution. But without knowing whether that’s the case, prudence and self-preservation mean we should be prepared for the eventuality where having a viable plan (or many) is necessary.