A friend in technical AI Safety shared a list of cruxes for their next career step.

A sub-crux for their belief that AI progress cannot be stopped was that they could not see coordination happening without a world government and mass surveillance.

I asked:

Are the next beliefs and courses of action in your list heavily reliant upon this first premise/belief? I.e. if you gradually found out that there are effective methods for groups around the world to govern themselves to not build dangerous auto-scaling/catalysing technology (such as AGI), where those methods do not rely upon centralised world governance or mass surveillance, would that change your mind as to what you / the AIS community need to focus efforts on?

Copy-pasting a list I then wrote in response (with light edits):

Why coordinating to align as humans to not develop AGI is a lot easier than, well… coordinating as humans with AGI coordinating to be aligned with humans
Adding to principal-agent problem complexity:
AI-alignment is roughly an instance of the principal-agent problem (principal – all of many humans; agent – AI).
Human-human conflict, human-human arms races, and humans selling or releasing tech that harms other humans are also instances of principal-agent problems.
How are you going to solve all that distributed complexity – all those principal-agent problems expressing themselves in many different ways for many different humans living in many different circumstances – with one general solution?
You cannot just say – ah, but AGI has godly powers and can/will solve all of that for us (as in ‘Artificial Godly Intelligence’). To me that does not clarify how adding yet another, much more powerful and much more alien agent to the mix is going to help solve all those other principal-agent problems. ‘Beneficial AGI that takes over the world’ raises many more new questions to me than it answers.
Coordinating with aliens:
“Well that’s just not going to happen, the capitalist forces driving R&D cannot be overcome, so the best we can do is work tirelessly for the small but nonzero chance that we can successfully align AGI.”
^– A researcher wrote me that they kept getting this reaction when arguing that AGI safety might be impossible.
I also got this reaction a few times.
It makes me wonder how these people mentally represent “AGI”.
Some common representational assumptions in the community seem to be that “AGI” would be (a) coherently goal-directed unitary agent(s) that optimise for specific outcomes across/within the outside messy world and that avoid(s) causing any destabilising side-effects along the way.
I personally have a different take on this:
Their claim implies to me that for us humans to coordinate – as individual living bodies who can relate deeply based on our shared evolutionary history and needs for existence (i.e. as *interdependent* in our social actions) – is going to be harder than to ‘build’ perpetual alignment into machines that are completely alien to us and that over time can self-modify and connect up hardware any way they’re driven to.
Here you bring this artificial form into existence that literally needs netherworldly conditions to continue to exist (survive) and expand its capacity (grow/reproduce).
Given the standardisation of its hardware/substrate configurations, it does not face the self-modification or information-bandwidth constraints that we humans, separated by our non-standardised wetware bodies (containing soups of carbon-centred molecules), face. This artificial form will act (both by greedy human design for automation and by instrumental convergence) to produce and preserve more (efficient) hardware infrastructure for optimising selected goals (i.e. the directed functionality previously selected for by engineers and their optimisation methods, within the environmental contexts the AI was actually trained and deployed in).
These hardware-code internals will have degrees of freedom of interaction with changing conditions of the outside world. Over the long term, those interactions are sure to feed back: a subset of the existing internal configurations will, through their previous outside interactions, replicate more frequently and sustainably across existing and newly produced hardware. E.g. through newly learned code variants co-opting functionality, inherent noise interference on the directivity/intensity of transmitted energy signals, side-branching effects on adjacent conditions of the environment, distribution shifts in the probabilities of possible inputs received from the environment, non-linear amplification of resulting outside effects through iterative feedback cycles, etc.
The resulting artificial forms (whether you depict them as multiple AGIs, or a population of AGI parts) will at various levels exchange resources with each other, and be selected by market forces for their capacities to produce, similar to how humans are. But unlike humans, they are not actually separable as individual, bandwidth-constrained agents.
Some AIS researchers bring up the unilateralist’s curse here – that a few of the many human market/institutional actors out there would independently act unsafely, out of line with the (implicit) consensus, by developing unsafe AGI anyway. But if we are going to worry about humans’ failure to coordinate, then getting all such conflicted humans to coordinate to externally enforce coordination on distributed self-learning hardware+code components – that in aggregate exhibit general capabilities but have no definable stable unit boundaries – is definitely out of the question.

List #2: Why coordinating to align as humans to not develop AGI is a lot easier than, well… coordinating as humans with AGI coordinating to be aligned with humans

Remmelt, 24 Dec 2022 9:53 UTC


AI Governance Slowing Down AI

Crossposted to EA Forum (3 points, 0 comments)
