Generators Of Disagreement With AI Alignment


I often find myself disagreeing with most of the things I read about AI alignment. The closest I probably get to accepting a Berkeley-rationalism or Bostrom-inspired take on AI is something like Nintil’s essay on the subject. But even that, to me, seems rather extreme, and I think most people who treat AI alignment as a job would view it as too unconcerned a take on the subject.

This might boil down to a reasoning error on my end, but:

I know a lot of people that seem unconcerned about the subject. Including people working in ML with an understanding of the field much better than mine, and people with an ability to reason conceptually much better than mine, and people at the intersection of those two groups. Including some of my favorite authors and researchers.

And, I know a lot of people that seem scared to death about the subject. Including people working in ML with an understanding of the field much better than mine, and people with an ability to reason conceptually much better than mine, and people at the intersection of those two groups. Including some of my favorite authors and researchers.

So I came to think that there might be some generators of disagreement around the subject that are a bit more fundamental than simple engineering questions about efficiency and scaling. After reading Nintil’s (linked above) and VKR’s most recent essays on the subject, I think I can finally formulate what those might be.

I—Accuracy Of Quantifiable Information

A very good way to classify information is whether or not that information is quantifiable and currently being quantified.

Yearly statistics about crime rates in Brabant, Brussels are a quantifiable piece of information. The “vibe” I get when walking through Brabant, Brussels is not a quantifiable piece of information; it’s something internal to me. I may use it to derive the same action or world-model update I would from crime statistics (e.g. this place seems unsafe, I’ll stay away), but it is not the same thing.

But, you say, you could describe the “vibe” you get from walking around there. And I agree, but I couldn’t do this perfectly, I could do it better than most, and some could do it better than me, and it’s hard for me to quantify how meaningful that description is in any way.

But, someone else says, the crime statistics don’t reflect reality, under-reporting, fake reports, and the fuzzy details that get shoved aside to fit it into a neat category lead to information loss.

The accuracy and type of information that can be quantified changes as technology progresses. In the 17th century, it was hard to quantify a view; one could paint it, but that wouldn’t be very accurate. In the 19th century cameras existed, but they still couldn’t capture the richness of the eye. In the 20th century, they could record video, but it still lacked the kind of depth a human walking around could get. In the 21st century, we can generate ever better representations of a landscape changing in time using, e.g., drones flying around and taking videos from different perspectives, focusing on places where most change seems to be happening.

I think a large point of discontent between people is how accurate this quantifiable information is. At one extreme end someone might say “all science is bullshit, all news is fake, words can’t describe reality, maps are not territories”. That’s the crazy hippie hooked on shrooms and meditation, searching for god/enlightenment/absolute-meaning. At the other extreme someone might say “science can describe any process better than you or I could by interacting with it, filtering news will generate a picture of reality that’s more accurate than what we’d derive from our limited n=1 observations, words usually describe things pretty well, maps can be made to represent the territory with such accuracy that the distinction becomes pedantic to make”. That’s the crazy mathematician hooked on amphetamines and niche inconsistencies in book-length proofs, trying to get a Fields Medal.

Basically no one lives at the extremes, and I think it’s hard to say where everyone is, because everyone has their own bias as to which type of quantifiable information is or isn’t accurate.

But a large premise of why AI would be dangerous is that it can build much better models of the world, without a lot of physical presence, via access to giant repositories of quantified information (i.e. the internet). So how accurate you think this information is matters a lot.

II—The Value Of Quantifiable Information

Figuring out protein structures is very valuable, and it can be done using quantifiable information; AIs are much better at it than people. Figuring out how to massage away tightness causing pain in the neck is very valuable, and it can be done using fuzzy tactile information; trained masseurs are much better at it than massage chairs (presumably, even than massage chairs running a very fancy AI).

The question then arises of how valuable quantifiable information is. Or, more importantly, what it is valuable for. Some things, such as solving physics equations, are obviously solved by quantifiable information; other things, such as creating meaningful relationships with people, aren’t. The fuzzier you get with your goal, the less valuable quantifiable information is.

But there is an awkward problem space for which we don’t know the value of quantifiable information. This problem space includes many engineering problems as they relate to creating things in the real world. It also includes most social “problems”, from the easy “how do I get this person to like me” to the hard “how do I create consensus among a nation of 300 million people”.

To think that quantifiable information has near-infinite value in these grey areas looks something like:

One day the AGI is throwing cupcakes at a puppy in a very precisely temperature-controlled room. A few days later, a civil war breaks out in Brazil. Then 2 million people die of an unusually nasty flu, and also it’s mostly the 2 million people who are best at handling emergencies but that won’t be obvious for a while, because of course first responders are exposed more than most. At some point there’s a Buzzfeed article on how, through a series of surprising accidents, a puppy-cupcake meme triggered the civil war in Brazil (Wentworth, 2022)

To think that quantifiable information has near-zero value in these grey areas looks something like:

Drop those fucking books, nerd, grab a coffee with the potential customer and sell our product. I’m the best fucking salesperson in this entire company, and I’ll tell it to you straight, all those hundreds of books on economics, psychology and sociology, utter fucking garbage. If they’d be worth anythin’ those know-it-alls would be making millions, but they ain’t. Midwit assholes like myself are the best at this, because we just talk to people, we use the monkey brain, we encourage them to use theirs, it’s 2% about the words you use, 20% about the posture you have, and 200% about what your eyes tell them. And if you’re thinking “those numbers ain’t adding up” then you’re missing the fucking point.

III—The Value Of Thinking

The other important question, regardless of what information can or will be quantified in the future, and how valuable or accurate it will be, is how much thinking can help you derive value from that information with better processing.

If most of us are 99% efficient at building relevant models from available information, then AI isn’t scary at all, because 1% extra efficiency is pointless.

If most of us are 0.0…99% efficient, then AI becomes a very scary idea.
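To make the arithmetic behind these two cases concrete, here is a minimal sketch. It assumes, purely for illustration, that "efficiency" is a single number between 0 and 1 and that a perfect modeler sits at 1; the function name `headroom` and the example value 0.0099% are my own hypothetical choices, not anything measured.

```python
def headroom(efficiency: float) -> float:
    """Return how many times better a perfect modeler could do.

    Assumption (illustrative only): efficiency is a fraction in (0, 1],
    and a perfect modeler has efficiency 1.0.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return 1.0 / efficiency


if __name__ == "__main__":
    # Case 1: we are already 99% efficient, so little is left to gain.
    print(f"99% efficient:     {headroom(0.99):.4f}x possible gain")

    # Case 2: a hypothetical 0.0099% efficiency, standing in for
    # the unspecified "0.0...99%" figure.
    print(f"0.0099% efficient: {headroom(0.000099):.0f}x possible gain")
```

Under these assumptions the first case leaves about 1.01x of room for improvement, while the second leaves on the order of 10,000x, which is the whole gap between "pointless" and "very scary".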

I won’t object to the obvious fact that most people are inefficient at modeling most systems. The question is whether this happens because of inherent limitations to cognition, or because most people have no reason for going through the arduous process of doing so for most systems. And, maybe more importantly, whether there are a lot of high-ROI systems in need of modeling, or whether our collective capacity to model already exceeds or closely matches the ROI we can get from available information.

Here, again, I think the intuitionist-vs-conceptualizer dividing line is very obvious. An intuitionist skims an article and 3 Reddit comments and thinks they understand quantum physics about as well as anybody. A conceptualizer thinks that understanding quantum physics means reading every single article in the field and being able to properly grok every single equation, ideally being able to solve them on your own, without the author pointing you to the solution.

I’m a bit fuzzy about listing this one, because it seems to me that most AI alignment people actually fall towards the intuitionist side of the spectrum, i.e. they are the kind of wannabe polymaths that think they can skim through a field with 200 years’ worth of research and “get” most of the value without dedicating their lives to it. Meanwhile, most “normal” people fall towards the conceptualizer side of the spectrum, in that they are very afraid to “think for themselves” and go through raw data; even if they want to challenge or pursue an idea, they will find someone already challenging or pursuing that idea to do the thinking for them.

Indeed, the whole field of AI alignment as we know it today was formed by very capable intuitionists joining sparse ideas and information from many, at the time not-that-related fields, rather than by conceptualizers that dedicated their lives to studying the subject.

It might be that what I call an “intuitionist” here, i.e. someone that sparsely parses information and assumes they’ve reached a near-perfect understanding of available knowledge, can have two modes of thinking:

  1. Information mainly consists of different framings repeating the same core ideas. People are afraid to interact with it but, once you put your mind to it, it’s not that hard. It’s silly to think people dedicate their lives to single sub-sub-sub-fields of study; all they are doing is spinning in circles. You need to have the broad picture.

  2. Information is so hard to process that most people, even ones dedicating their lives to studying a subject, are too dumb to do it. I might be smart enough to figure this out better than them, but even I am probably missing out on most of the derivable value, and I can’t fathom parsing all the information available to us in a way that’s even vaguely efficient compared to the smartest human, or, worse, the smartest thing with human-like thinking abilities.

IV—The Intelligence Of Systems

We aren’t agents acting independently in the world, we’re part of a super-organism containing 8 billion brains, all working towards different goals, but with some shared objectives.

I haven’t heard many try to contemplate the intelligence of AIs against the intelligence of systems, and presumably, that’s because of an assumption that systems, compared to individuals, aren’t that smart: that systems arise more as a compromise for distributing mechanical work than for distributing thinking work.

The two extremes here might look something like this:

The first nuclear fission bomb was invented by Oppenheimer and Von Neumann, with the former doing 80% of the work. The US government provided the offices and the manual labor, and some other bright physicists helped hash out the details.

and

The first nuclear fission bomb was invented by the collective intelligence of most of Europe, Canada, the US, and millions of other people around the world. They each contributed their insight in hard-to-see ways toward the completion of the project. It might be fair to say that Oppenheimer did more than 1/5-billionth of the work, but this contribution might be equal to that of a steel-plant laborer in Kentucky who figured out a way for his factory to increase production by 1%, which eventually trickled down the supply chain. Oppenheimer being clever is about as critical to the success as that random laborer and millions of other “side characters” like him being clever.

Nobody takes either of these extremes literally, but people do vary widely on this axis. In part, this can be seen at a political level in how people want dessert to be shared.

How well the intelligence of agents adds up is a very relevant question for AI, even assuming all of the other 3 assumptions turn out to be in favor of the “scary AGI scenario that can maybe only be prevented with alignment research”.

— I’m not a fan of quantifying intelligence but for the sake of argument let’s do it —

If an AI system becomes scary once it’s a few times as “smart” as the smartest man along certain relevant axes, then we already have plenty of reasons to be scared, because it’s been beating people in solo thinking competitions left and right.

If, on the other hand, an AI gets scary when it’s dozens of times smarter than the added-up intelligence of all members of the US government, or of Google, or of the Chinese army, then that’s a bar requiring hundreds of thousands of times the amount of compute we have right now to cross.


Also, how smart you think human-based systems are is an important question in alignment, because if you do think of a government as a superintelligence, then it’s a pretty good example of how far alignment work can bring you.

V—Conclusion

I don’t mean to convince anyone to rethink their position on the above 4 axes, nor do I think this ontology is set in stone. It’s a random way to conceptualize the space of unstated axioms, and I have no reason to think it covers everything or that it’s better than any alternatives.

For me though, it was a very interesting framework because it helped me see how, with just a few slightly different and mainly arbitrarily-chosen priors on these topics, I might go from my current position to Eliezer Yudkowsky levels of doomsaying, or to Steven Pinker levels of not-caring, whereas before, both of their points of view seemed impenetrable to me.

Personally, I find the topic of debating how fast AI will advance and how “influential” and “agentic” it will be to be a red herring in the safety debate. I think the more important question is how useful AI alignment research is.

In the 40s, one could argue that “nuclear fission alignment” research might lead to shielding or controlled explosions that could help guard against or surgically direct the effects of fission bombs. That this view would turn out to be completely misguided would not have seemed obvious at the time, given how much woo-woo people, even smart ones, attributed to harnessing the power of nuclei.

In the 40s, one could also argue that better diplomatic relationships and well-stocked bunkers deep underground could, combined, guard fairly well against any “x risk” from nuclear war. That this is now taken for granted, and nuclear war is hardly viewed as an existential risk, would also have seemed non-obvious at the time. I can see even a mildly progress-minded engineer pointing out how, in 20 more years of progress, the explosions could be powerful enough to shatter Earth’s very crust; how the ease of creating bombs would place them in the hands of every nation, and then even in those of small rogue actors, making diplomacy impossible; and how diplomacy is a tool for an age of ancient weapons, and won’t apply to governments and armies wielding powers at this scale.

I often see AI alignment research as being the equivalent of “nuclear fission alignment”. And I see the equivalent of diplomacy and bunkers as work that’s not even registering as “AI safety”: constructing less legible and more human systems of governance (e.g. restorative justice), re-writing security-critical applications in Rust or functional languages, and decoupling security-critical software from the internet and from other security-critical software.

Again, I don’t claim I can convince you of this view or that I’ve said anything that could constitute proof here. But I think this view becomes “obvious” if your priors are set at certain points along those 4 axes, as does the view that only AI alignment research stands a chance of staving off doom. So figuring out where your priors are is important insofar as it would direct your decisions in terms of funding, personal precautions and overall outlook on the negative change AI might bring.

On the whole, though, I am fairly optimistic about the bunkers and diplomacy track, that is to say, treating AI as a hard engineering safety problem rather than a whole different magisterium. But had such a thing as “nuclear fission alignment” research existed and been popular in the 40s, even if it had been misguided, it may well have led to us having better fission reactors and maybe even constructing fusion reactors sooner than we (hopefully) will.