Seek Fair Expectations of Others’ Models

Epistemic Status: Especially about the future.

Response To (Eliezer Yudkowsky): There’s No Fire Alarm for Artificial General Intelligence

It’s long, but read the whole thing. Eliezer makes classic Eliezer points in classic Eliezer style. Even if you mostly know this already, there are new points and it’s worth a refresher. I fully endorse his central point, and most of his supporting arguments.

What Eliezer has rarely been, is fair. That’s part of what makes The Sequences work. I want to dive in where he says he’s going to be blunt – as if he’s ever not been – so you know it’s gonna be good:

Okay, let’s be blunt here. I don’t think most of the discourse about AGI being far away (or that it’s near) is being generated by models of future progress in machine learning. I don’t think we’re looking at wrong models; I think we’re looking at no models.

I was once at a conference where there was a panel full of famous AI luminaries, and most of the luminaries were nodding and agreeing with each other that of course AGI was very far off, except for two famous AI luminaries who stayed quiet and let others take the microphone.

I got up in Q&A and said, “Okay, you’ve all told us that progress won’t be all that fast. But let’s be more concrete and specific. I’d like to know what’s the least impressive accomplishment that you are very confident cannot be done in the next two years.”

There was a silence.

Eventually, two people on the panel ventured replies, spoken in a rather more tentative tone than they’d been using to pronounce that AGI was decades out. They named “A robot puts away the dishes from a dishwasher without breaking them”, and Winograd schemas. Specifically, “I feel quite confident that the Winograd schemas—where we recently had a result that was in the 50, 60% range—in the next two years, we will not get 80, 90% on that regardless of the techniques people use.”

A few months after that panel, there was unexpectedly a big breakthrough on Winograd schemas. The breakthrough didn’t crack 80%, so three cheers for wide credibility intervals with error margin, but I expect the predictor might be feeling slightly more nervous now with one year left to go. (I don’t think it was the breakthrough I remember reading about, but Rob turned up this paper as an example of one that could have been submitted at most 44 days after the above conference and gets up to 70%.)

But that’s not the point. The point is the silence that fell after my question, and that eventually I only got two replies, spoken in tentative tones. When I asked for concrete feats that were impossible in the next two years, I think that that’s when the luminaries on that panel switched to trying to build a mental model of future progress in machine learning, asking themselves what they could or couldn’t predict, what they knew or didn’t know. And to their credit, most of them did know their profession well enough to realize that forecasting future boundaries around a rapidly moving field is actually really hard, that nobody knows what will appear on arXiv next month, and that they needed to put wide credibility intervals with very generous upper bounds on how much progress might take place twenty-four months’ worth of arXiv papers later.

(Also, Demis Hassabis was present, so they all knew that if they named something insufficiently impossible, Demis would have DeepMind go and do it.)

The question I asked was in a completely different genre from the panel discussion, requiring a mental context switch: the assembled luminaries actually had to try to consult their rough, scarce-formed intuitive models of progress in machine learning and figure out what future experiences, if any, their model of the field definitely prohibited within a two-year time horizon. Instead of, well, emitting socially desirable verbal behavior meant to kill that darned hype about AGI and get some predictable applause from the audience.

I’ll be blunt: I don’t think the confident long-termism has been thought out at all. If your model has the extraordinary power to say what will be impossible in ten years after another one hundred and twenty months of arXiv papers, then you ought to be able to say much weaker things that are impossible in two years, and you should have those predictions queued up and ready to go rather than falling into nervous silence after being asked.

In reality, the two-year problem is hard and the ten-year problem is laughably hard. The future is hard to predict in general, our predictive grasp on a rapidly changing and advancing field of science and engineering is very weak indeed, and it doesn’t permit narrow credible intervals on what can’t be done.

I agree that most discourse around AGI is not based around models of machine learning. I agree the AI luminaries seem to not have given good reasons for their belief in AGI being far away.

I also think Eliezer’s take on their response is entirely unfair. Eliezer asks an excellent question, but the response is quite reasonable.


It is entirely unfair to expect a queued up answer.

Suppose I have a perfectly detailed mental model for future AI developments. If you ask, “What’s the chance ML can put away the dishes within two years?” I’ll need to do math, but: 3.74%.

Eliezer asks me his question.

Have I recently worked through that question? There are tons of questions. Questions about least impressive things in any reference class are rare. Let alone this particular class, confidence level and length of time.

So, no. Not queued up. The only reason to have this answer queued up is if someone is going to ask.

I did not anticipate that. I certainly did not in the context of a listening Demis Hassabis. This is quite the isolated demand for rigor. I’ll need to think.


Assume a mental model of AI development.

I am asked for the least impressive thing. To answer well, I must maximize.

What must be considered?

I need to decide what Eliezer meant by very confident, and what other people will think it means, and what they think Eliezer meant. Three different values. Very confident as actually used varies wildly. Sometimes it means 90% or less. Sometimes it means 99% or more. Eliezer later claims I should know what my model definitely prohibits but asked about very confident. There is danger of misinterpretation.

I need to decide what impressiveness means in context. Impressiveness in terms of currently perceived difficulty? In terms of the public or other researchers going ‘oh, cool’? Impressive for a child? Some mix? Presumably Eliezer means perceived difficulty but there is danger of willful misinterpretation.

I need to query my model slash brainstorm for unimpressive things I am very confident cannot be done in two years. I adjust for the Hassabis effect that tasks I name will be accomplished faster.

I find the least impressive thing.

Finally I choose whether to answer.

This process isn’t fast even with a full model of future AI progress.
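
The steps above amount to a constrained search, which can be sketched in code. This is a toy illustration under loud assumptions, not anything from Eliezer's essay or the panel: the `Task` fields, the impressiveness scores, the `naming_boost` stand-in for the Hassabis effect, and every probability except the 3.74% from the example above are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Task:
    name: str
    impressiveness: float       # hypothetical score of perceived difficulty
    p_done_in_two_years: float  # what the (assumed) mental model outputs


def least_impressive_safe_task(
    tasks: Iterable[Task],
    confidence: float,
    naming_boost: float = 0.0,
) -> Optional[Task]:
    """Pick the least impressive task I am `confidence` sure will NOT be done.

    `naming_boost` is a hypothetical flat bump to each task's probability,
    standing in for the effect of naming it in front of someone who might
    then go and do it.
    """
    threshold = 1.0 - confidence
    candidates = [
        t for t in tasks
        if min(1.0, t.p_done_in_two_years + naming_boost) <= threshold
    ]
    # Among tasks that clear the confidence bar, take the least impressive.
    return min(candidates, key=lambda t: t.impressiveness, default=None)


# Illustrative inputs only. Apart from 3.74%, these numbers are invented.
TASKS = [
    Task("put away dishes from a dishwasher", 3.0, 0.0374),
    Task("80% on Winograd schemas", 4.0, 0.08),
    Task("write a publishable novel", 9.0, 0.005),
]

if __name__ == "__main__":
    for confidence in (0.90, 0.99, 0.999):
        answer = least_impressive_safe_task(TASKS, confidence)
        print(confidence, answer.name if answer else "no answer")
```

Even in the toy version, the answer changes with what "very confident" means and with the size of the boost, and at a strict enough threshold there is no answer at all. The search itself is cheap; filling in the inputs honestly is the slow part.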


I have my answer: “A robot puts away the dishes from a dishwasher without breaking them.”

Should I say it?

My upside is limited.

It won’t be the least impressive thing not done within two years. Plenty of less impressive things might be done within two years. Some will and some won’t. My answer will seem lousy. The Hassabis effect compounds this, since some things that did not happen in two years might have if I’d named them.

Did Eliezer’s essay accelerate work done on unloading a dishwasher? On the Winograd schemas?

If I say something that doesn’t happen but comes close, such as getting 80% on the Winograd schemas if we get to 78%, I look wrong and lucky. If it doesn’t come close, I look foolish.

Also, humans are terrible at calibration.

A true 98% confident answer looks hopelessly conservative to most people, and my off-the-cuff 98% confident answer likely isn’t 98% reliable.

Whatever I name might happen. How embarrassing! People will laugh, distrust and panic. My reputation suffers.

The answer Eliezer gets might be important. If I don’t want laughter, distrust or panic, it might be bad if even one answer given happens within two years.

In exchange, Eliezer sees a greater willingness to answer, and I transfer intuition. Does that seem worth it?


Eliezer asked his question. What happened?

The room fell silent. Multiple luminaries stopped to think. That seems excellent. Positive reinforcement!

Two gave tentative answers. Those answers seemed honest, reasonable and interesting. The question was hard. They were on the spot. Tentativeness was the opposite of a missing mood. It properly expresses low confidence. Positive reinforcement!

Others chose not to answer. Under the circumstances, I sympathize.

These actions do not seem like strong evidence of a lack of models, or of bad faith. This seems like what you hope to see.


I endorse Eliezer’s central points. There will be no fire alarm. We won’t have a clear sign AGI is coming soon until AGI arrives. We need to act now. It’s an emergency now. Public discussion is mostly not based on models of AI progress or concrete short term predictions.

Most discussions of the future are not built around concrete models of the future. It is unsurprising that AI discussions follow this pattern.

One can still challenge that one needs short-term predictions about AI progress to make long-term predictions. It is not obvious long-term prediction is harder, or that it depends upon short-term predictions. AGI might come purely from incremental machine learning progress. It might require major insights. It might not come from machine learning.

There are then many ways to conclude that AGI is far away, where far away means decades out. Not that decades out is all that far away. Eliezer conflating the two should freak you out. AGI reliably forty years away would be quite the fire alarm.

You could think there isn’t much machine learning progress, or that progress is nearing its limits. You could think that progress will slow dramatically, perhaps because problems will get exponentially harder.

You might think problems will get exponentially harder and resources spent will get exponentially larger too, so estimates of future progress move mostly insofar as they move the expected growth rate of future invested resources.

You could think incentive gradients from building more profitable or higher scoring AIs won’t lead to AGIs, even if other machine learning paths might work. Dario Amodei says OpenAI is “following the gradient.”

You could believe our civilization incapable of effort that does not follow incentive gradients.

You might think that our civilization will collapse or cease to do such research before it gets to AGI.

You could think building an AGI would require doing a thing, and our civilization is no longer capable of doing things.

You could think that there is a lot of machine learning progress to be made between here and AGI, such that even upper bounds on current progress leave decades to go.

You could think that even a lot of the right machine learning progress won’t lead to AGI at all. Perhaps it is an entirely different type of thought. Perhaps it does not qualify as thought at all. We find more and more practical tasks that AIs can do with machine learning, but one can think both ‘there are a lot of tasks machine learning will learn to do’ and ‘machine learning in anything like its current form cannot, even fully developed, do all tasks needed for AGI.’

And so on.

Most of those don’t predict much about the next two years, other than a non-binding upper bound. With these models, when machine learning does a new thing, that teaches us more about that problem’s difficulty than about how fast machine learning is advancing.

Under these models, Go and Heads Up No-Limit Hold ’Em Poker are easier problems than we expected. We should update in favor of well-defined adversarial problems with compact state expressions but large branch trees being easier to solve. That doesn’t mean we shouldn’t update our progress estimates at all, but perhaps we shouldn’t update much.
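
A toy Bayesian calculation shows the shape of that update. Everything here is an assumption for illustration: two binary unknowns (the problem is "easy" or "hard", progress is "fast" or "slow"), independent priors, and invented likelihoods in which being solved depends mostly on the problem's difficulty.

```python
from itertools import product

# Hypothetical priors: the problem is probably hard, progress probably slow.
P_EASY = 0.2
P_FAST = 0.3

# Hypothetical chance the problem gets solved under each combination.
# Difficulty dominates; the speed of progress matters only a little.
P_SOLVED = {
    ("easy", "fast"): 0.90,
    ("easy", "slow"): 0.80,
    ("hard", "fast"): 0.10,
    ("hard", "slow"): 0.05,
}


def posteriors_after_solved(p_easy=P_EASY, p_fast=P_FAST, likelihood=P_SOLVED):
    """Return (P(easy | solved), P(fast | solved)) by Bayes' rule."""
    prior = {
        "easy": p_easy, "hard": 1.0 - p_easy,
        "fast": p_fast, "slow": 1.0 - p_fast,
    }
    # Unnormalized joint weight of each (difficulty, speed) pair given "solved".
    joint = {
        (d, s): prior[d] * prior[s] * likelihood[(d, s)]
        for d, s in product(("easy", "hard"), ("fast", "slow"))
    }
    total = sum(joint.values())
    post_easy = (joint[("easy", "fast")] + joint[("easy", "slow")]) / total
    post_fast = (joint[("easy", "fast")] + joint[("hard", "fast")]) / total
    return post_easy, post_fast


if __name__ == "__main__":
    post_easy, post_fast = posteriors_after_solved()
    print(f"P(easy): {P_EASY:.2f} -> {post_easy:.2f}")
    print(f"P(fast): {P_FAST:.2f} -> {post_fast:.2f}")
```

With these made-up numbers, belief that the problem was easy goes from 0.20 to roughly 0.76, while belief that progress is fast goes from 0.30 to roughly 0.36. Different likelihoods would give a different split, which is the point: how much a result should move timelines depends on the model it is fed through.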

This goes with everything AI learns to do ceasing to be AI.

Thus, one can reasonably have a model where impressiveness of short-term advances does not much move our AGI timelines.

I saw an excellent double crux on AI timelines, good enough to update me dramatically on the value of double crux and greatly enrich my model of AI timelines. Two smart, highly invested people had given the problem a lot of thought, and were doing their best to build models and assign probabilities and seek truth. Many questions came up. Short-term concrete predictions did not come up. At all.


That does not mean any of that is what is happening.

I think mostly what Eliezer thinks is happening, is happening. People’s incentive gradients on short term questions say not to answer. People’s incentive gradients on long term questions say to have AGI be decades out. That’s mostly what they answer. Models might exist, but why let them change your answer? If you answer AGI is near and it doesn’t happen you look foolish. If you answer AGI is near and it happens, who cares what you said?

When asked a question, good thinkers generate as much model as they need. Less good thinkers, or the otherwise motivated, instead model what it is in their interest to say.

Most people who say productive AI safety work cannot currently be done have not spent two hours thinking about what could currently be done. Again, that’s true of all problems. Most people never spend two hours thinking about what could be done about anything. Ever. See Eliezer’s entire essential sequence (sequence Y).

That is how someone got so frustrated with getting people to actually think about AI safety that he decided it would be easier to get them to actually think in general.

To do that, it’s important to be totally unfair to not thinking. Following incentive gradients and social cues and going around with inconsistent models and not trying things for even five minutes before declaring them impossible won’t cut it and that is totally not OK.

He emphasizes nature not grading on a curve, and fails everyone. Hard. The Way isn’t just A Thing, it’s a necessary thing.

Then we realize that no, it’s way worse than that. People are not only not following The Way. No one does the thing they are supposedly doing. The world is mad on a different level than inaccurate models without proper Bayesian updating and not stopping to think or try for five minutes once in their life let alone two hours. There are no models anywhere.

Fairness can’t always be a thing. Trying to make it a thing where it isn’t a thing tends to go quite badly.

Sometimes, though, you still need fairness. Without it groups can’t get along. Without it you can’t cooperate. Without it we treat thinking about a new and interesting question as evidence of a lack of thinking.

Holding everyone to heroic responsibility wins you few friends, influences few people and drives you insane.


Where does that leave us? Besides the original takeaway that There Is No Fire Alarm For Artificial General Intelligence and we need to work on the problem now? And your periodic reminder that people are crazy and the world is mad?

Microfoundations are great, but some useful models don’t have them. It would be great if everyone had probabilistic time distributions for every possible event, but this is totally not reasonable, and totally not required to have a valid opinion. Some approaches answer some questions but not others.

We must hold onto our high standards for ourselves and those who opt into them. For others, we must think about circumstance and incentive, and stop at ‘tough, but fair.’

Predictions are valuable. They are hard to do well and socially expensive to do honestly. A culture of stating your probabilities upon request is good. Betting on your beliefs is better. Part of that is understanding not everyone has thought through everything. And understanding adverse selection and bad social odds. And realizing sometimes best guesses would get taken too seriously, or commit people to things. Sometimes people need to speak tentatively. Or say “I don’t know.” Or say nothing.

Allies won’t always ponder what you’re pondering. They aren’t perfectly rigorous thinkers. They don’t think hard for two hours about your problem. They don’t often make extraordinary efforts.

Most of what they want will involve social reality and incentive gradients and muddled thinking. They’re doing it for the wrong reasons. They will often be unreliable and untrustworthy. They’re defecting constantly.

You go to war with the army you have.

We can’t afford to hold everyone to impossible standards. Even holding ourselves to impossible standards requires psychologically safe ways to do that.

When someone genuinely thinks, and offers real answers, cheer that. Especially answers against interest. They do the best they can. From another perspective they could obviously do so much more, but one thing at a time.

Giving them the right social incentive gradient, even in a small way, matters a lot.

Someone is doing their best to break through the incentive gradients of social reality.

We can work with that.