Seek Fair Expectations of Others’ Models

Epistemic Status: Especially about the future.

Response To (Eliezer Yudkowsky): There’s No Fire Alarm for Artificial General Intelligence

It’s long, but read the whole thing. Eliezer makes classic Eliezer points in classic Eliezer style. Even if you mostly know this already, there are new points and it’s worth a refresher. I fully endorse his central point, and most of his supporting arguments.

What Eliezer has rarely been, is fair. That’s part of what makes The Sequences work. I want to dive in where he says he’s going to be blunt – as if he’s ever not been – so you know it’s gonna be good:

Okay, let’s be blunt here. I don’t think most of the discourse about AGI being far away (or that it’s near) is being generated by models of future progress in machine learning. I don’t think we’re looking at wrong models; I think we’re looking at no models.

I was once at a conference where there was a panel full of famous AI luminaries, and most of the luminaries were nodding and agreeing with each other that of course AGI was very far off, except for two famous AI luminaries who stayed quiet and let others take the microphone.

I got up in Q&A and said, “Okay, you’ve all told us that progress won’t be all that fast. But let’s be more concrete and specific. I’d like to know what’s the least impressive accomplishment that you are very confident cannot be done in the next two years.”

There was a silence.

Eventually, two people on the panel ventured replies, spoken in a rather more tentative tone than they’d been using to pronounce that AGI was decades out. They named “A robot puts away the dishes from a dishwasher without breaking them”, and Winograd schemas. Specifically, “I feel quite confident that the Winograd schemas—where we recently had a result that was in the 50, 60% range—in the next two years, we will not get 80, 90% on that regardless of the techniques people use.”

A few months after that panel, there was unexpectedly a big breakthrough on Winograd schemas. The breakthrough didn’t crack 80%, so three cheers for wide credibility intervals with error margin, but I expect the predictor might be feeling slightly more nervous now with one year left to go. (I don’t think it was the breakthrough I remember reading about, but Rob turned up this paper as an example of one that could have been submitted at most 44 days after the above conference and gets up to 70%.)

But that’s not the point. The point is the silence that fell after my question, and that eventually I only got two replies, spoken in tentative tones. When I asked for concrete feats that were impossible in the next two years, I think that that’s when the luminaries on that panel switched to trying to build a mental model of future progress in machine learning, asking themselves what they could or couldn’t predict, what they knew or didn’t know. And to their credit, most of them did know their profession well enough to realize that forecasting future boundaries around a rapidly moving field is actually really hard, that nobody knows what will appear on arXiv next month, and that they needed to put wide credibility intervals with very generous upper bounds on how much progress might take place twenty-four months’ worth of arXiv papers later.

(Also, Demis Hassabis was present, so they all knew that if they named something insufficiently impossible, Demis would have DeepMind go and do it.)

The question I asked was in a completely different genre from the panel discussion, requiring a mental context switch: the assembled luminaries actually had to try to consult their rough, scarce-formed intuitive models of progress in machine learning and figure out what future experiences, if any, their model of the field definitely prohibited within a two-year time horizon. Instead of, well, emitting socially desirable verbal behavior meant to kill that darned hype about AGI and get some predictable applause from the audience.

I’ll be blunt: I don’t think the confident long-termism has been thought out at all. If your model has the extraordinary power to say what will be impossible in ten years after another one hundred and twenty months of arXiv papers, then you ought to be able to say much weaker things that are impossible in two years, and you should have those predictions queued up and ready to go rather than falling into nervous silence after being asked.

In reality, the two-year problem is hard and the ten-year problem is laughably hard. The future is hard to predict in general, our predictive grasp on a rapidly changing and advancing field of science and engineering is very weak indeed, and it doesn’t permit narrow credible intervals on what can’t be done.

I agree that most discourse around AGI is not based around models of machine learning. I agree the AI luminaries seem to not have given good reasons for their belief in AGI being far away.

I also think Eliezer’s take on their response is entirely unfair. Eliezer asks an excellent question, but the response is quite reasonable.


It is entirely unfair to expect a queued up answer.

Suppose I have a perfectly detailed mental model for future AI developments. If you ask, “What’s the chance ML can put away the dishes within two years?” I’ll need to do math, but: 3.74%.

Eliezer asks me his question.

Have I recently worked through that question? There are tons of questions. Questions about least impressive things in any reference class are rare. Let alone this particular class, confidence level and length of time.

So, no. Not queued up. The only reason to have this answer queued up is if someone is going to ask.

I did not anticipate that. I certainly did not in the context of a listening Demis Hassabis. This is quite the isolated demand for rigor. I’ll need to think.


Assume a mental model of AI development.

I am asked for the least impressive thing. To answer well, I must maximize.

What must be considered?

I need to decide what Eliezer meant by very confident, and what other people will think it means, and what they think Eliezer meant. Three different values. Very confident as actually used varies wildly. Sometimes it means 90% or less. Sometimes it means 99% or more. Eliezer later claims I should know what my model definitely prohibits but asked about very confident. There is danger of misinterpretation.

I need to decide what impressiveness means in context. Impressiveness in terms of currently perceived difficulty? In terms of the public or other researchers going ‘oh, cool’? Impressive for a child? Some mix? Presumably Eliezer means perceived difficulty but there is danger of willful misinterpretation.

I need to query my model slash brainstorm for unimpressive things I am very confident cannot be done in two years. I adjust for the Hassabis effect that tasks I name will be accomplished faster.

I find the least impressive thing.

Finally I choose whether to answer.

This process isn’t fast even with a full model of future AI progress.


I have my answer: “A robot puts away the dishes from a dishwasher without breaking them.”

Should I say it?

My upside is limited.

It won’t be the least impressive thing not done within two years. Plenty of less impressive things might be done within two years. Some will and some won’t. My answer will seem lousy. The Hassabis effect compounds this, since some things that did not happen in two years might have if I’d named them.

Did Eliezer’s essay accelerate work done on unloading a dishwasher? On the Winograd schemas?

If I say something that doesn’t happen but comes close, such as getting 80% on the Winograd schemas if we get to 78%, I look wrong and lucky. If it doesn’t come close, I look foolish.

Also, humans are terrible at calibration.

A true 98% confident answer looks hopelessly conservative to most people, and my off-the-cuff 98% confident answer likely isn’t 98% reliable.
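
The gap can be made concrete with a little arithmetic. As a toy sketch (the 98% and 85% figures here are my own illustrative assumptions, not numbers from the post), compare how likely “at least one named feat happens anyway” becomes under stated versus actual reliability:

```python
# Toy calibration arithmetic. The 98% and 85% reliability figures are
# illustrative assumptions, not claims from the essay.

def p_at_least_one_miss(reliability: float, n_claims: int) -> float:
    """Chance that at least one of n independent claims fails."""
    return 1 - reliability ** n_claims

# If five panelists each name a feat at a true 98% confidence,
# an embarrassing miss is unlikely:
print(p_at_least_one_miss(0.98, 5))  # ~0.096

# If those off-the-cuff "98%" answers are really only ~85% reliable,
# a miss becomes more likely than not:
print(p_at_least_one_miss(0.85, 5))  # ~0.556
```

So even modest overconfidence, multiplied across a handful of public answers, turns “whatever I name might happen” from a 2% tail risk into a coin flip.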

Whatever I name might happen. How embarrassing! People will laugh, distrust and panic. My reputation suffers.

The answer Eliezer gets might be important. If I don’t want laughter, distrust or panic, it might be bad if even one answer given happens within two years.

In exchange, Eliezer sees a greater willingness to answer, and I transfer intuition. Does that seem worth it?


Eliezer asked his question. What happened?

The room fell silent. Multiple luminaries stopped to think. That seems excellent. Positive reinforcement!

Two gave tentative answers. Those answers seemed honest, reasonable and interesting. The question was hard. They were on the spot. Tentativeness was the opposite of a missing mood. It properly expresses low confidence. Positive reinforcement!

Others chose not to answer. Under the circumstances, I sympathize.

These actions do not seem like strong evidence of a lack of models, or of bad faith. This seems like what you hope to see.


I endorse Eliezer’s central points. There will be no fire alarm. We won’t have a clear sign AGI is coming soon until AGI arrives. We need to act now. It’s an emergency now. Public discussion is mostly not based on models of AI progress or concrete short term predictions.

Most discussions of the future are not built around concrete models of the future. It is unsurprising that AI discussions follow this pattern.

One can still challenge that one needs short-term predictions about AI progress to make long-term predictions. It is not obvious long-term prediction is harder, or that it depends upon short-term predictions. AGI might come purely from incremental machine learning progress. It might require major insights. It might not come from machine learning.

There are many ways to then conclude that AGI is far away where far away means decades out. Not that decades out is all that far away. Eliezer conflating the two should freak you out. AGI reliably forty years away would be quite the fire alarm.

You could think there isn’t much machine learning progress, or that progress is nearing its limits. You could think that progress will slow dramatically, perhaps because problems will get exponentially harder.

You might think problems will get exponentially harder and resources spent will get exponentially larger too, so estimates of future progress move mostly insofar as they move the expected growth rate of future invested resources.

You could think incentive gradients from building more profitable or higher scoring AIs won’t lead to AGIs, even if other machine learning paths might work. Dario Amodei says OpenAI is “following the gradient.”

You could believe our civilization incapable of effort that does not follow incentive gradients.

You might think that our civilization will collapse or cease to do such research before it gets to AGI.

You could think building an AGI would require doing a thing, and our civilization is no longer capable of doing things.

You could think that there is a lot of machine learning progress to be made between here and AGI, such that even upper bounds on current progress leave decades to go.

You could think that even a lot of the right machine learning progress won’t lead to AGI at all. Perhaps it is an entirely different type of thought. Perhaps it does not qualify as thought at all. We find more and more practical tasks that AIs can do with machine learning, but one can think both ‘there are a lot of tasks machine learning will learn to do’ and ‘machine learning in anything like its current form cannot, even fully developed, do all tasks needed for AGI.’

And so on.

Most of those don’t predict much about the next two years, other than a non-binding upper bound. With these models, when machine learning does a new thing, that teaches us more about that problem’s difficulty than about how fast machine learning is advancing.

Under these models, Go and Heads Up No-Limit Hold ’Em Poker are easier problems than we expected. We should update in favor of well-defined adversarial problems with compact state expressions but large branch trees being easier to solve. That doesn’t mean we shouldn’t update our progress estimates at all, but perhaps we shouldn’t update much.
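
That update logic can be sketched as a toy Bayes calculation (the priors and likelihoods here are my own illustrative assumptions, not estimates from the post): when an observation is better predicted by “this problem class is easier than we thought”, most of the posterior shift lands there rather than on “the field is racing ahead”:

```python
# Toy Bayesian update; all numbers are illustrative assumptions,
# not estimates from the post.

prior = {"field_is_fast": 0.5, "problem_class_easy": 0.5}

# How strongly each hypothesis predicted "Go falls to machine learning".
# The "easy problem class" story predicts it somewhat better.
likelihood = {"field_is_fast": 0.6, "problem_class_easy": 0.9}

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}

print(posterior)  # roughly {'field_is_fast': 0.4, 'problem_class_easy': 0.6}
```

Under these made-up numbers, most of the belief shift goes to the problem class being easy, which is exactly why such a model moves its AGI timelines only a little when a benchmark falls.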

This goes with everything AI learns to do ceasing to be AI.

Thus, one can reasonably have a model where impressiveness of short-term advances does not much move our AGI timelines.

I saw an excellent double crux on AI timelines, good enough to update me dramatically on the value of double crux and greatly enrich my model of AI timelines. Two smart, highly invested people had given the problem a lot of thought, and were doing their best to build models and assign probabilities and seek truth. Many questions came up. Short-term concrete predictions did not come up. At all.


That does not mean any of that is what is happening.

I think mostly what Eliezer thinks is happening, is happening. People’s incentive gradients on short term questions say not to answer. People’s incentive gradients on long term questions say to have AGI be decades out. That’s mostly what they answer. Models might exist, but why let them change your answer? If you answer AGI is near and it doesn’t happen you look foolish. If you answer AGI is near and it happens, who cares what you said?

When asked a question, good thinkers generate as much model as they need. Less good thinkers, or the otherwise motivated, instead model what it is in their interest to say.

Most people who say productive AI safety work cannot currently be done have not spent two hours thinking about what could currently be done. Again, that’s true of all problems. Most people never spend two hours thinking about what could be done about anything. Ever. See Eliezer’s entire essential sequence (sequence Y).

That is how someone got so frustrated with getting people to actually think about AI safety that he decided it would be easier to get them to actually think in general.

To do that, it’s important to be totally unfair to not thinking. Following incentive gradients and social cues and going around with inconsistent models and not trying things for even five minutes before declaring them impossible won’t cut it and that is totally not OK.

He emphasizes nature not grading on a curve, and fails everyone. Hard. The Way isn’t just A Thing, it’s a necessary thing.

Then we realize that no, it’s way worse than that. People are not only not following The Way. No one does the thing they are supposedly doing. The world is mad on a different level than inaccurate models without proper Bayesian updating and not stopping to think or try for five minutes once in their life let alone two hours. There are no models anywhere.

Fairness can’t always be a thing. Trying to make it a thing where it isn’t a thing tends to go quite badly.

Sometimes, though, you still need fairness. Without it groups can’t get along. Without it you can’t cooperate. Without it we treat thinking about a new and interesting question as evidence of a lack of thinking.

Holding everyone to heroic responsibility wins you few friends, influences few people and drives you insane.


Where does that leave us? Besides the original takeaway that There Is No Fire Alarm For Artificial General Intelligence and we need to work on the problem now? And your periodic reminder that people are crazy and the world is mad?

Microfoundations are great, but some useful models don’t have them. It would be great if everyone had probabilistic time distributions for every possible event, but this is totally not reasonable, and totally not required to have a valid opinion. Some approaches answer some questions but not others.

We must hold onto our high standards for ourselves and those who opt into them. For others, we must think about circumstance and incentive, and stop at ‘tough, but fair.’

Predictions are valuable. They are hard to do well and socially expensive to do honestly. A culture of stating your probabilities upon request is good. Betting on your beliefs is better. Part of that is understanding not everyone has thought through everything. And understanding adverse selection and bad social odds. And realizing sometimes best guesses would get taken too seriously, or commit people to things. Sometimes people need to speak tentatively. Or say “I don’t know.” Or say nothing.

Allies won’t always ponder what you’re pondering. They aren’t perfectly rigorous thinkers. They don’t think hard for two hours about your problem. They don’t often make extraordinary efforts.

Most of what they want will involve social reality and incentive gradients and muddled thinking. They’re doing it for the wrong reasons. They will often be unreliable and untrustworthy. They’re defecting constantly.

You go to war with the army you have.

We can’t afford to hold everyone to impossible standards. Even holding ourselves to impossible standards requires psychologically safe ways to do that.

When someone genuinely thinks, and offers real answers, cheer that. Especially answers against interest. They do the best they can. From another perspective they could obviously do so much more, but one thing at a time.

Giving them the right social incentive gradient, even in a small way, matters a lot.

Someone is doing their best to break through the incentive gradients of social reality.

We can work with that.