Karma: 1,436

Book Review: Consciousness Explained

6 Mar 2018 3:32 UTC
101 points

Is this what FAI outreach success looks like?

9 Mar 2018 13:12 UTC
53 points

How to get value learning and reference wrong

26 Feb 2019 20:22 UTC
40 points

Philosophy as low-energy approximation

5 Feb 2019 19:34 UTC
40 points
• Because the noise usually grows as the signal does. Consider Moore's law for transistors per chip. Back when that number was about 10^4, the standard deviation was also small, say 10^3. Now that density is 10^8, no two chips are going to be within a thousand transistors of each other; the standard deviation is much bigger (~10^7).

This means that if you're trying to fit the curve, being off by 10^5 is a small mistake when predicting current transistor counts, but a huge mistake when predicting past ones. It's not rare or implausible now to find a chip with 10^5 more transistors, but back in the '70s that difference would be a huge error, impossible under an accurate model of reality.

A basic fitting function, like least squares, doesn't take this into account: it trades off transistors now vs. transistors in the past as if the mistakes were of exactly equal importance. To do better you have to use something like a chi-squared method, where you explicitly weight the points differently based on their variance. Or fit on a log scale using the simple method, which effectively assumes that the noise is proportional to the signal.
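To make the log-scale trick concrete, here's a minimal sketch (the data here is made up: exponential growth with multiplicative noise, roughly the Moore's-law shape described above). Fitting a straight line to the logs weights a 10% error in the past and a 10% error today equally, which is exactly what plain least squares on the raw counts fails to do.

```python
import math
import random

# Hypothetical Moore's-law-style data: the count doubles every 2 years,
# with noise proportional to the signal (multiplicative lognormal noise).
random.seed(0)
years = list(range(0, 40, 2))
counts = [1e4 * 2 ** (y / 2) * random.lognormvariate(0, 0.2) for y in years]

def linfit(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Fit on a log scale: errors are now measured proportionally to the
# signal, so the huge recent points can't dominate the early ones.
a, b = linfit(years, [math.log10(c) for c in counts])
doubling_time = math.log10(2) / b  # years per doubling implied by the fit
```

Running `linfit` on `counts` directly would instead let the handful of recent, huge-variance points dominate; a chi-squared fit would achieve the same effect as the log transform by dividing each squared residual by that point's variance.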

• As someone basically thinking alone (cue George Thorogood), I definitely would value more comments/discussion. But if someone has access to research retreats where they're talking face to face as much as they want, I'm not surprised that they don't post much.

Talking is a lot easier than writing, and more immediately rewarding. It can be an activity among friends. A face-to-face discussion is higher-bandwidth than one over the internet. You can assume a lot more about your audience, which saves a ton of effort. When talking, you are more allowed to bullshit and guess and handwave and collaboratively think with the other person, and still be interesting, whereas when writing your audience usually expects you to be confident in what you've written. Writing is hard, reading is hard, and understanding what people have written is harder than understanding what people have said; if you ask for clarification, that might get misunderstood in turn. This all applies to comments almost as much as to posts, particularly on technical subjects.

The two advantages writing has for me are that I can communicate in writing with people who I couldn't talk to, and that when you write something out you get a good long chance to make sure it's not stupid. When talking it's very easy to be convincing, including to yourself, even when you're confused. That's a lot harder in writing.

To encourage more discussion in writing, one could try to change the format to reduce these barriers as much as possible: fostering one-to-one or small-group threads rather than one-to-many, fostering/enabling knowledge about other posters, and creating a context that allows for more guesswork and collaborative thinking. Maybe one underutilized tool on current LW is the question thread. Question threads are great excuses to let people bullshit on a topic and then engage them in small-group threads.

• With regard to your posts on AI safety, I have two opinions.

1: Maybe choose titles that allow the reader to figure out what they're getting into. I can't read everything, so I'd much rather read something whose title lets me infer that it's about, e.g., AI timelines. In general, I would like the point to be slightly more obvious throughout.

2: Don't stop posting, but slow down. Eliezer cheated in four ways: he wrote for more time per day than you can afford, he was often rehashing arguments he'd already put into text elsewhere, he rarely posted original technical work, and if he hadn't done a good job you wouldn't know about him (while you have no such anthropic selection). Your AI posts often raise questions but only scrape the surface of an answer; I would rather read fewer but deeper posts.

• I've definitely noticed, in the very slow process of improving my social skills, that people (in general, and me in particular) don't give nearly enough compliments or praise relative to the optimum. Past me just didn't notice when there was a good place for a compliment; the skill I improved was fundamentally a noticing skill. I also benefited a lot from understanding the psychological idea of validation: people want validation, not just praise for any old thing.

Re: working on a specific thing. I have more or less accepted that the amount of praise one gets will not fit one's needs. There's a fame effect that causes a fat tail, and no particular reward for merely trying, which I think is necessary given the number of non-experts and how easy it is to produce bad work without noticing it. I definitely have to work on intrinsic motivation.

A useful level distinction

24 Feb 2018 6:39 UTC
26 points
• Why do people react to fire alarms? It's not just that they're public; smoke is public too. One big factor is that reacting to fire alarms has been drilled into us since childhood, a policy probably formulated after a few incidents of children not responding to fire alarms.

What this suggests is that even if signals are unclear, maybe what we really need is training. If some semi-arbitrary advance is chosen, people may or may not change their behavior when that advance occurs, depending on whether they have been successfully trained to change it.

On the other hand, we should already be working on AI safety, and so attempting to set up a fire alarm may be pointless: we need people to already be evacuating and calling the firefighters.

Philosophy of Numbers (part 1)

2 Dec 2017 18:20 UTC
25 points
• Honestly? I feel like this same set of problems gets re-solved a lot. I'm worried that it's a sign of ill health for the field.

I think we understand certain technical aspects of corrigibility (indifference and CIRL), but have hit a brick wall in certain other aspects (things that require sophisticated "common sense" about AIs or humans to implement, and philosophical problems about how to get an AI to solve philosophical problems). I think this is part of what leads to re-treading old ground when new people (or a person wanting to apply a new tool) try to work on AI safety.

On the other hand, I'm not sure we've exhausted Concrete Problems yet. Yes, the answer is often "just have sophisticated common sense," but I think the value is in exploring the problems and generating elegant solutions so that we can deepen our understanding of value functions and agent behavior (like TurnTrout's work on low-impact agents). In fact, Tom is a co-author on a very good toy-problems paper, many of which require similar sorts of one-off solutions that still might advance our technical understanding of agents.

• It's not called econ 101 because it's the only material you need.

Engaging with previous work on the subject is just like any other way of being less wrong: if you're already convinced you're right, it feels like a tedious box to be checked with no chance of influencing your conclusions. Yes, there is some signalling value to me, but the signalling has value precisely because I assign high probability that there is relevant, important prior work here. (EDIT: where "here" largely means the monetary-policy bits, though I would still be positively signalled by some reference-dropping on the cultural stuff.)

Some Comments on Stuart Armstrong's "Research Agenda v0.9"

8 Jul 2019 19:03 UTC
22 points
• The problem, by which I mean the reason I would rather the scene had less of this mythic stuff, is that I subscribe to absolutely the meanest, smallest type of cynicism: things people love are dangerous.

Take political arguments. People love to have political arguments. If one considers the community in the abstract, then political arguments are great for the community; look at how much more discussion there is over on SSC these days!

I am, of course, assuming in this example that political arguments in internet comments are of little use. But I think there is a straightforward cause: political arguments can be of little use because people love them. If people didn't love them, they would only have them when necessary.

People love myths. Or at least most of them, some of the time. That's why the myths you hear about aren't selected for usefulness.

• Naturally, that one parameter has to be very precise in order to work: if you have 1000 bits of data, the parameter will take at least 1000 bits to write down.

Pretty cool scheme for fitting general scatterplots. You could do the same in higher dimensions, but intuitively it seems like you are actually anti-compressing the data. Their point about not measuring complexity by parameter count is well taken.
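For intuition, here's a toy sketch of why the single parameter must be as long as the data. This is my own illustrative construction, not the scheme from the paper being discussed: interleave the digits of several data values into one real number, and you can decode them all back, but the "parameter" carries exactly as many digits as the data did.

```python
# Toy digit-interleaving "one-parameter model" (illustrative only).
# Each data value is a float in [0, 1) kept to `digits` decimal places.

def pack(values, digits=4):
    """Encode a list of values into a single real parameter."""
    strs = [f"{v:.{digits}f}"[2:] for v in values]  # drop the "0."
    interleaved = "".join("".join(ds) for ds in zip(*strs))
    return float("0." + interleaved)

def unpack(theta, n, digits=4):
    """Recover n values from the parameter by de-interleaving digits."""
    s = f"{theta:.{n * digits}f}"[2:]
    return [float("0." + s[i::n]) for i in range(n)]

data = [0.1234, 0.5678, 0.9012]
theta = pack(data)  # a single "parameter" holding all 12 digits
recovered = unpack(theta, len(data))
```

A parameter count of one hides a description length equal to the data's, which is the point about not measuring complexity by number of parameters.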

Humans aren't agents—what then for value learning?

15 Mar 2019 22:01 UTC
20 points
• rising ways.

Here, you dropped this from the last bullet point at the end :)

A very clear walkthrough of full non-indexical conditioning. Thanks! I think there's still a big glaring warning sign that this could be wrong, which is the mismatch with frequency (and, by extension, betting). Probability is logically prior to frequency estimation, but that doesn't mean I think they're decoupled. If your "probability" has zero application because your decision theory uses "likeliness weights" calculated an entirely different way, I think something has gone very wrong.

I think if you've gone wrong somewhere, it's in trying to outlaw statements of the form "it is Monday today."

Suppose on Monday the experimenters will give her a cookie after she answers the question, and on Tuesday the experimenters will give her ice cream. Do you really want to outlaw "in 5 minutes I will get a cookie" as a valid thing to have beliefs about?

In fact, I think you got it precisely backwards: probability distributions come from the assigner's state of information, and therefore they must be built off of what the assigner actually knows. I don't have access to some True Monday Detector; I only have access to my internal sense of time. "Now" is fundamental, and "Monday" is the higher-level construct. Similarly, I don't have an absolute position sense; my probability distribution over things must always use relative coordinates (even if it's "relative to the zero reading on this gauge here") because there are no absolute coordinates available to me. I don't have access to my mystical True Name, so I don't know which of several duplicates is the Real Me unless I can describe it in relative terms like "the one who came first." Therefore "me" is fundamental, and "the original Charlie" is the higher-level construct.

Anyhow, once you allow temporal information you go back to trying to figure out what your model should say when you demand a MEE constraint on Monday vs. Tuesday.

• Using only speed to evaluate models lands you with a lookup table that stores the results you want. So you have to trade off speed and length: the speediest table of math results has length O(N) and speed O(N) (maybe? Not at all sure). The shortest general program has length O(log(N)) and uncomputably fast-growing time. So if we think of a separable cost function F(length)+G(time), then as long as F doesn't grow super-fast nor G super-slow, the lookup table will eventually have a better score than brute-force search.

Ideally you want to find some happy medium; this is reminding me of my old post on approximate induction.
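A quick numerical sketch of that cost comparison. The growth rates here are stand-ins I chose for illustration (logarithmic F and G, and 2**N standing in for "very fast-growing time"), not anything rigorous:

```python
import math

# Separable cost F(length) + G(time), with F = G = log as a mild choice.
def cost(length, time):
    return math.log(length) + math.log(time)

def table_cost(N):
    # Lookup table: length ~ N entries, time ~ N to scan them.
    return cost(N, N)

def search_cost(N):
    # Short general program: length ~ log2(N), but a runtime that grows
    # much faster (2**N is a stand-in for "uncomputably fast-growing").
    return cost(math.log2(N), 2 ** N)

# Find the first N where the lookup table scores better, as predicted.
crossover = next(N for N in range(2, 100)
                 if table_cost(N) < search_cost(N))
```

With these particular choices the table wins almost immediately; a slower-growing G or faster-growing F pushes the crossover outward, which is where the "happy medium" would live.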

• A lot of great points!

I think we can separate the arguments into about three camps, based on their purpose (though they don't all cleanly sit in one camp):

• Arguments why progress might be generally fast: Hominid variation, Brain scaling.

• Arguments why a local advantage in AI might develop: Intelligence explosion, One algorithm, Starting high, Awesome AlphaZero.

• Arguments why a local advantage in AI could cause a global discontinuity: Deployment scaling, Train vs. test, Payoff thresholds, Human-competition threshold, Uneven skills.

These facts need to work together for the thesis of a single disruptive actor to go through: you need there to be jumps in AI intelligence, you need them to be fairly large even near human intelligence, and you need those increases to translate into a discontinuous impact on the world. This framework helps me evaluate arguments and counterarguments. For example, you don't just argue against Hominid variation as showing that there will be a singularity; you argue against its more limited implications as well.

Bits I didn't agree with, and therefore have lots to say about:

Intelligence Explosion:

The counterargument seems pretty wishy-washy. You say: "Positive feedback loops are common in the world, and very rarely move fast enough and far enough to become a dominant dynamic in the world." How common? How rare? How dominant? Is global warming a dominant positive feedback loop because warming leads to increased water in the atmosphere, which leads to more warming, and it's going to have a big effect on the world? Or is it none of those, because Earth won't get all that much warmer, because there are other well-understood effects keeping it in homeostasis?

More precisely, I think the reference-class argument that a positive feedback loop (or rather, the behavior that we approximate as a positive feedback loop) will be limited in time and space is hardly an argument at all. It practically concedes that the feedback-loop argument works for the middle of the three camps above, but merely points out that it's not also an argument that intelligence will be important. A strong argument against the intelligence-feedback hypothesis has to argue that a positive feedback loop is unlikely.

One can obviously respond by emphasizing that objects in the reference class you've chosen (e.g. tipping back too far in your chair and falling) don't generally impact the world, and therefore this is a reference-class argument against AI impacting the world. But AI is not drawn uniformly from this reference class; the only reason we're talking about it is that it's been selected for the possibility of impacting the world. Failure to account for this selection pressure is why the strength of the argument seemed to change upon breaking it into parts vs. keeping it as a whole.

Deployment scaling:

We agree that slow deployment speed can "smooth out" a discontinuous jump in the state of the art into a continuous change in what people actually experience. You present each section as a standalone argument, and so we also agree that fast deployment speed alone does not imply discontinuous jumps.

But I think keeping things so separate misses the point that fast deployment is among the necessary conditions for a discontinuous impact. There's also a risk, if we think of things separately, of not remembering these necessary conditions when thinking about historical examples. For instance, we might look at the history of drug development, where drug deployment and adoption take a few years, and costs falling to allow more people to access the treatment takes more years, and notice that even though there's an a priori argument for a discontinuous jump in best practices, people's outcomes are continuous on the scale of several years. And then, if we've forgotten about the other necessary factors, we might just attribute this to some mysterious low base rate of discontinuous jumps.

Payoff thresholds:

The counterargument doesn't really hold together. We start ex hypothesi with some threshold effect in usefulness (e.g. good enough boats let you reach another island). Then you say that it won't cause a discontinuity in things we care about directly: people might buy better boats, but because of this producers will spend more effort making better boats and sell them more expensively, so the "value per dollar" doesn't jump. But this just assumes without justification that the producer eats up all the value. Why can't the buyer and the producer both capture part of the increase in value? The only way the theoretical argument seems to work is in equilibrium, which isn't what we care about.

Nuclear weapons are a neat example, but may be a misleading one. Nuclear weapons could have had half the yield, or twice the yield, without altering much about when they were built, although if you'd disagree with this, I'd be interested in hearing about it. (Looking at your link, it seems like nuclear weapons were in fact more expensive per ton of TNT when they were first built, and yet they were built, which suggests there's something fishy about their fit to this argument.)

Awesome AlphaZero:

I think we can turn this into a more general thesis: research is often local, and often discontinuous, and that's important in AI. Fields whose advance seems continuous on the several-year scale may look jumpy on the six-month scale, and those jumps are usually localized to one research team rather than distributed. You can draw a straight line through a plot of, e.g., the performance of image-recognition AI, but that doesn't mean that at the times in between the points there was a program with that intermediate skill at image recognition. This matters for AI if the scale of the jumps, and the time between them, allows one team to jump through some region (not necessarily a discontinuity) of large gain in effect and gain a global advantage.

The missing argument about strategy:

There's one possible factor contributing to the likelihood of discontinuity that I didn't see, and that's the strategic one. If people think that there is some level of advantage in AI that will allow them to have an important global impact, then they might not release their intermediate work to the public (so that other groups don't know their status, and so their work can't be copied), creating an apparent discontinuity when they decide to go public, even if 90% of their AI research would have gotten them 90% of the taking-over-the-world power.