TurnTrout’s shortform feed

• My maternal grandfather was the scientist in my family. I was young enough that my brain hadn’t decided to start doing its job yet, so my memories with him are scattered and inconsistent and hard to retrieve. But there’s no way that I could forget all of the dumb jokes he made; how we’d play Scrabble and he’d (almost surely) pretend to lose to me; how, every time he got to see me, his eyes would light up with boyish joy.

My greatest regret took place in the summer of 2007. My family celebrated the first day of the school year at an all-you-can-eat buffet, delicious food stacked high as the eye could fathom under lights of green, red, and blue. After a particularly savory meal, we made to leave the surrounding mall. My grandfather asked me to walk with him.

I was a child who thought to avoid being seen too close to uncool adults. I wasn’t thinking. I wasn’t thinking about hearing the cracking sound of his skull against the ground. I wasn’t thinking about turning to see his poorly congealed blood flowing from his forehead out onto the floor. I wasn’t thinking I would nervously watch him bleed for long minutes while shielding my seven-year-old brother from the sight. I wasn’t thinking that I should go visit him in the hospital, because that would be scary. I wasn’t thinking he would die of a stroke the next day.

I wasn’t thinking the last thing I would ever say to him would be “no[, I won’t walk with you]”.

Who could think about that? No, that was not a foreseeable mistake. Rather, I wasn’t thinking about how precious and short my time with him was. I wasn’t appreciating how fragile my loved ones are. I didn’t realize that something as inconsequential as an unidentified ramp in a shopping mall was allowed to kill my grandfather.

I miss you, Joseph Matt.

• <3

• My mother told me my memory was indeed faulty. He never asked me to walk with him; instead, he asked me to hug him during dinner. I said I’d hug him “tomorrow”.

But I did, apparently, want to see him in the hospital; it was my mother and grandmother who decided I shouldn’t see him in that state.

• Gone, but never forgotten.

• Thank you for sharing.

• While reading Focusing today, I thought about the book and wondered how many exercises it would have. I felt a twinge of aversion. In keeping with my goal of increasing internal transparency, I said to myself: “I explicitly and consciously notice that I felt averse to some aspect of this book”.

I then Focused on the aversion. Turns out, I felt a little bit disgusted, because a part of me reasoned thusly:

If the book does have exercises, it’ll take more time. That means I’m spending reading time on things that aren’t math textbooks. That means I’m slowing down.

(Transcription of a deeper Focusing on this reasoning)

I’m afraid of being slow. Part of it is surely the psychological remnants of the RSI I developed in the summer of 2018. That is, slowing down is now emotionally associated with disability and frustration. There was a period of meteoric progress as I started reading textbooks and doing great research, and then there was pain. That pain struck even when I was just trying to take care of myself, sleep, open doors. That pain then left me on the floor of my apartment, staring at the ceiling, desperately willing my hands to just get better. They didn’t (for a long while), so I just lay there and cried. That was slow, and it hurt. No reviews, no posts, no typing, no coding. No writing, slow reading. That was slow, and it hurt.

Part of it used to be a sense of “I need to catch up and learn these other subjects which [Eliezer / Paul / Luke / Nate] already know”. Through internal double crux, I’ve nearly eradicated this line of thinking, which is neither helpful nor relevant nor conducive to excitedly learning the beautiful settled science of humanity. Although my most recent post touched on impostor syndrome, that isn’t really a thing for me. I feel reasonably secure in who I am, now (although part of me worries that others wrongly view me as an impostor?).

However, I mostly just want to feel fast, efficient, and swift again. I sometimes feel like I’m in a race with Alex, and I feel like I’m losing.

• Listening to Eneasz Brodski’s excellent reading of Crystal Society, I noticed how curious I am about how AGI will end up working. How are we actually going to do it? What are those insights? I want to understand quite badly, which I didn’t realize until experiencing this (so far) intelligently written story.

Similarly, how do we actually “align” agents, and what are good frames for thinking about that?

Here’s to hoping we don’t sate the former curiosity too early.

• I passed a homeless man today. His face was wracked in pain, body rocking back and forth, eyes clenched shut. A dirty sign lay forgotten on the ground: “very hungry”.

This man was once a child, with parents and friends and dreams and birthday parties and maybe siblings he’d get in arguments with and snow days he’d hope for.

And now he’s just hurting.

And now I can’t help him without abandoning others. So he’s still hurting. Right now.

Reality is still allowed to make this happen. This is wrong. This has to change.

• How would you help this man, if having to abandon others in order to do so were not a concern? (Let us assume that someone else—someone whose competence you fully trust, and who will do at least as good a job as you will—is going to take care of all the stuff you feel you need to do.)

What is it you had in mind to do for this fellow—specifically, now—that you can’t (due to those other obligations)?

• Suppose I actually cared about this man with the intensity he deserved—imagine that he were my brother, father, or best friend.

The obvious first thing to do before interacting further is to buy him a good meal and a healthy helping of groceries. Then, I need to figure out his deal. Is he hurting, or is he also suffering from mental illness?

If the former, I’d go the more straightforward route of befriending him, helping him purchase a sharp business professional outfit, teaching him to interview and present himself with confidence, secure an apartment, and find a job.

If the latter, this gets trickier. I’d still try and befriend him (consistently being a source of cheerful conversation and delicious food would probably help), but he might not be willing or able to get the help he needs, and I wouldn’t have the legal right to force him. My best bet might be to enlist the help of a psychological professional for these interactions. If this doesn’t work, my first thought would be to influence the local government to get the broader problem fixed (I’d spend at least an hour considering other plans before proceeding further, here). Realistically, there’s likely a lot of pressure in this direction already, so I’d need to find an angle from which few others are pushing or pulling where I can make a difference. I’d have to plot out the relevant political forces, study accounts of successful past lobbying, pinpoint the people I need on my side, and then target my influencing accordingly.

(All of this is without spending time looking at bird’s-eye research and case studies of poverty reduction; assume counterfactually that I incorporate any obvious improvements to these plans, because I’d care about him and dedicate more than like 4 minutes of thought.)

• Well, a number of questions may be asked here (about desert, about causation, about autonomy, etc.). However, two seem relevant in particular:

First, it seems as if (in your latter scenario) you’ve arrived (tentatively, yes, but not at all unreasonably!) at a plan involving systemic change. As you say, there is quite a bit of effort being expended on this sort of thing already, so, at the margin, any effective efforts on your part would likely be both high-level and aimed in an at-least-somewhat-unusual direction.

… yet isn’t this what you’re already doing?

Second, and unrelatedly… you say:

Suppose I actually cared about this man with the intensity he deserved—imagine that he were my brother, father, or best friend.

Yet it seems to me that, empirically, most people do not expend the level of effort which you describe, even for their siblings, parents, or close friends. Which is to say that the level of emotional and practical investment you propose to make (in this hypothetical situation) is, actually, quite a bit greater than that which most people invest in their family members or close friends.

The question, then, is this: do you currently make this degree of investment (emotional and practical) in your actual siblings, parents, and close friends? If so—do you find that you are unusual in this regard? If not—why not?

• … yet isn’t this what you’re already doing?

I work on technical AI alignment, so some of those I help (in expectation) don’t even exist yet. I don’t view this as what I’d do if my top priority were helping this man.

The question, then, is this: do you currently make this degree of investment (emotional and practical) in your actual siblings, parents, and close friends? If so—do you find that you are unusual in this regard? If not—why not?

That’s a good question. I think the answer is yes, at least for my close family. Recently, I’ve expended substantial energy persuading my family to sign up for cryonics with me, winning over my mother, brother, and (I anticipate) my aunt. My father has lingering concerns which I think he wouldn’t have upon sufficient reflection, so I’ve designed a similar plan for ensuring he makes what I perceive to be the correct, option-preserving choice. For example, I made significant targeted donations to effective charities on his behalf to offset (what he perceives as) a considerable drawback of cryonics: his inability to also be an organ donor.

A universe in which humanity wins but my dad is gone would be quite sad to me, and I’ll take whatever steps necessary to minimize the chances of that.

I don’t know how unusual this is. This reminds me of the relevant Harry-Quirrell exchange; most people seem beaten-down and hurt themselves, and I can imagine a world in which people are in better places and going to greater lengths for those they love. I don’t know if this is actually what would make more people go to these lengths (just an immediate impression).

• I predict that this comment is not helpful to Turntrout.

• Good, original thinking feels present to me—as if mental resources are well-allocated.

The thought which prompted this:

Sure, if people are asked to solve a problem and say they can’t after two seconds, yes—make fun of that a bit. But that two seconds covers more ground than you might think, due to System 1 precomputation.

Reacting to a bit of HPMOR here, I noticed something felt off about Harry’s reply to the Fred/George-tried-for-two-seconds thing. Having a bit of experience noticing confusion, I did not think “I notice I am confused” (although this can be useful). I did not think “Eliezer probably put thought into this”, or “Harry is kinda dumb in certain ways—so what if he’s a bit unfair here?”. Without resurfacing, or distraction, or wondering if this train of thought is more fun than just reading further, I just thought about the object-level exchange.

People need to allocate mental energy wisely; this goes far beyond focusing on important tasks. Your existing mental skillsets already optimize and auto-pilot certain mental motions for you, so you should allocate less deliberation to them. In this case, the confusion-noticing module was honed; by not worrying about how well I noticed confusion, I was able to quickly have an original thought.

When thought processes derail or brainstorming sessions bear no fruit, inappropriate allocation may be to blame. For example, if you’re anxious, you’re interrupting the actual thoughts with “what-if”s.

To contrast, non-present thinking feels like a controller directing thoughts to go from here to there: do this and then, check that, come up for air over and over… Present thinking is a stream of uninterrupted strikes, the train of thought chugging along without self-consciousness. Moving, instead of thinking about moving while moving.

I don’t know if I’ve nailed down the thing I’m trying to point at yet.

• Sure, if people are asked to solve a problem and say they can’t after two seconds, yes—make fun of that a bit. But that two seconds covers more ground than you might think, due to System 1 precomputation.

Expanding on this, there is an aspect of Actually Trying that is probably missing from S1 precomputation. So, maybe the two-second “attempt” is actually useless for most people because subconscious deliberation isn’t hardass enough at giving its all, at making desperate and extraordinary efforts to solve the problem.

• My life has gotten a lot more insane over the last two years. However, it’s also gotten a lot more wonderful, and I want to take time to share how thankful I am for that.

Before, life felt like… a thing that you experience, where you score points and accolades and check boxes. It felt kinda fake, but parts of it were nice. I had this nice cozy little box that I lived in, a mental cage circumscribing my entire life. Today, I feel (much more) free.

I love how curious I’ve become, even about “unsophisticated” things. Near dusk, I walked the winter wonderland of Ogden, Utah with my aunt and uncle. I spotted this gorgeous red ornament hanging from a tree, with a hunk of snow stuck to it at north-east orientation. This snow had apparently decided to defy gravity. I just stopped and stared. I was so confused. I’d kinda guessed that the dry snow must induce a huge coefficient of static friction, hence the winter wonderland. But that didn’t suffice to explain this. I bounded over and saw the smooth surface was iced, so maybe part of the snow melted in the midday sun, froze as evening advanced, and then the part-ice part-snow chunk stuck much more solidly to the ornament.

Maybe that’s right, and maybe not. The point is that two years ago, I’d have thought this was just “how the world worked”, and it was up to physicists to understand the details. Whatever, right? But now, I’m this starry-eyed kid in a secret shop full of wonderful secrets. Some secrets are already understood by some people, but not by me. A few secrets I am the first to understand. Some secrets remain unknown to all. All of the secrets are enticing.

My life isn’t always like this; some days are a bit gray and draining. But many days aren’t, and I’m so happy about that.

Socially, I feel more fascinated by people in general, more eager to hear what’s going on in their lives, more curious what it feels like to be them that day. In particular, I’ve fallen in love with the rationalist and effective altruist communities, which was totally a thing I didn’t even know I desperately wanted until I already had it in my life! There are so many kind, smart, and caring people, inside many of whom burns a similarly intense drive to make the future nice, no matter what. Even though I’m estranged from the physical community much of the year, I feel less alone: there’s a home for me somewhere.

Professionally, I’m working on AI alignment, which I think is crucial for making the future nice. Two years ago, I felt pretty sidelined—I hadn’t met the bars I thought I needed to meet in order to do Important Things, so I just planned for a nice, quiet, responsible, normal life, doing little kindnesses. Surely the writers of the universe’s script would make sure things turned out OK, right?

I feel in the game now. The game can be daunting, but it’s also thrilling. It can be scary, but it’s important. It’s something we need to play, and win. I feel that viscerally. I’m fighting for something important, with every intention of winning.

I really wish I had the time to hear from each and every one of you. But I can’t, so I do what I can: I wish you a very happy Thanksgiving. :)

• Yesterday, I put the finishing touches on my chef d’œuvre, a series of important safety-relevant proofs I’ve been striving for since early June. Strangely, I felt a great exhaustion come over me. These proofs had been my obsession for so long, and now—now, I’m done.

I’ve had this feeling before; three years ago, I studied fervently for a Google interview. The literal moment the interview concluded, a fever overtook me. I was sick for days. All the stress and expectation and readiness-to-fight which had been pent up, released.

I don’t know why this happens. But right now, I’m still a little tired, even after getting a good night’s sleep.

• This happens to me sometimes. I know several people who have this happen at the end of a Uni semester. Hope you can get some rest.

• Suppose you could choose how much time to spend at your local library, during which:

• you do not age. Time stands still outside; no one enters or exits the library (which is otherwise devoid of people).

• you don’t need to sleep/eat/get sunlight/etc

• you can use any computers, but not access the internet or otherwise bring in materials with you

• you can’t leave before the requested time is up

Suppose you don’t go crazy from solitary confinement, etc. Remember that value drift is a potential thing.

How long would you ask for?

• How good are the computers?

• Windows machines circa ~2013. Let’s say 128GB hard drives which magically never fail, for 10 PCs.

• Probably 3-5 years then. I’d use it to get a stronger foundation in low-level programming skills, math, and physics. The limiting factors would be entertainment in the library to keep me sane and the inevitable degradation of my social skills from so much time spent alone.

• Judgment in Managerial Decision Making says that (subconscious) misapplication of e.g. the representativeness heuristic causes insensitivity to base rates and to sample size, failure to reason about probabilities correctly, failure to consider regression to the mean, and the conjunction fallacy. My model of this is that representativeness / availability / confirmation bias work off of a mechanism somewhat similar to attention in neural networks: due to how the brain performs time-limited search, more salient/recent memories get prioritized for recall.

The availability heuristic goes wrong when our saliency-weighted perception of the frequency of events is a biased estimator of the real frequency, or maybe when we just happen to be extrapolating off of a very small sample size. Concepts get inappropriately activated in our mind, and we therefore reason incorrectly. Attention also explains anchoring: you can more readily bring to mind things related to your anchor due to salience.
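To illustrate the analogy (a toy sketch of my own, with made-up events and salience numbers): if recall draws memories in proportion to exp(salience) rather than uniformly, the estimated frequency of a salient-but-rare event is biased upward, exactly the availability pattern described above.

```python
import math

# Hypothetical memory store: (event, salience) pairs. Vivid events get high
# salience; mundane ones get low salience. All numbers are made up.
memories = [("plane crash", 5.0)] * 2 + [("car crash", 1.0)] * 20

def salience_weighted_freq(memories, event):
    # Recall probability proportional to exp(salience): a softmax over memories,
    # loosely analogous to attention weights in a neural network.
    weights = [math.exp(s) for _, s in memories]
    total = sum(weights)
    return sum(w for (e, _), w in zip(memories, weights) if e == event) / total

true_freq = sum(1 for e, _ in memories if e == "plane crash") / len(memories)
est_freq = salience_weighted_freq(memories, "plane crash")
print(true_freq)  # ~0.09: the true frequency in the store
print(est_freq)   # ~0.85: salience wildly inflates the recalled frequency
```

The estimator is unbiased only when salience happens to track true frequency; any mismatch between the two produces exactly the availability-style error.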

The case for confirmation bias seems to be a little more involved: first, we had evolutionary pressure to win arguments, which means our search is meant to find supportive arguments and avoid even subconsciously signalling that we are aware of the existence of counterarguments. This means that those supportive arguments feel salient, and we (perhaps by “design”) get to feel unbiased—we aren’t consciously discarding evidence, we’re just following our normal search/reasoning process! This is what our search algorithm feels like from the inside.

This reasoning feels clicky, but I’m just treating it as an interesting perspective for now.

• With respect to the integers, 2 is prime. But with respect to the Gaussian integers, it’s not: it has factorization 2 = (1 + i)(1 − i). Here’s what’s happening.

You can view complex multiplication as scaling and rotating the complex plane. So, when we take our unit vector 1 and multiply by 1 + i, we’re scaling it by √2 and rotating it counterclockwise by 45°:

This gets us to the purple vector. Now, we multiply by 1 − i, scaling it up by √2 again (in green), and rotating it clockwise again by the same amount. You can even deal with the scaling and rotations separately (scale twice by √2, with zero net rotation).
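A quick numerical sanity check of this picture (my own addition, using Python’s built-in complex numbers rather than exact Gaussian integers):

```python
import cmath
import math

a = 1 + 1j  # scales by sqrt(2), rotates counterclockwise by 45 degrees
b = 1 - 1j  # scales by sqrt(2), rotates clockwise by 45 degrees

# The two rotations cancel and the scalings compound: sqrt(2) * sqrt(2) = 2.
print(a * b)                          # (2+0j)
print(abs(a), abs(b))                 # each factor has magnitude sqrt(2)
print(math.degrees(cmath.phase(a)))   # 45.0 degrees counterclockwise
print(math.degrees(cmath.phase(b)))   # -45.0 degrees, i.e. clockwise
```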

• I feel very excited by the AI alignment discussion group I’m running at Oregon State University. Three weeks ago, most attendees didn’t know much about “AI security mindset”-ish considerations. This week, I asked the question “what, if anything, could go wrong with a superhuman reward maximizer which is rewarded for pictures of smiling people? Don’t just fit a bad story to the reward function. Think carefully.”

There was some discussion and initial optimism, after which someone said “wait, those optimistic solutions are just the ones you’d prioritize! What’s that called, again?” (It’s called anthropomorphic optimism.)

I’m so proud.

• An exercise in the companion workbook to the Feynman Lectures on Physics asked me to compute a rather arduous numerical simulation. At first, this seemed like a “pass” in favor of an exercise more amenable to analytic and conceptual analysis; arithmetic really bores me. Then, I realized I was being dumb—I’m a computer scientist.

Suddenly, this exercise became very cool, as I quickly figured out the equations and code, crunched the numbers in an instant, and churned out a nice scatterplot. This seems like a case where cross-domain competence is unusually helpful (although it’s not like I had to bust out any esoteric theoretical CS knowledge). I’m wondering whether this kind of thing will compound as I learn more and more areas; whether previously arduous or difficult exercises become easy when attacked with well-honed tools and frames from other disciplines.

• Earlier today, I became curious why extrinsic motivation tends to preclude or decrease intrinsic motivation. This phenomenon is known as overjustification. There are likely agreed-upon theories for this, but here’s some stream-of-consciousness as I reason and read through summarized experimental results. (ETA: Looks like there isn’t consensus on why this happens.)

My first hypothesis was that recognizing external rewards somehow precludes activation of curiosity-circuits in our brain. I’m imagining a kid engrossed in a puzzle. Then, they’re told that they’ll be given $10 upon completion. I’m predicting that the kid won’t become significantly less engaged, which surprises me?

Third graders who were rewarded with a book showed more reading behaviour in the future, implying that some rewards do not undermine intrinsic motivation.

Might this be because the reward for reading is more reading, which doesn’t undermine the intrinsic interest in reading? You aren’t looking forward to escaping the task, after all.

While the provision of extrinsic rewards might reduce the desirability of an activity, the use of extrinsic constraints, such as the threat of punishment, against performing an activity has actually been found to increase one’s intrinsic interest in that activity. In one study, when children were given mild threats against playing with an attractive toy, it was found that the threat actually served to increase the child’s interest in the toy, which was previously undesirable to the child in the absence of threat.

A few experimental summaries:

1. Researchers at Southern Methodist University conducted an experiment on 188 female university students in which they measured the subjects’ continued interest in a cognitive task (a word game) after their initial performance under different incentives. The subjects were divided into two groups. Members of the first group were told that they would be rewarded for competence. Above-average players would be paid more and below-average players would be paid less. Members of the second group were told that they would be rewarded only for completion. Their pay was scaled by the number of repetitions or the number of hours playing.

Afterwards, half of the subjects in each group were told that they over-performed, and the other half were told that they under-performed, regardless of how well each subject actually did. Members of the first group generally showed greater interest in the game and continued playing for a longer time than the members of the second group. “Over-performers” continued playing longer than “under-performers” in the first group, but “under-performers” continued playing longer than “over-performers” in the second group. This study showed that, when rewards do not reflect competence, higher rewards lead to less intrinsic motivation. But when rewards do reflect competence, higher rewards lead to greater intrinsic motivation.

2. Richard Titmuss suggested that paying for blood donations might reduce the supply of blood donors. To test this, a field experiment with three treatments was conducted. In the first treatment, the donors did not receive compensation. In the second treatment, the donors received a small payment. In the third treatment, donors were given a choice between the payment and an equivalent-valued contribution to charity. None of the three treatments affected the number of male donors, but the second treatment almost halved the number of female donors. However, allowing the contribution to charity fully eliminated this effect.

From a glance at the Wikipedia page, it seems like there’s not really expert consensus on why this happens. However, according to self-perception theory, a person infers causes about his or her own behavior based on external constraints. The presence of a strong constraint (such as a reward) would lead a person to conclude that he or she is performing the behavior solely for the reward, which shifts the person’s motivation from intrinsic to extrinsic.
This lines up with my understanding of self-consistency effects.

• Virtue ethics seems like model-free consequentialism to me.

• Going through an intro chem textbook, it immediately strikes me how this should be as appealing and mysterious as the alchemical magic system of Fullmetal Alchemist. “The law of equivalent exchange”, “conservation of energy/elements/mass (the last two holding only for normal chemical reactions)”, etc. If only it were natural to take joy in the merely real...

• Have you been continuing your self-study schemes into realms beyond math stuff? If so I’m interested in both the motivation and how it’s going! I remember having little interest in other non-physics science growing up, but that was also before I got good at learning things and my enjoyment was based on how well it was presented.

• Yeah, I’ve read a lot of books since my reviews fell off last year, most of them still math. I wasn’t able to type reliably until early this summer, so my reviews kinda got derailed. I’ve read Visual Group Theory, Understanding Machine Learning, Computational Complexity: A Conceptual Perspective, Introduction to the Theory of Computation, An Illustrated Theory of Numbers, most of Tadellis’ Game Theory, the beginning of Multiagent Systems, parts of several graph theory textbooks, and I’m going through Munkres’ Topology right now. I’ve gotten through the first fifth of the first Feynman lectures, which has given me an unbelievable amount of mileage for generally reasoning about physics.

I want to go back to my reviews, but I just have a lot of other stuff going on right now. Also, I run into fewer basic confusions than when I was just starting at math, so I generally have less to talk about. I guess I could instead try and re-present the coolest concepts from the book.

My “plan” is to keep learning math until the low graduate level (I still need to at least do complex analysis, topology, field / ring theory, ODEs/PDEs, and something to shore up my atrocious trig skills, and probably more)[1], and then branch off into physics + a “softer” science (anything from microecon to psychology). CS (“done”) → math → physics → chem → bio is the major track for the physical sciences I have in mind, but that might change. I dunno, there’s just a lot of stuff I still want to learn. :)

1. I also still want to learn Bayes nets, category theory, get a much deeper understanding of probability theory, provability logic, and decision theory. ↩︎

• Yay learning all the things! Your reviews are fun, also completely understandable putting energy elsewhere. Your energy for more learning is very useful for periodically bouncing myself into more learning.

• We can think about how consumers respond to changes in price by considering the elasticity of the quantity demanded at a given price—how quickly does demand decrease as we raise prices? Price elasticity of demand is defined as ε = (% change in quantity demanded) / (% change in price); in other words, for price P and quantity Q, this is ε = (ΔQ/ΔP) · (P/Q) (this looks kinda weird, and it wasn’t immediately obvious what’s happening here...). Revenue is the total amount of cash changing hands: R = P · Q.

What’s happening here is that raising prices is a good idea when the revenue gained (the “price effect”) outweighs the revenue lost to falling demand (the “quantity effect”). A lot of words so far for an easy concept: if the magnitude of the price elasticity is greater than 1, demand is elastic and price hikes decrease revenue (and you should probably have a sale). However, if it’s less than 1, demand is inelastic and boosting the price increases revenue—demand isn’t dropping off quickly enough to drag down the revenue. You can just look at the area of the revenue rectangle P × Q for each effect!
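The price and quantity effects can be computed directly. A minimal sketch, using a made-up linear demand curve Q = 100 − 2P (my own illustration, not from the source):

```python
# Price elasticity of demand and its effect on revenue,
# for the hypothetical linear demand curve Q = 100 - 2P.
def quantity_demanded(price):
    return 100 - 2 * price

def elasticity(price):
    q = quantity_demanded(price)
    dq_dp = -2                      # slope of the demand curve
    return abs(dq_dp * price / q)   # |dQ/dP * P/Q|

def revenue(price):
    return price * quantity_demanded(price)

for p in (10, 40):
    dr = revenue(p + 1) - revenue(p)
    print(f"P={p}: elasticity={elasticity(p):.2f}, revenue change from a +1 hike: {dr:+}")
```

At P=10 demand is inelastic (ε=0.25) and a price hike raises revenue; at P=40 demand is elastic (ε=4) and the same hike lowers it, matching the rule above.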
• How does rep­re­sen­ta­tion in­ter­act with con­scious­ness? Sup­pose you’re rea­son­ing about the uni­verse via a par­tially ob­serv­able Markov de­ci­sion pro­cess, and that your model is in­cred­ibly de­tailed and ac­cu­rate. Fur­ther sup­pose you rep­re­sent states as num­bers, as their nu­meric la­bels. To get a han­dle on what I mean, con­sider the game of Pac-Man, which can be rep­re­sented as a finite, de­ter­minis­tic, fully-ob­serv­able MDP. Think about all pos­si­ble game screens you can ob­serve, and num­ber them. Now get rid of the game screens. From the per­spec­tive of re­in­force­ment learn­ing, you haven’t lost any­thing—all poli­cies yield the same re­turn they did be­fore, the tran­si­tions/​rules of the game haven’t changed—in fact, there’s a pretty strong iso­mor­phism I can show be­tween these two MDPs. All you’ve done is changed the la­bels—rep­re­sen­ta­tion means prac­ti­cally noth­ing to the math­e­mat­i­cal ob­ject of the MDP, al­though many eg DRL al­gorithms should be able to ex­ploit reg­u­lar­i­ties in the rep­re­sen­ta­tion to re­duce sam­ple com­plex­ity. So what does this mean? If you model the world as a par­tially ob­serv­able MDP whose states are sin­gle num­bers… can you still com­mit mind­crime via your de­liber­a­tions? Is the struc­ture of the POMDP in your head some­how suffi­cient for con­scious­ness to be ac­counted for (like how the the­o­rems of com­plex­ity the­ory gov­ern com­put­ers both of flesh and of sili­con)? I’m con­fused. • I think a rea­son­able and re­lated ques­tion we don’t have a solid an­swer for is if hu­mans are already ca­pa­ble of mind crime. For ex­am­ple, maybe Alice is mad at Bob and imag­ines caus­ing harm to Bob. How well does Alice have to model Bob for her imag­in­ings to be mind crime? If Alice has low cog­ni­tive em­pa­thy is it not mind crime but if her cog­ni­tive em­pa­thy is above some level is it then mind crime? 
I think we’re currently confused enough about what mind crime is such that it’s hard to even begin to know how we could answer these questions based on more than gut feelings.

• I suspect that it doesn’t matter how accurate or straightforward a predictor is in modeling people. What would make prediction morally irrelevant is that it’s not noticed by the predicted people, irrespective of whether this happens because it spreads the moral weight conferred to them over many possibilities (giving inaccurate prediction), keeps the representation sufficiently baroque, or for some other reason. In the case of inaccurate prediction or baroque representation, it probably does become harder for the predicted people to notice being predicted, and I think this is the actual source of moral irrelevance, not those things on their own. A more direct way of getting the same result is to predict counterfactuals where the people you reason about don’t notice the fact that you are observing them, which also gives a form of inaccuracy (imagine that your predicting them is part of their prior; that’ll drive the counterfactual further from reality).

• I seem to discount different parts of what I want differently. For example, I’m somewhat willing to postpone fun to low-probability high-fun futures, whereas I’m not willing to do the same with romance.

• I had an intuition that attainable utility preservation (RL, but you maintain your ability to achieve other goals) points at a broader template for regularization. AUP regularizes the agent’s optimal policy to be more palatable towards a bunch of different goals we may wish we had specified. I hinted at the end of Towards a New Impact Measure that the thing-behind-AUP might produce interesting ML regularization techniques.
This hunch was roughly correct; Model-Agnostic Meta-Learning tunes the network parameters such that they can be quickly adapted to achieve low loss on other tasks (the problem of few-shot learning). The parameters are not overfit on the scant few data points to which the parameters are adapted, which is also interesting.

• The framing effect & aversion to losses generally cause us to execute more cautious plans. I’m realizing this is another reason to reframe my x-risk motivation from “I won’t let the world be destroyed” to “there’s so much fun we could have, and I want to make sure that happens”. I think we need more exploratory thinking in alignment research right now. (Also, the former motivation style led to me crashing and burning a bit when my hands were injured and I was no longer able to do much.)

ETA: actually, I’m realizing I had the effect backwards. Framing via losses actually encourages more risk-taking plans. Oops. I’d like to think about this more, since I notice my model didn’t protest when I argued the opposite of the experimental conclusions.

• I’m realizing how much more risk-neutral I should be:

Paul Samuelson… offered a colleague a coin-toss gamble. If the colleague won the coin toss, he would receive $200, but if he lost, he would lose $100. Samuelson was offering his colleague a positive expected value with risk. The colleague, being risk-averse, refused the single bet, but said that he would be happy to toss the coin 100 times! The colleague understood that the bet had a positive expected value and that across lots of bets, the odds virtually guaranteed a profit. Yet with only one trial, he had a 50% chance of regretting taking the bet.

Notably, Samuelson’s colleague doubtless faced many gambles in life… He would have fared better in the long run by maximizing his expected value on each decision… all of us encounter such “small gambles” in life, and we should try to follow the same strategy. Risk aversion is likely to tempt us to turn down each individual opportunity for gain. Yet the aggregated risk of all of the positive expected value gambles that we come across would eventually become infinitesimal, and potential profit quite large.
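The 100-toss intuition can be checked exactly with the binomial distribution (my own calculation; the numbers below are not from the text):

```python
# Exact odds for Samuelson's bet (+$200 on heads, -$100 on tails, fair coin),
# computed with the binomial distribution.
from math import comb

def prob_net_loss(n_bets):
    """P(total payoff < 0): net = 200*w - 100*(n - w) < 0 iff wins w < n/3."""
    max_losing_wins = (n_bets - 1) // 3  # largest w with w < n/3
    return sum(comb(n_bets, w) for w in range(max_losing_wins + 1)) / 2**n_bets

# One bet: 50% chance of ending up behind, despite the +$50 expected value.
assert prob_net_loss(1) == 0.5
# A hundred bets: the chance of a net loss is well under 1%.
assert prob_net_loss(100) < 0.01
```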

• For what it’s worth, I tried something like the “I won’t let the world be destroyed” → “I want to make sure the world keeps doing awesome stuff” reframing back in the day and it broadly didn’t work. This had less to do with cautious/uncautious behavior and more to do with status quo bias. Saying “I won’t let the world be destroyed” treats “the world being destroyed” as an event that deviates from the status quo of the world existing. In contrast, saying “There’s so much fun we could have” treats “having more fun” as the event that deviates from the status quo of us not continuing to have fun.

When I saw the world being destroyed as status quo, I cared a lot less about the world getting destroyed.

• I was having a bit of trouble holding the point of quadratic residues in my mind. I could effortfully recite the definition, give an example, and walk through the broad-strokes steps of proving quadratic reciprocity. But it felt fake and stale and memorized.

Alex Mennen suggested a great way of thinking about it. For some odd prime $p$, consider the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$. This group is abelian and has even order $p-1$. Now, consider a primitive root / generator $g$. By definition, every element of the group can be expressed as $g^k$. The quadratic residues are those expressible by even $k$ (this is why, for prime numbers, half of the group is square mod $p$). This also lets us easily see that the residue subgroup is closed under multiplication by $g^2$ (which generates it), that two non-residues multiply to make a residue, and that a residue and non-residue make a non-residue. The Legendre symbol then just tells us, for $a = g^k$, whether $k$ is even.
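The generator picture is quick to verify numerically; here is a sketch of my own (assuming $p = 11$ with primitive root $g = 2$; not from the post):

```python
# Check: the quadratic residues mod p are exactly the even powers of a generator.
p = 11
g = 2  # 2 is a primitive root mod 11

# g really generates the group: its powers hit every nonzero element once.
powers = [pow(g, k, p) for k in range(p - 1)]
assert sorted(powers) == list(range(1, p))

residues = {pow(x, 2, p) for x in range(1, p)}            # squares mod 11
even_powers = {pow(g, k, p) for k in range(0, p - 1, 2)}  # g^0, g^2, g^4, ...
assert residues == even_powers
assert len(residues) == (p - 1) // 2  # exactly half the group is square mod p
```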

Now, consider composite numbers $n$ whose prime decomposition only contains $1$ or $2$ in the exponents. By the fundamental theorem of finite abelian groups and the Chinese remainder theorem, we see that a number is square mod $n$ iff it is square mod all of the prime factors.
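A brute-force check of the Chinese-remainder claim (my own hypothetical example with $n = 15 = 3 \cdot 5$, not from the post):

```python
# For squarefree odd n = 15 = 3 * 5, a unit mod 15 is a square mod 15
# iff it is a square mod 3 and a square mod 5.
def is_square_mod(a, n):
    return any(pow(x, 2, n) == a % n for x in range(n))

for a in range(1, 15):
    if a % 3 == 0 or a % 5 == 0:
        continue  # restrict to units mod 15
    assert is_square_mod(a, 15) == (is_square_mod(a, 3) and is_square_mod(a, 5))
```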

I’m still a little confused about how to think of squares mod $2^n$.

• The theorem: where $u$ is relatively prime to an odd prime $p$ and $n < k$, $p^n u$ is a square mod $p^k$ iff $u$ is a square mod $p$ and $n$ is even.

The real meat of the theorem is the case $n = 0$ (i.e. a square mod $p$ that isn’t a multiple of $p$ is also a square mod $p^k$). Deriving the general case from there should be fairly straightforward, so let’s focus on this special case.

Why is it true? This question has a surprising answer: Newton’s method for finding roots of functions. Specifically, we want to find a root of $f(x) = x^2 - u$, except in $\mathbb{Z}/p^k\mathbb{Z}$ instead of $\mathbb{R}$.

To adapt Newton’s method to work in this situation, we’ll need the p-adic absolute value on $\mathbb{Z}$: $|p^n u|_p = p^{-n}$ for $u$ relatively prime to $p$. This has lots of properties that you should expect of an “absolute value”: it’s positive ($|x|_p \geq 0$, with $|x|_p = 0$ only when $x = 0$), multiplicative ($|xy|_p = |x|_p\,|y|_p$), symmetric ($|-x|_p = |x|_p$), and satisfies a triangle inequality ($|x+y|_p \leq |x|_p + |y|_p$; in fact, we get more in this case: $|x+y|_p \leq \max(|x|_p, |y|_p)$). Because of positivity, symmetry, and the triangle inequality, the p-adic absolute value induces a metric (in fact, ultrametric, because of the strong version of the triangle inequality) $d(x, y) = |x - y|_p$. To visualize this distance function, draw giant circles, and sort integers into circles based on their value mod $p$. Then draw smaller circles inside each of those giant circles, and sort the integers in the big circle into the smaller circles based on their value mod $p^2$. Then draw even smaller circles inside each of those, and sort based on value mod $p^3$, and so on. The distance between two numbers corresponds to the size of the smallest circle encompassing both of them. Note that, in this metric, $p^n$ converges to $0$.
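The p-adic absolute value is easy to compute and check directly; a minimal sketch of my own (not from the text):

```python
# |x|_p = p^(-n), where p^n is the largest power of p dividing x.
def p_adic_abs(x, p):
    if x == 0:
        return 0.0
    n = 0
    while x % p == 0:
        x //= p
        n += 1
    return float(p) ** -n

p = 3
for x in range(-20, 21):
    for y in range(-20, 21):
        # Strong (ultrametric) triangle inequality: |x+y|_p <= max(|x|_p, |y|_p).
        assert p_adic_abs(x + y, p) <= max(p_adic_abs(x, p), p_adic_abs(y, p))

# p^n converges to 0 in this metric: |3^10|_3 = 3^(-10), tiny.
assert p_adic_abs(3**10, 3) == 3.0 ** -10
```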

Now on to Newton’s method: if $u$ is a square mod $p$, let $r$ be one of its square roots mod $p$. Then $|f(r)|_p \leq p^{-1}$; that is, $r$ is somewhat close to being a root of $f$ with respect to the p-adic absolute value. Also, $f'(r) = 2r$, so $|f'(r)|_p = 1$; that is, $f$ is steep near $r$. This is good, because starting close to a root and the slope of the function being steep enough are things that help Newton’s method converge; in general, it might bounce around chaotically instead. Specifically, it turns out that, in this case, $|f(r)|_p < |f'(r)|_p^2$ is exactly the right sense of being close enough to a root with steep enough slope for Newton’s method to work.

Now, Newton’s method says that, from $r$, you should go to $r' = r - \frac{f(r)}{f'(r)}$. $f'(r)$ is invertible mod $p^k$, so we can do this. Now here’s the kicker: $|f(r')|_p \leq |f(r)|_p^2$, so $|f(r')|_p \leq p^{-2}$. That is, $r'$ is closer to being a root of $f$ than $r$ is. Now we can just iterate this process until we reach some $r^*$ with $f(r^*) \equiv 0 \pmod{p^k}$, and we’ve found our square root of $u$ mod $p^k$.
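The iteration just described can be sketched in a few lines (my own implementation; `lift_sqrt` is a name I made up):

```python
# Lift a square root of u mod p to a square root mod p^k via Newton's method.
def lift_sqrt(u, root, p, k):
    """Given root^2 = u (mod p), p an odd prime not dividing u,
    return r with r^2 = u (mod p^k), via r -> r - f(r)/f'(r)."""
    modulus = p**k
    r = root
    for _ in range(k):  # each step at least doubles the p-adic precision
        inv = pow(2 * r, -1, modulus)        # f'(r) = 2r is invertible mod p^k
        r = (r - (r * r - u) * inv) % modulus
    return r

# 3^2 = 9 = 2 (mod 7); lift that square root of 2 all the way to mod 7^5.
r = lift_sqrt(2, 3, 7, 5)
assert r * r % 7**5 == 2
```

The modular division uses Python’s `pow(x, -1, m)` (available in 3.8+) to invert $f'(r)$ mod $p^k$, exactly as the argument above requires.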

Exercise: Do the same thing with cube roots. Then with roots of arbitrary polynomials.

• The part about derivatives might have seemed a little odd. After all, you might think, $\mathbb{Z}$ is a discrete set, so what does it mean to take derivatives of functions on it? One answer to this is to just differentiate symbolically using polynomial differentiation rules. But I think a better answer is to remember that we’re using a different metric than usual, and $\mathbb{Z}$ isn’t discrete at all! Indeed, for any number $x$, $x + p^n \to x$ as $n \to \infty$, so no points are isolated, and we can define differentiation of functions on $\mathbb{Z}$ in exactly the usual way with limits.

• I noticed I was confused and liable to forget my grasp on what the hell is so “normal” about normal subgroups. You know what that means—colorful picture time!

First, the classic definition. A subgroup $H$ is normal when, for all group elements $g$, $gH = Hg$ (this is trivially true for all subgroups of abelian groups).

ETA: I drew the bounds a bit incorrectly; is most certainly within the left coset ().

Notice that nontrivial cosets aren’t subgroups, because they don’t have the identity $e$.

This “normal” thing matters because sometimes we want to highlight regularities in the group by taking a quotient. Taking an example from the excellent Visual Group Theory, the integers $\mathbb{Z}$ have a quotient group $\mathbb{Z}/12\mathbb{Z}$ consisting of the congruence classes $\overline{0}, \overline{1}, \ldots, \overline{11}$, each integer slotted into a class according to its value mod 12. We’re taking a quotient with the cyclic subgroup $12\mathbb{Z}$.
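The quotient is well-defined precisely because the class of a sum doesn’t depend on which representatives you pick, which is quick to verify (a check of my own, not from the book):

```python
# Congruence class of x mod 12, as a canonical representative in 0..11.
def cls(x):
    return x % 12

# Addition of classes is representative-independent: class(a + b) is the same
# whether you add first or reduce to representatives first.
for a in range(-24, 25):
    for b in range(-24, 25):
        assert cls(a + b) == cls(cls(a) + cls(b))
```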

So, what can go wrong? Well, if the subgroup isn’t normal, strange things can happen when you try to take a quotient.

Here’s what’s happening:

Normality means that when you form the new Cayley diagram, the arrows behave properly. You’re at the origin, $H$. You travel to the coset $gH$ using $g$. What we need for this diagram to make sense is that if you follow any $h \in H$ you please, applying $g$ means you go back to $gH$. In other words, $hg \in gH$. In other words, $Hg \subseteq gH$. In other other words (and using a few properties of groups), $gH = Hg$.
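The $gH = Hg$ condition is easy to test by brute force; here is a sketch of my own in $S_3$ (not from the post):

```python
# Check normality of subgroups of S3, represented as permutations of (0, 1, 2).
from itertools import permutations

def compose(f, g):
    """Apply g first, then f: (f o g)(i) = f(g(i))."""
    return tuple(f[g[i]] for i in range(3))

S3 = list(permutations(range(3)))  # all 6 permutations of {0, 1, 2}

def is_normal(H):
    """gH = Hg (as sets) for every g in the group."""
    return all(
        {compose(g, h) for h in H} == {compose(h, g) for h in H}
        for g in S3
    )

A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # the even permutations (rotations)
H = [(0, 1, 2), (1, 0, 2)]              # subgroup generated by one swap

assert is_normal(A3)     # index-2 subgroup: always normal
assert not is_normal(H)  # some left and right cosets disagree
```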

• One of the reasons I think corrigibility might have a simple core principle is: it seems possible to imagine a kind of AI which would make a lot of different possible designers happy. That is, if you imagine the same AI design deployed by counterfactually different agents with different values and somewhat-reasonable rationalities, it ends up doing a good job by almost all of them. It ends up acting to further the designers’ interests in each counterfactual. This has been a useful informal way for me to think about corrigibility, when considering different proposals.

This invariance also shows up (in a different way) in AUP, where the agent maintains its ability to satisfy many different goals. In the context of long-term safety, AUP agents are designed to avoid gaining power, which implicitly ends up respecting the control of other agents present in the environment (no matter their goals).

I’m interested in thinking more about this invariance, and why it seems to show up in a sensible way in two different places.

• (Just starting to learn microecon, so please feel free to chirp corrections)

How diminishing marginal utility helps create supply/demand curves: think about the uses you could find for a pillow. Your first few pillows are used to help you fall asleep. After that, maybe some for your couch, and then a few spares to keep in storage. You prioritize pillow allocation in this manner; the value of the latter uses is much less than the value of having a place to rest your head.

How many pillows do you buy at a given price point? Well, if you buy any, you’ll buy some for your bed at least. Then, when pillows get cheap enough, you’ll start buying them for your couch. At what price, exactly? Depends on the person, and their utility function. So as the price goes up or down, it does or doesn’t become worth it to buy pillows for different levels of the “use hierarchy”.

Then part of what the supply/demand curve is reflecting is the distribution of pillow use valuations in the market. It tracks when different uses become worth it for different agents, and how significant these shifts are!
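A toy model of the use hierarchy (hypothetical numbers of my own invention): each person’s hierarchy is a list of diminishing marginal values, and market demand at a price counts the uses that clear it.

```python
# Hypothetical marginal valuations: each list is one person's "use hierarchy"
# for pillows, in diminishing order of value.
alice = [30, 25, 10, 4]  # bed, second bed pillow, couch, storage spare
bob = [20, 8, 3]

def pillows_demanded(marginal_values, price):
    """Buy a pillow for every use worth at least the asking price."""
    return sum(1 for value in marginal_values if value >= price)

def market_demand(price):
    return pillows_demanded(alice, price) + pillows_demanded(bob, price)

# As the price falls, lower-value uses become worth it, so demand rises.
assert market_demand(28) == 1  # only Alice's top use clears the price
assert market_demand(9) == 4   # both of Alice's bed uses + her couch + Bob's bed
assert market_demand(3) == 7   # every use on both lists
```

The demand curve traced out by `market_demand` steps downward exactly at the valuations in the two lists, which is the “distribution of pillow use valuations” the post describes.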