# A Request for Open Problems

Open problems are clearly defined problems[1] that have not been solved. In older fields, such as Mathematics, the list is rather intimidating. Rationality, on the other hand, seems to have no such list.

While we are all here together to crunch on problems, let’s shoot higher than thinking of solutions and then finding problems to match them. What questions remain unsolved? Is it reasonable to assume those questions have concrete, absolute answers?

The catch is that these problems cannot be inherently fuzzy problems. “How do I become less wrong?” is not a problem that can be clearly defined. As such, it does not have a concrete, absolute answer. Does Rationality have a set of problems that can be clearly defined? If not, how do we work toward getting our problems clearly defined?

[1]: “Clearly defined” essentially means a formal, unambiguous definition. “Solving” such a problem would constitute a formal proof.

• Suppose someone offered you a bet on P = NP. How should you go about deciding whether or not to take it?

Does it make sense to assign probabilities to unproved mathematical conjectures? If so, what is the probability of P = NP? How should we compute this probability, given what we know so far about the problem? If the answer is no, what is a rational way to deal with mathematical uncertainty?

• Scott Aaronson and many others in his field have essentially bet their careers that P ≠ NP (at least, I think the polynomial hierarchy would collapse if P = NP), while only a bunch of cranks (so far as I’ve heard) have bet a substantial amount of their life’s efforts on P = NP. That’s as good a statement of betting odds as any.

As for whether it’s legitimate to do so, well, that’s Bayesian probability for you. You’ve got to have some way of dealing with subjective uncertainty, and you get better results in the long run if your method of dealing with it is like unto the laws of probability.

• Let’s say you’re the first person to work on P = NP. What then? (Assume that you have enough mathematical ability to produce most of the results on it that we have so far.)

• Well, I’m pretty sure that the smart money is all on one side of the question because of certain heuristic evidence that becomes very clear when you’ve worked with these complexity classes for a long while.

It’s the same reason the smart money was on Fermat’s Last Theorem being true (prior to the proof being found); not only would it have been very unusual in mathematics for this simple Diophantine equation to have its first nontrivial solution only when the numbers became absurdly large, but it is equivalent to a beautiful conjecture in elliptic curves which seemed to admit of no counterexamples.

There’s plenty of inductive evidence found and used in the actual doing of mathematics; it just doesn’t make the textbooks (since you generally don’t publish on a subject unless you’ve found a proof, at which point the inductive evidence is utterly redundant). Yet it guides the intuition of all mathematicians when they decide what to try proving next.

Compare Einstein’s Arrogance.

• Suppose you test Fermat’s Last Theorem for n up to 10^10 and don’t find a counterexample. How much evidence does that give you for FLT being true? In other words, how do you compute P(a counterexample exists with n ≤ 10^10 | FLT is false), since that’s what’s needed to do a Bayesian update with this inductive evidence? (Assume this is before the proof was found.)

I don’t dispute that mathematicians do seem to reason in ways that are similar to using probabilities, but I’d like to know where these “probabilities” are coming from and whether the reasoning process really is isomorphic to probability theory. What you call “heuristic” and “intuition” are the results of computations being done by the brains of mathematicians, and it would be nice to know what the algorithms are (or should be), but we don’t have them even in an idealized form.
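As a toy illustration of the update being asked about, here is a sketch under one loudly flagged assumption of mine: that if FLT were false, the exponent of the smallest counterexample would be geometrically distributed with parameter q. Nothing hangs on that particular choice; it just makes the Bayes computation concrete.

```python
# Toy Bayesian update for "FLT is true" after checking all exponents n <= n_checked.
# Assumption (mine, purely illustrative): if FLT were false, the smallest
# counterexample's exponent n would satisfy P(n = k | FLT false) = q * (1 - q)**(k - 3)
# for k >= 3, so P(no counterexample with n <= N | FLT false) = (1 - q)**(N - 2).

def posterior_flt_true(prior_true, q, n_checked):
    p_no_ce_given_false = (1 - q) ** (n_checked - 2)
    # The evidence "no counterexample found" has likelihood 1 if FLT is true.
    return prior_true / (prior_true + (1 - prior_true) * p_no_ce_given_false)

# Starting from a 50/50 prior, checking up to 10^10 moves us only as far as
# the assumed counterexample distribution allows.
print(posterior_flt_true(0.5, 1e-9, 10**10))
```

The point the sketch makes is the one in the comment above: the posterior depends entirely on where the prior says a counterexample would live, and it is exactly that distribution that we do not know how to justify.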

• The most influential books on the topic I know of are Mathematics and Plausible Reasoning Volume I: Induction and Analogy in Mathematics and Mathematics and Plausible Reasoning Volume II: Patterns of Plausible Reasoning, by George Pólya.

• There’s a difficulty here involving the fact that every finite set of possible counterexamples has measure zero in the set {(a, b, c, n) | a, b, c, n in N} equipped with the probability measure that assigns each possible counterexample an equal probability.

So all the usual biases and cognitive gotchas involving probability and infinity come into play (even when the people doing the thinking are mathematicians!), and all bets are off.

My hypothesis is that the commonly used prior for counterexample distributions is exponential. As the lower bound K on possible counterexamples (i.e., a, b, c, n ≥ K) increases, the exponential is updated into something close to uniform on the rest of {(a, b, c, n) | a, b, c, n in N}.

• This does somewhat dodge the question, but it does make a difference that an infinite set of counterexamples can be associated with each counterexample. That is, if (a, b, c, n) is not a solution to the Fermat equation, then (ka, kb, kc, n) isn’t either for any positive integer k.
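The scaling fact is easy to verify mechanically, since (ka)^n + (kb)^n = k^n (a^n + b^n) while (kc)^n = k^n c^n, so scaling preserves both solutions and non-solutions. A quick sketch (the sample tuples are arbitrary):

```python
# (a, b, c, n) solves a^n + b^n = c^n if and only if (ka, kb, kc, n) does,
# because both sides of the equation scale by the same factor k^n.

def is_fermat_solution(a, b, c, n):
    return a**n + b**n == c**n

for (a, b, c, n) in [(3, 4, 5, 2), (2, 3, 4, 3), (6, 8, 9, 3)]:
    for k in (2, 5, 10):
        assert is_fermat_solution(a, b, c, n) == is_fermat_solution(k*a, k*b, k*c, n)
```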

• And, in mid-2010, Scott Aaronson also literally put down a large bet that P != NP.

• If you’re thinking of this, you’re misremembering—that bet (of $200,000) was that Vinay Deolalikar’s recent P != NP paper would not win the Clay Millennium Prize. (In the comments, Aaronson says “P≠NP is exactly the ‘expected’ answer!”; the bet was his way of keeping people from accusing him of rushing to judgment when he said that he didn’t think the paper would turn out to successfully prove that expected answer even though he hadn’t read the whole thing.)

• Ah, you’re entirely right. I didn’t misremember—I read his blog rather religiously. I just apparently wasn’t quite awake when I wrote what he was betting on.

I should also clarify that he didn’t have anyone matching even a lesser amount in the case that the paper was indeed unsuccessful (which it appears to be as it stands, though Aaronson’s bet gives it a while to correct problems). His goal, which didn’t exactly work, was to get people to stop asking him about the paper. I say it didn’t work because he probably got even more people commenting on the bet, and still a large number asking about the paper.

• Here are some papers I found recently that seem to represent the state of the art on the issue of how to deal with uncertainty in mathematics. None of them really gets very far beyond “let’s try to apply probability theory”, but at least the problem is not being completely ignored by mathematicians and philosophers.

• One could say that most of math is already about uncertainty: when you have a system and ways of refining it, it is in a way a form of applying knowledge to resolve uncertainty. For example, applying a function to a parameter, or combining morphisms. A lot of analysis is about approximation, or about representing systems that expect future observations. It is a very narrow sense of “dealing with uncertainty” that would require going to the fringe.

• I don’t understand the point of your comment. It should have been clear from the context that by “dealing with uncertainty in mathematics” I did not mean things like proving or disproving a conjecture, thus resolving its uncertainty, but rather how to make bets involving mathematical statements that we don’t know how to either prove or disprove. Are you saying that the latter is not an important problem, or just that you don’t like that I’m using the phrase “dealing with uncertainty in mathematics” to refer to it?

• You don’t have to resolve all of the uncertainty in one go. For example, you could restrict a function to part of a domain, thus deciding that it is only this part that you are interested in, instead of the whole thing.

What you seem to mean is non-rigorous methods for making uncertain conclusions about mathematical structures. It is about dealing with uncertainty about mathematics on a non-mathematical level of rigor. Correct?

• Yes, something like that, except that “non-rigorous” seems too prejudicial. Why not just “methods for making uncertain conclusions about mathematical structures”, or “dealing with uncertainty about mathematics”?

• (Retracted; useless speculation.)

• Is there a best language in which to express complexity for use in the context of Occam’s razor?

If there is a best language in which to express complexity for use in the context of Occam’s razor, what is that language?

• I suspect the answer is no, at least not the kind of formal languages that have been suggested so far. The problem is this: as soon as you define a formal language, I can say “the lexicographically first object which can’t be described in less than a million bits in your language”. Given the uniqueness of this object, why should it be a priori as unlikely as a random million-bit string?

• Note that in general, this sort of specification is uncomputable. To find the lexicographically first object which can’t be described in less than a million bits, we would have to make a list of all objects which can be described in less than a million bits (and return the first thing that’s not on the list). But we don’t know if our UTM will halt for any given string of less than a million bits, so we can never know if our list is complete.

• Yes, and that’s sort of intentional. I was trying to come up with a mathematical model of an agent that can deal with uncomputable physics. The physics of our universe seems likely to be computable, but there is no a priori reason to assume that it must be. We may eventually discover a law of physics that’s not computable, or find out that we are in a simulation running inside a larger universe that has uncomputable physics. Agents using UTM-based priors can’t deal with these scenarios.

So I tried to find a “better”, i.e., more expressive, language for describing objects, but then realized that any fixed formal language has a similar problem. Here’s my current idea for solving this: make the language extensible instead of fixed. That is, define a base language and a procedure for extending the language. Then, when the agent encounters some object that can’t be described concisely using his current language, he recursively extends it until a short description is possible. What the extension procedure should be is still unclear.

• I agree that there are very interesting questions here. We have quite natural ways of describing uncomputable functions very far up the arithmetical hierarchy, and it seems that they can be described in some kind of recursive language even if the things they describe are not recursive (using recursive in the recursion-theory sense both times). Turing tried something like this in Systems of Logic Based on Ordinals (Turing, 1939), but that was with formal logic and systems where you repeatedly add the Gödel sentence of a system into the system as an axiom, repeating this into the transfinite. A similar thing could be done describing levels of computability transfinitely far up the arithmetical hierarchy, using recursively represented ordinals to index them. However, then people like you and I will want to use certain well-defined but non-recursive ordinals to do the indexing, and it seems to degenerate in the standard kind of way, just a lot further up the hierarchy than before.

• And then there are objects that are completely outside the arithmetical hierarchy, but we probably shouldn’t assign zero priors to those either. Things like large cardinals, perhaps.

Another mystery is, why did evolution create minds capable of thinking about these issues, given that agents equipped with a fixed UTM-based prior would have done perfectly fine in our place, at least up to now?

• We aren’t really Bayesian reasoning machines at all, and it isn’t really accurate to speak of us having a prior. We choose a prior in order to use Bayesian reasoning to analyze a situation, and we seek to bend our natural reasoning to a Bayesian template in order to improve its accuracy, but we cannot wholly succeed in doing so. So the problem you raise should worry someone building AGI, but it’s not realistic to imagine a human agent becoming so Bayesian that they swallow the Solomonoff prior whole and are literally unable to contemplate super-Turing universes.

I don’t think it’s unreasonable, therefore, to adopt the Solomonoff prior as a useful model to aid reasoning and discussion, and rely on our human ability to make and adopt a new, super-Turing model if some more general prior would have favoured it.

• We may eventually discover a law of physics that’s not computable, or find out that we are in a simulation running inside a larger universe that has uncomputable physics. Agents using UTM-based priors can’t deal with these scenarios.

Then how can you deal with these scenarios? Did the idiot God make you better equipped for this task, oh uncomputable ape-brain?

• The idea of agents using UTM-based priors is a human invention, and therefore subject to human error. I’m not claiming to have an uncomputable brain, just that I’ve found such an error.

For a specific example of how human beings might deal with such scenarios, compared to agents using UTM-based priors, see “is induction unformalizable?”.

• The model of the environment values observations and behaviors, not statements about “uncomputability” and such. No observation should be left out or declared impossible. If you, as a human, decide to trust in something you label a “halting oracle”, that’s your decision, and it is a decision you’d want any trusted AI to carry through as well.

I suspect that the roots of this confusion are something not unlike the mind projection fallacy, with magical properties attributed to models, but I’m not competent to discuss domain-specific aspects of this question.

• Not a serious problem—just consider objects which can’t be described in less than a million bits after a million timesteps instead.

• If a problem is so serious that you have to change your prior, then I’d call that a serious problem.

• Did you understand why the problem is not serious? Your comment was nit-picking the example. Other similar examples make exactly the same point—but are not vulnerable to such nit-picking.

• It doesn’t sound too serious—if all the candidate languages have the same problem.

It is hard to deny that some languages are better than other ones, and therefore that there is a set of “best” ones. The intent of my first question was more along the lines of: is one formulation of Occam’s razor optimal for all the different situations in which Occam’s razor is used, or should we be using different descriptive languages under different circumstances?

• That depends on what you mean by “best”.

Is speed of calculation important? What about suitability for humans? I guess you want one where complexities are as small as possible.

Given two languages, L1 and L2, and their complexity measures, K1 and K2:

If K1(L2) < K2(L1), then I take that as a sign that L1 is better for use in the context of Ockham’s razor. It is also a sign that L1 is more complex than L2, but that effect can be removed by doing lots of comparisons like this, so the unnecessarily complex languages lose against those that are actually good. Or one can use three languages in a comparison: K1(L3) < K2(L3) is a sign that L1 is better.

The idea here is that a good language should more easily represent systems, such as other languages.

What sort of languages are best by this measure? Not the simplest ones, but rather the more expressive ones. Programs for Turing machines use some complexity for tape traversal and storage systems, so I guess they are not so good. Cellular automata have the same problems. Combinatory Logic is very simple, with complex operators even for simple stuff, but its system for storage and access can be simpler, since it is a tree. I guess Lambda Calculus is one of the better languages, as it is both simple and very expressive, because of its more direct tree structure.

I guess the best language would use an even more expressive structure, such as graphs, perhaps bigraphs, since those are maximally expressive. The Scheme language can be used as a general graph language thanks to the set-car! and set-cdr! procedures.

• “Best”: most accurate—i.e., when Occam’s razor says A is a more likely hypothesis than B, then that is actually true.

• Then I would go for Turing machines, Lambda Calculus, or similar. These languages are very simple, and can easily handle input and output.

Even simpler languages, like cellular automaton rule 110 or Combinatory Logic, might be better, but those are quite difficult to get to handle input and output correctly.

The reason simple languages, or universal machines, should be better is that the upper bound on the error in estimating probabilities is 2 to the power of the complexity of the program simulating one language in another, according to algorithmic information theory.

So, the simpler the language is, the more correct relative probabilities it will give.

The answer I gave before this one was for the question: maximize the likelihood that the language will produce the output corresponding to the observed events.

You, however, want to compare two hypotheses, getting two probabilities, and compare them. In this case the absolute size of the probabilities does not matter, just their relative size, so you can just go for the simplest language that is practical to use.

What do you want to use this for?

• Using simple languages is the conventional approach. However, simple languages typically result in more complex programs. The Game of Life is very simple—yet try writing a program in it. If you are trying to minimise the size of an emulator of other languages, then highly simple languages don’t seem to fit the bill.

Why would one want a decent formulation of Occam’s razor? To help solve the problem of the priors.

• I agree. We seem to have the same goal, so my first advice stands, not my second.

I am currently trying to develop a language that is both simple and expressive, and making some progress. The overall design is finished, and I am now down to deciding what instructions it should have. It is a general bigraph, but with a sequential program structure and no separation of program and data.

It is somewhat different from what you want, as I also need something that has measurable use of time and memory, and is provably able to run fast.

• I’m sure that’s not possible to do in general (in an algorithmic fashion), since it would solve in a flash certain mathematical problems that aren’t supposed to be solvable in polynomial time or even efficiently (like particular halting problems).

You’ll have to hope for something less.

• I think Tim’s intended meaning was more along the lines of “a language L such that when A is a more likely hypothesis than B, L’s MDL for A is shorter than its MDL for B more often than for any other language”.

• What does it mean to deal rationally with moral uncertainty? If Nick Bostrom and Toby Ord’s solution is right, how do we apply it in practice?

ETA: this isn’t a “clearly defined” question in the sense you mentioned, but I’ll leave it up anyway; apologies.

• As I pointed out in that thread, their solution doesn’t work. You would need to choose an aggregation mechanism to combine votes. Different mechanisms will cause different systematic outcomes. Notably, some mechanisms will result in always choosing actions from one category; some mechanisms will result in sampling from different categories proportionally to their votes (much as, e.g., the American system always chooses the most popular candidate, resulting in a two-party equilibrium, while many European systems allocate seats proportionally to votes, allowing equilibria with more than two parties).

You need to choose which kind of outcome you prefer in order to choose your aggregation mechanism, in order to implement their solution. But if you could do that, you wouldn’t need their solution in the first place!

• You need to choose which kind of outcome you prefer in order to choose your aggregation mechanism

Is this really the case? It doesn’t seem true of axiomatic approaches to decision theory in general, so is there a special reason to think it should be true here?

But if you could do that, you wouldn’t need their solution in the first place!

I guess I would view the parliamentary mechanism more as an intuition pump than a “solution” per se. It may well be that, having thought through its implications, it will turn out that the results can be represented in (say) the standard vNM framework. Nonetheless, the parliamentary model could still be helpful in getting a handle on the nature of the “utility” functions involved.

As an aside, it seems as though their parliamentary approach could potentially be modeled more effectively using cooperative game theory than the more standard non-cooperative version.

• Is this really the case? It doesn’t seem true of axiomatic approaches to decision theory in general, so is there a special reason to think it should be true here?

I just gave the reason. “Some mechanisms will result in always choosing actions from one category; some mechanisms will result in sampling from different categories proportionally to their votes.”

The aggregation mechanism is a lot like the thread-priority system in a computer operating system. Some operating systems always give the CPU to the highest-priority task. Some try to give tasks CPU time proportional to their priority. Likewise, some aggregation mechanisms always choose the most popular option, never giving any voice to minority opinions; some choose options with probability proportional to their popularity. You have to choose which type of aggregation mechanism to use. But this choice is exactly the sort of choice that the parliament is supposed to be producing as output, not requiring as input.
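The two families of mechanisms can be sketched in a few lines; the function names and vote counts below are mine, purely illustrative:

```python
import random

# Two vote-aggregation mechanisms: "winner-take-all" always picks the most
# popular option, so minority options never act; "proportional" samples an
# option with probability proportional to its vote share.

def winner_take_all(votes):
    # votes: dict mapping option -> vote count
    return max(votes, key=votes.get)

def proportional(votes, rng=random):
    options = list(votes)
    weights = [votes[o] for o in options]
    return rng.choices(options, weights=weights, k=1)[0]

votes = {"A": 60, "B": 30, "C": 10}
print(winner_take_all(votes))  # always "A"; B and C get no voice
print(proportional(votes))     # varies; "B" about 30% of the time in the long run
```

Note that choosing between these two functions is itself a value judgment about outcomes, which is exactly the circularity the comment above points out.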

• I wonder if it would work to renormalize utility so that the total of everything that’s “at stake” (in some sense that would need to be made more precise) is always worth the same?

Probably this gives too much weight to easy-to-achieve moralities, like the morality that says all that matters is whether you’re happy tomorrow? It also doesn’t accommodate non-consequentialist moralities.

But does it ever make sense to respond to new moral information by saying, “huh, I guess existence as a whole doesn’t matter as much as I thought it did”? It seems counterintuitive somehow.

• I can’t follow your comment. I would need some inferential steps filled in, between the prior comment and the first sentence of your comment, and between every sentence of your comment.

• It’s still a great question. I do not know how far we can get formalizing it, but my hunches tell me we could get pretty close.

• Is there a practical method to reach Aumann agreement?

• Here are some more problems that have come up on LW:

• What is my utility function?

• Meta-philosophy

• How to think about “all mathematical structures” when no formal language is sufficiently expressive?

• What is the nature of anticipation/surprise/disappointment/good and bad news? (For example, surprise wouldn’t be a cognitive feature of an ideal utility maximizer. What purpose, if any, does it serve in us?)

• Here’s an open problem that’s been on my mind this past week:

Take some controversial question on which there are a small number of popular opinions. Draw a line going from 0 on the left to 1 on the right. Divide that line into segments for each opinion that holds > 1% of opinion-space.

Now stratify the population by IQ into 10-point intervals. Redo the process, drawing a new line from 0 to 1 for each IQ range and dividing it into segments. Then stack your 0-1 line segments up vertically. Connect the sections for the same opinions in each IQ group.

What does the resulting picture look like? Questions include:

• Is there a wider range of popular opinions (technically, a larger entropy) at the bottom or at the top?

• Are there opinions that have maximum popularity at an intermediate IQ level?

• Are there opinions that have minimum popularity at an intermediate IQ level?

Other variables could be used on the vertical axis. In general, continuous variables (IQ, educational level, year of survey, income) are more interesting to me than category variables (race, sex). I’m trying to get at the question of how systematic misunderstandings are, how reliably increasing understanding increases accuracy, and whether increasing understanding of a topic increases or decreases the chances of agreement on it.
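The entropy comparison in the first question above can be made concrete with a few lines of code; the opinion shares below are invented solely to show the computation:

```python
from math import log2

# Shannon entropy (in bits) of an opinion distribution within one IQ stratum.
# A wider range of popular opinions means higher entropy.

def entropy(shares):
    return -sum(p * log2(p) for p in shares if p > 0)

# Hypothetical opinion shares for two IQ strata (made-up numbers, not survey data).
strata = {
    "95-105":  [0.50, 0.30, 0.20],
    "125-135": [0.70, 0.20, 0.10],
}
for iq_range, shares in strata.items():
    print(iq_range, round(entropy(shares), 3))
```

With real survey data, plotting this number per stratum would directly answer whether opinion diversity rises or falls with IQ.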

I noticed recently my impression that certain wrong opinions reliably attract people of a certain level of understanding of a problem. In some domains, increasing someone’s intelligence or awareness seems to decrease their chances of correct action. This is probably because people have evolved correct behaviors, and figure out incorrect behaviors when they’re smart enough to notice what they’re doing but not smart enough to figure out why. But it’s then interesting that their errors would be correlated rather than random.

• If there were a common pattern, you could sample the opinions in a population, or over time, and estimate how much longer it would take, or how much knowledge would be needed, to arrive at a correct opinion.

• Another problem, although to what degree it is currently both “open” and relevant is debatable: finding new loopholes in Arrow’s Theorem.

• Can you even have a clearly defined problem in any field other than Mathematics, or one that doesn’t reduce to a mathematical problem regardless of the field where it originated?

• Some people like mathematics so much that they define it as encompassing all clearly defined problems. Some people like philosophy so much that they define it as encompassing all problems whatsoever. Both of these definitions are clearly wrong, and the product of affective death spirals.

• You say that as if a problem can only belong to one field.

• See Wikipedia’s list for a few examples.

The distinction between clearly defined and otherwise is somewhat subjective. I have not heard anyone talking about the subject yet, so I brought it up.

Since Rationality seems to be strongly related to Bayes’ theorem, it makes some sense that a lot of problems could be presented in a fashion where we only have to answer a few questions about priors to understand which actions to take.

I don’t know if this answers your question.

• It’s the best possible kind of answer to my question—a link to a load of interesting stuff—thanks!

I see where I went wrong, in missing out the entire physical universe as a source of questions that can be clearly stated but are about real things rather than mathematical descriptions of them.

• How do we handle the existence of knowledge which is reliable but cannot be explained? As an example, consider the human ability to recognize faces (or places, pieces of music, etc.). We have nearly total confidence in our ability to recognize people by their faces (given enough time, good lighting, etc.). However, we cannot articulate the process by which we perform face recognition.

Imagine you met a blind alien, and for whatever reason needed to convince it of your ability to recognize people by face. Since you cannot provide a reasonable description of your face-recognition process, you are essentially in the position of saying “I’m totally sure of the identity of the person I saw, but I cannot explain why I am so certain”.

• Quite a bit is known about the neurology behind face recognition. No one understands the algorithm well enough to build a fusiform gyrus from scratch, but that doesn’t mean the fact that there is an algorithm is mysterious.

• Even if we did not have any understanding of the neurology, I’m not sure why pointing to an empirical record of successful face recognition shouldn’t be fairly convincing. Is the point that we could be lying about our record?

(In the specific example given, you could probably get a fair bit of mileage from explaining the nature of vision, even without the specifics of face recognition. I’m not really sure what broader lesson that might have, though, as I don’t fully understand the nature of the question you’re asking.)

• I’ve been lurking here for a while now and thought I had something to add for the first time, so hi all, thanks for all the great content and concepts; on to the good stuff:

I think a good open problem for the list would be a formal (or at least good, solid) definition of rationality. I know of things like BDI architecture and Pareto optimality, but how do these things apply to a human rational being? For that matter, how do you reason logically/formally about a human being? What would be a good abstraction/structure, and are there any guidelines?

Well, just my two formal cents.

• I think that problem is too hard for us. As a general rule, sweeping definitions made without respect to a particular purpose are of limited utility. But thanks for commenting, and please don’t let my response deter you from commenting again.

• Well, I can give some classes of problems. For instance, for many of the biases that we know about, we don’t really know good ways for humans to reliably correct for them. So right there is a whole bunch of open problems. (I know of some specific ones with known debiasing techniques, but many are “okay, great, I know I make this error… but other than occasionally being lucky enough to directly catch myself in the act of doing so, it’s not really obvious how to correct for it”.)

Another, I guess vaguer, one would be “a general solution that allows people to solve their akrasia problems”.

• This one’s well known in certain quarters (so it’s not really open), but it may provide training for those unfamiliar with it.

Suppose that you observe two random samples* from the uniform distribution centered at unknown location c with width 1. Label the samples x_min and x_max. The random interval (x_min, x_max) is a 50% confidence interval for c because it contains c with 50% probability.

• What is the practical problem with this confidence interval?

• Do better.

* changed from “deviates” per comment below
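For those who want to poke at the setup numerically, the 50% coverage claim is easy to check by simulation. A minimal sketch in Python; the true location c and the trial count are arbitrary illustrative choices, not part of the problem:

```python
import random

def coverage(trials=100_000, c=0.0, width=1.0):
    """Estimate how often the random interval (x_min, x_max) contains c,
    where both points are drawn from Uniform(c - width/2, c + width/2)."""
    hits = 0
    for _ in range(trials):
        a = random.uniform(c - width / 2, c + width / 2)
        b = random.uniform(c - width / 2, c + width / 2)
        x_min, x_max = min(a, b), max(a, b)
        hits += x_min < c < x_max
    return hits / trials

print(coverage())  # hovers around 0.5, as the confidence claim requires
```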

• “The uniform distribution centered at c” does not seem to make sense. Did you perchance mean the Gaussian distribution? Further, ‘deviates’ looks like jargon to me. Can we use ‘samples’? I would therefore rephrase as follows, with a specific example to hang one’s visualisation on:

Heights of male humans are known to have a Gaussian distribution of width 10 cm around some central value c; unfortunately you have forgotten what the central value is. Joe is 180 cm, Stephen is 170 cm. The probability that c is between these two heights is 50%; explain why. Then find a better confidence interval for c.

• I mean the continuous uniform distribution. “Centered at c” is intended to indicate that the mean of the distribution is c.

ETA: Let me be specific. I’ll use the notation of the linked Wikipedia article.

You know that b − a = 1.

c = (a + b)/2 is unknown, and the confidence interval is supposed to help you infer it.

• If exactly half of all men have a height less than the central value c, then randomly picking a sample will have a 50% chance of being below c. Picking two samples (A and B) results in four possible scenarios:

1. A is less than c; B is greater than c

2. A is less than c; B is less than c

3. A is greater than c; B is greater than c

4. A is greater than c; B is less than c

The interval created by (A, B) contains c in scenarios (1) and (4) and does not contain c in scenarios (2) and (3). Since each scenario has an equal chance of occurring, c is in (A, B) 50% of the time.

That is as far as I got just thinking about it. If I am on the right path I can keep plugging away.
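The four-scenario argument above can be checked directly by tallying how often each scenario occurs. A small sketch, with c fixed at 0 purely for illustration:

```python
import random
from collections import Counter

def scenario_freqs(trials=100_000, c=0.0):
    """Tally the four below/above-c scenarios for two draws A and B
    from a width-1 uniform distribution centered at c."""
    counts = Counter()
    for _ in range(trials):
        a = random.uniform(c - 0.5, c + 0.5)
        b = random.uniform(c - 0.5, c + 0.5)
        counts[(a < c, b < c)] += 1
    return {k: v / trials for k, v in counts.items()}

freqs = scenario_freqs()
# Scenarios (1) and (4): A and B land on opposite sides of c,
# so the interval they form contains c.
print(freqs[(True, False)] + freqs[(False, True)])  # close to 0.5
```

Each of the four (below, above) combinations comes out near 25%, matching the equal-chance claim.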

• In the Gaussian case, you can do better than (A, B), but the demonstration of that fact won’t smack you in the face the way it does in the case of the uniform distribution.

• One thing you can do in the uniform case is shorten the interval to at most length 1/2. Not sure if that’s face-smacking enough.

• You can do better than that. If the distance between the two data points is 7/4, you can shrink the 100% confidence interval to 1/4, etc. (The extreme case is as the distance between the two data points approaches 2, your 100% confidence interval approaches size zero.)

EDIT: whoops, I was stupid. Corrected 3/4 to 7/4 and 1 to 2. There, now it should be right

• Do we know the heights of the men A and B? If so, we can get a better estimate of whether c lies between their heights by taking into account the difference between A and B...

• That’s the basic idea. Now apply it in the case of the uniform distribution.

• If all men are (say) within 10 cm of each other, and the heights are uniformly distributed...

… if we have two men who are 8 cm apart, then c lies between their heights with 80% probability?

• Getting there… 80% is too low.

• Wait, what? It must be 100%...

• That’s it. The so-called 50% confidence interval sometimes contains c with certainty. Also, when x_max − x_min is much smaller than 0.5, 50% is a lousy summary of the confidence (ETA: common-usage confidence, not frequentist confidence) that c lies between them.
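To see both halves of this concretely, one can condition the simulation on the observed spread d = x_max − x_min; the cutoffs below are arbitrary illustrative choices:

```python
import random

def conditional_coverage(d_lo, d_hi, trials=300_000, c=0.0):
    """Coverage of (x_min, x_max) for c, restricted to trials where the
    observed spread d = x_max - x_min falls in [d_lo, d_hi]."""
    hits = total = 0
    for _ in range(trials):
        a = random.uniform(c - 0.5, c + 0.5)
        b = random.uniform(c - 0.5, c + 0.5)
        x_min, x_max = min(a, b), max(a, b)
        if d_lo <= x_max - x_min <= d_hi:
            total += 1
            hits += x_min < c < x_max
    return hits / total

print(conditional_coverage(0.5, 1.0))  # spreads above 1/2: always contains c
print(conditional_coverage(0.0, 0.1))  # tiny spreads: far below 50%
```

Unconditionally the interval covers c half the time, but the 50% label is wildly misleading for any particular observed spread.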

• If it’s less than 0.5, is the confidence simply that value times 2?

• “Confidence” in the statistics sense doesn’t always have much to do with how confident you are in the conclusion. Something that’s the real line in half of all cases and the empty set in the other half of all cases is a 50% confidence interval, but that doesn’t mean you’re ever 50% confident (in the colloquial sense) that the parameter is on the real line or that the parameter is in the empty set.
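A sketch of that pathological procedure: flip a coin and report either the whole real line or the empty set. Its long-run coverage is exactly 50%, yet no individual report deserves 50% credence:

```python
import random

def silly_interval():
    """A technically valid 50% confidence procedure: report the whole
    real line on heads, the empty set on tails."""
    if random.random() < 0.5:
        return (float("-inf"), float("inf"))  # certainly contains the parameter
    return None                               # certainly does not

def covers(interval, theta):
    return interval is not None and interval[0] < theta < interval[1]

theta = 3.7  # the true parameter; its value is irrelevant here
hits = sum(covers(silly_interval(), theta) for _ in range(100_000))
print(hits / 100_000)  # close to 0.5 no matter what theta is
```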

• The Credible interval article on Wikipedia describes the distinction between frequentist and Bayesian confidence intervals.

• The general pattern here is that there’s something you do care about and something you don’t care about. Frequentism doesn’t allow you to talk about the thing you do care about, so it renames the thing you don’t care about in a way that suggests it’s the thing you do care about, and everyone who doesn’t understand statistics well interprets it as such.

• The interesting thing about the confidence interval I’m writing about is that it has some frequentist optimality properties. (“Uniformly most accurate”, if anyone cares.)

• Well. So if all men were within 10 cm of each other, and uniformly distributed, and we plucked 2 random men out, and they were 4 cm apart, would c be between them with 80% probability? Or some other value?

• The shorter man can be between c-5 and c+1, with all values equally probable. If he’s between c-5 and c-4 or between c and c+1, then c is not between them; if he’s between c-4 and c, then c is between them. So, assuming a uniform prior for c, the probability is 2/3 if I’m not mistaken.

• Ah, I see what I did wrong. I think.

• Yup. Under the uniform prior the posterior probability that c is between the two values is d/(1 − d), 0 < d < 0.5, where d = x_max − x_min (and the width of the uniform data distribution is 1).
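Under a flat prior, the posterior for c given the data is uniform on (x_max − 1/2, x_min + 1/2), so the probability that c lies between the samples is just an overlap of intervals. A small sketch confirming the d/(1 − d) formula (and the 2/3 answer from the 10 cm example above):

```python
def prob_c_between(x_min, x_max, width=1.0):
    """With a flat prior, the posterior for c is uniform on
    (x_max - width/2, x_min + width/2); return P(x_min < c < x_max)."""
    lo, hi = x_max - width / 2, x_min + width / 2  # posterior support
    overlap = max(0.0, min(hi, x_max) - max(lo, x_min))
    return overlap / (hi - lo)

print(prob_c_between(0.0, 0.4))  # d = 0.4: gives 0.4/0.6 = d/(1 - d)
print(prob_c_between(0.0, 0.8))  # d > 0.5: c is between the samples for sure
```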

• The answer to that depends on what you know about c beforehand: your prior probability for c.

• It’s not between them if the shorter man is 4-5 cm shorter than average or 0-1 cm taller than average, so yes, 80% assuming a uniform prior for c.

• Whoops, “confidence” is frequentist jargon. I’ll just say that any better method ought to take x_max − x_min into account.

• This does somewhat dodge the question, but it does make a difference that an infinite set of counterexamples can be associated with each counterexample. That is, if (a, b, c, n) is not a solution to the Fermat equation, then (ka, kb, kc, n) isn’t either, for any positive integer k.

• This is not a game question, but it may be an interesting question regarding decision making for humans:

What is the total Shannon entropy of the variables controlling whether or not a human will do what it consciously believes will lead to the most desirable outcome?

If all humans currently alive collectively represent every possible variable combination in this regard, the maximum value for the answer is 32.7 bits[1]. That is, 33 on/off switches completely decide whether or not you will put off doing your homework[2]. Is the correct value higher or lower?

Some variables might be aggregated from analog sources, such as adrenaline level, in combination with a series of thresholds specific to the individual.

1. Two to the power of 32.7 is the current world population.

2. Substitute “homework” with whatever you desire.

I made up this question just now, and suspect it may be insanely stupid.
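The 32.7-bit figure in footnote [1] is just the base-2 logarithm of the world population; the 6.8 billion figure below is a rough assumption for the population at the time of writing:

```python
import math

world_population = 6.8e9  # approximate figure at the time of writing
bits = math.log2(world_population)
print(round(bits, 1))  # 32.7, hence "33 on/off switches" to label everyone alive
```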

• If all humans currently alive collectively represent every possible variable combination in this regard, the maximum value for the answer is 32.7 bits[1]. That is, 33 on/off switches completely decide whether or not you will put off doing your homework[2]. Is the correct value higher or lower?

Much, much higher. The humans currently alive represent only a very sparse sampling of possible combinations of genes, and an even sparser sampling of possible combinations of life experiences. I don’t see any obvious reason why the answer to this question shouldn’t be greater than the number of subatomic particles in your body.

• I don’t see any obvious reason why the answer to this question shouldn’t be greater than the number of subatomic particles in your body.

Clarification: I am only talking about direct inputs to the decision-making process, not what they’re aggregated from (which would be the observable universe).

• Due to chaotic/non-linear effects, you’re not going to get anywhere near the compression you need for 33 bits to be enough… I’m very confident the answer is much, much higher…

• How do you build a smart, synthetic goal-seeking agent? I believe there are some associated problems as well.

• One such problem has already come up:

1. How can we train rationality?

• A lot of our irrationality seems to be rationality tuned for small sample sizes. When you live in a tribe of <200 people, any given event or opinion has a lot of weight. We evolved to do science on a small scale. How do we get around Dunbar’s limit?

• This isn’t a formal problem that can be “solved” with a formal solution. I am specifically talking about problems like the Angel problem or P = NP.

Examples I can think of off the top of my head are Newcomb’s problem and the Prisoner’s dilemma. Both of these can be expressed formally in Bayesian terms. Have the problems been solved? I assumed so, or I would have brought them up in my post.

For fun, I am starting to work out what is needed to tackle Newcomb’s problem, and it certainly seems doable. I figured it is a good test of my new Bayesian skillz. Game theory claims to have “solved” one-shot PDs, but not in a way that helps someone decide what to do in a real-life example. Newcomb’s seemed easier, so I am starting with that.

• Ok, I had interpreted the scope more widely than you intended.

I believe Eliezer has a formal analysis of Newcomb’s problem, but I don’t know if he’s published it anywhere.

• There are a fair number of formal analyses of Newcomb’s problem. I particularly like this one:

D.H. Wolpert and G. Benford, “What does Newcomb’s paradox teach us?” (showing that the standard approaches to the paradox encode fundamentally different, and inconsistent, views about the nature of the decision problem, and clearing up a number of other confusions)

• Well, this is unspeakably late, but better late than never.

I was beginning to think in all seriousness that none of you would ever begin asking what questions you should be asking. It’s nice to be proven wrong.

• Well, it doesn’t seem all that long to me. We’re not in one of Eliezer’s stories, after all.

And if you get enough philosophically minded folks around a table, we spend the first hour or so talking about the table, and then the next hour or so talking about our discussion of the table.

EDIT: changed ‘they’ and ‘their’ to ‘we’ and ‘our’