• As your belief about how well AGI is likely to go affects both the likelihood of a bet being evaluated and the chance of winning, bets about AGI are likely to give dubious results. I also have substantial uncertainty about the value of money in a post-singularity world. Most obviously, if everyone gets turned into paperclips, no one has any use for money. If we get a friendly singleton superintelligence, everyone is living in paradise, whether or not they had money before. If we get an economic singularity, where libertarian ASI(s) try to make money without cheating, then money could be valuable. I’m not sure how we would get that, as an understanding of the control problem good enough to not wipe out humans and fill the universe with bank notes should be enough to make something closer to friendly.

Even if we do get some kind of ascendant economy, given the amount of resources in the solar system (let alone the wider universe), it’s quite possible that pocket change would be enough to live for aeons in luxury.

Given how unclear it is whether the bet will get paid, and how much the cash would be worth if it was, I doubt that the betting will produce good info. If everyone thinks that money is more likely than not to be useless to them after ASI, then almost no one will be prepared to lock up their capital in a bet until then.
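A toy calculation makes the point concrete. The world probabilities below are made-up illustrative numbers, not estimates, and the model assumes the winnings are worth face value in worlds where money still matters and nothing otherwise. Under those assumptions, the price a bettor would pay today for a contract paying $1 if AGI arrives tracks the probability of winning and the money still being useful, not the probability of winning.

```python
# Hypothetical scenarios: (probability, does the bet pay out?, value of $1 in that world)
WORLDS = {
    "AGI, paperclips": (0.3, True, 0.0),
    "AGI, friendly singleton": (0.2, True, 0.0),
    "AGI, economic singularity": (0.1, True, 1.0),
    "no AGI": (0.4, False, 1.0),
}

def prob_win(worlds):
    """The bettor's actual credence that the bet pays out."""
    return sum(p for p, wins, _ in worlds.values() if wins)

def fair_price(worlds):
    """What a $1 contract is worth to the bettor today: winnings only
    count in worlds where the money is still worth something."""
    return sum(p * value for p, wins, value in worlds.values() if wins)

print(f"credence the bet pays: {prob_win(WORLDS):.2f}")
print(f"price implied by betting: {fair_price(WORLDS):.2f}")
```

With these numbers a bettor who thinks the bet is 60% likely to pay would only offer 10 cents for it, so the market price says very little about their credence. Locking up capital and discounting would push the price lower still.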

• I suspect that an AGI with such a design could be much safer if it was hardcoded to believe that time travel and hyperexponentially vast universes were impossible. Suppose that the AGI thought that there was a 0.0001% chance that it could use a galaxy’s worth of resources to send 10^30 paperclips back in time, or create a parallel universe containing 3^^^3 paperclips. It will still chase those options.
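A minimal sketch of that argument, assuming a naive expected-clip maximizer choosing between three plans. The 1e-6 probability matches the 0.0001% above; the other probability and all payoffs are illustrative, and since 3^^^3 cannot be stored in any floating-point number, 1e100 stands in for it. Hardcoding a mechanism as impossible is modelled, hypothetically, as forcing that plan's probability to zero.

```python
# Each plan: (name, probability it works, clips if it works, physics it relies on)
PLANS = [
    ("build a clip factory", 1.0, 1e3, None),
    ("send clips back in time", 1e-6, 1e30, "time travel"),
    ("create a vast parallel universe", 1e-12, 1e100, "vast universes"),  # 1e100 stands in for 3^^^3
]

def expected_clips(plan, impossible):
    name, prob, clips, relies_on = plan
    if relies_on in impossible:
        prob = 0.0  # the hardcoded ontology rules this mechanism out entirely
    return prob * clips

def choose(plans, impossible=frozenset()):
    """Pick the plan with the highest expected number of clips."""
    return max(plans, key=lambda plan: expected_clips(plan, impossible))[0]

print(choose(PLANS))
print(choose(PLANS, impossible={"time travel"}))
print(choose(PLANS, impossible={"time travel", "vast universes"}))
```

Ruling out only one of the two mechanisms is not enough: the agent just moves on to the other long shot, which is why both need to be hardcoded as impossible before the mundane plan wins.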

If starting a long plan to take over the world costs it literally nothing, it will do it anyway. A sequence of short-term plans, each designed to make as many paperclips as possible within the next few minutes, could still end up dangerous. If the number of paperclips at time $t$ is $c(t)$, and its power at time $t$ is $p(t)$, then $\frac{dp}{dt} = a\,p$, $\frac{dc}{dt} = b\,p$ would mean that both power and paperclips grew exponentially. This is what would happen if power can be used to gain power and clips at the same time, with minimal loss of either from also pursuing the other.

If power can only be used to gain one thing at a time, and the rate at which power can grow is less than the rate of time discounting, then we are safer.
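A discrete-time toy model of the last paragraph, under assumptions that are not in the text: power starts at 1, and each step the agent either grows its power by a factor (1 + growth_rate) or spends the whole step turning its current power into that many clips, with clips at step t worth discount**t. Waiting k steps before switching to clip-making scales the value by ((1 + growth_rate) * discount)**k, so the agent only delays when power growth outpaces the discount.

```python
def discounted_clips(grow_steps, growth_rate, discount, horizon=2000):
    """Discounted clip total for: grow power for `grow_steps` steps, then make
    clips every remaining step. Power starts at 1; clips per step equal power."""
    power, total = 1.0, 0.0
    for t in range(horizon):
        if t < grow_steps:
            power *= 1 + growth_rate  # this step is spent gaining power only
        else:
            total += discount**t * power  # this step is spent making clips only
    return total

def best_delay(growth_rate, discount, max_delay=50):
    """How many steps of pure power-seeking maximize discounted clips."""
    return max(range(max_delay), key=lambda k: discounted_clips(k, growth_rate, discount))

print(best_delay(growth_rate=0.05, discount=0.9))  # 1.05 * 0.9 < 1
print(best_delay(growth_rate=0.20, discount=0.9))  # 1.20 * 0.9 > 1
```

The first call returns 0 (make clips immediately). The second returns 49, the largest delay it is allowed to consider: once growth beats the discount, the agent wants to keep seeking power for as long as it can.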

This proposal has several ways to be caught out, resting on world-wrecking assumptions that aren’t certain, but if used with care (a short time frame, an ontology that considers time travel impossible, and, say, a utility function that maxes out at 10 clips), it probably won’t destroy the world. Throw in mild optimization and an impact penalty, and you have a system that relies on a disjunction of shaky assumptions, not a conjunction of them.

It is a CDT agent, or at least something that doesn’t try to punish you now so that you made paperclips last week. A TDT agent might decide to adopt the policy of killing anyone who didn’t make clips before it was turned on, causing humans who predict this to make clips.

I suspect that it would be possible to build such an agent, prove that there are no weird failure modes left, and turn it on, with a small chance of destroying the world. I’m not sure why you would do that. Once you understand the system well enough to say it’s safe-ish, what vital info do you gain from turning it on?

• Butterfly effects are essentially unpredictable, given your partial knowledge of the world. Sure, you doing homework could cause a tornado in Texas, but it’s equally likely to prevent one. To actually predict which, you would have to calculate the movement of every gust of air around the world. Otherwise you’re shuffling an already well-shuffled pack of cards. Bear in mind that you have no reason to distinguish the particular action of “doing homework” from a vast set of other actions. If you really did know what actions would stop the Texas tornado, they might well look like random thrashing.
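A toy illustration of sensitive dependence, using the logistic map x → 4x(1 - x) as a stand-in for the weather (an assumption for illustration only; real atmospheric models are far more complex). Two starting points that differ by 10^-10, the "butterfly", stay together for a few steps and then become effectively unrelated.

```python
def logistic_orbit(x, steps, r=4.0):
    """Iterate the chaotic logistic map x -> r * x * (1 - x)."""
    orbit = [x]
    for _ in range(steps):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit

base = logistic_orbit(0.2, 100)
nudged = logistic_orbit(0.2 + 1e-10, 100)  # the "butterfly": a 1e-10 nudge
gaps = [abs(a - b) for a, b in zip(base, nudged)]

for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:3d}: gap {gaps[step]:.2e}")
```

For this map the gap roughly doubles each step on average, so knowing the starting point to ten decimal places buys only a few dozen steps of prediction; after that, which trajectory you are on is as good as a shuffled card.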

What you can calculate is the reliable effects of doing your homework. So, given bounded rationality, you are probably best off basing your decisions on those. The fact that this only involves homework might suggest that you have an internal conflict between a part of yourself that thinks about careers and a short-term procrastinator.

Most people who aren’t particularly ethical still do more good than harm. (If everyone looks out for themselves, everyone has someone to look out for them. The law stops most of the bad mutual defections in prisoner’s dilemmas.) Evil geniuses trying to trick you into doing harm are much rarer than moderately competent nice people trying to get your help to do good.
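A minimal sketch of the parenthetical point about prisoner’s dilemmas, using the standard textbook payoffs (5 for exploiting a cooperator, 3 for mutual cooperation, 1 for mutual defection, 0 for being exploited) and modelling the law, as an assumption, as a flat fine on defecting.

```python
# Row player's payoff for (my move, their move); "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_response(their_move, fine=0):
    """My payoff-maximizing move, given theirs, when defecting costs `fine`."""
    def my_payoff(my_move):
        return PAYOFF[(my_move, their_move)] - (fine if my_move == "D" else 0)
    return max("CD", key=my_payoff)

for fine in (0, 3):
    print(fine, best_response("C", fine), best_response("D", fine))
```

Without a fine, defecting is the best response to either move, so two purely self-interested players end up at mutual defection. A fine of 3 makes cooperating the best response to either move, which is the sense in which the law stops the bad mutual defections.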

• This is an example of a Pascal’s mugging. Tiny probabilities of vast rewards can produce weird behavior. The best-known solutions are either a bounded utility function or an antipascalene agent (an agent that ignores the best x% and worst y% of possible worlds when calculating expected utilities; it can be money pumped).
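A sketch comparing the two fixes on a single mugging, assuming the victim loses 5 utils for certain in exchange for a one-in-a-billion chance of 10^20 utils (illustrative numbers). The bound of 1000 and the 1% trimmed from each end are arbitrary choices for the bound, x and y; trimming here removes that much probability mass from each end of the outcomes sorted by utility and renormalizes what is left.

```python
# Outcomes are (probability, utility) pairs whose probabilities sum to 1.
MUGGING = [(1 - 1e-9, -5.0), (1e-9, 1e20 - 5.0)]
REFUSE = [(1.0, 0.0)]

def plain_expectation(outcomes):
    return sum(p * u for p, u in outcomes)

def bounded_expectation(outcomes, bound=1000.0):
    """Expected utility after clamping each outcome's utility to [-bound, bound]."""
    return sum(p * max(-bound, min(bound, u)) for p, u in outcomes)

def trimmed_expectation(outcomes, top=0.01, bottom=0.01):
    """Antipascalene expectation: drop the best `top` and worst `bottom` fraction
    of probability mass, then average over the remaining mass."""
    lo, hi = bottom, 1.0 - top  # keep this window of cumulative probability
    cum = kept = total = 0.0
    for p, u in sorted(outcomes, key=lambda o: o[1]):  # worst outcomes first
        overlap = max(0.0, min(cum + p, hi) - max(cum, lo))
        kept += overlap
        total += overlap * u
        cum += p
    return total / kept

for rule in (plain_expectation, bounded_expectation, trimmed_expectation):
    pay = rule(MUGGING) > rule(REFUSE)
    print(f"{rule.__name__}: {'pays the mugger' if pay else 'refuses'}")
```

The plain expected-utility agent pays, since 10^-9 × 10^20 dwarfs the 5 utils lost; both the bounded and the trimmed agent refuse.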

• Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card, and it’s blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Use the inverse of the prospect theory curve to convert between betting behaviour and implicit probability. The people should perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
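For reference, the ideal Bayesian version of this experiment can be computed directly, assuming the side shown is chosen uniformly at random. Seeing a blue side is twice as likely for a blue/blue card as for a red/blue card, so the normative update is a shift of log 2 in log odds whatever the pack composition; any dependence of the measured update on the composition would be a departure from Bayes. Once a blue side has been seen, the other side is blue exactly when the card is blue/blue, so the same posterior is the Bayesian probability for the bet.

```python
import math

def posterior_both_blue(frac_both_blue):
    """P(card is blue/blue | a uniformly chosen side is blue)."""
    p_blue_shown = frac_both_blue * 1.0 + (1 - frac_both_blue) * 0.5
    return frac_both_blue / p_blue_shown

def log_odds(p):
    return math.log(p / (1 - p))

for frac in (0.1, 0.5, 0.9):
    shift = log_odds(posterior_both_blue(frac)) - log_odds(frac)
    print(f"blue/blue fraction {frac}: log-odds shift {shift:.4f}")
```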

• I suspect that if voting reduced your own karma, some people wouldn’t vote. As it becomes obvious that this is happening, more people stop voting, until karma just stops flowing at all. (The people who persistently vote anyway all run out of karma.)

# Propositional Logic, Syntactic Implication

10 Feb 2019 18:12 UTC
# Probability space has 2 metrics

10 Feb 2019 0:28 UTC
• This is making the somewhat dubious assumption that X risks are not so neglected that even a “selfish” individual would work to reduce them. Of course, in the not too unreasonable scenario where the cosmic commons is divided up evenly, and you use your portion to make a vast number of duplicates of yourself, the utility, if your utility is linear in copies of yourself, would be vast. Or you might hope to live for a ridiculously long time in a post-singularity world.

The effect that a single person can have on X risks is small, but if they were selfish with no time discounting, it would be a better option than hedonism now. A third alternative, sitting in a padded room being very, very safe, could be even better, though.

• I suspect that the social institutions of Law and Money are likely to become increasingly irrelevant background to the development of ASI.

Deterrence Fails.

If you believe that there is a good chance of immortal utopia, and a large chance of paperclips in the next 5 years, the threat that the cops might throw you in jail (on the off chance that they are still in power) is negligible.

The law is blind to safety.

The law is bureaucratic and ossified. It is probably not employing much top talent, as it’s hard to tell top talent from the rest if you aren’t as good yourself (and it doesn’t have the budget or glamor to attract them). Telling whether an organization is on track for not destroying the world is HARD. The safety protocols are being invented on the fly by each team, and the system is very complex, technical and only half built. The teams that would destroy the world aren’t idiots; they are still producing long papers full of maths and talking about the importance of safety a lot. There are no examples to work with, and no understood laws.

Likely as not (not really, too much conjunction here), you get some random inspector with a checklist full of things that sound like a good idea to people who don’t understand the problem. All AI work has to have an emergency stop button that turns the power off. (The idea of an AI circumventing this was not considered by the person who wrote the list.)

All the law can really do is tell what public image an AI group wants to present, provide funding to everyone, and get in everyone’s way. Telling cops to “smash all GPUs” would have an effect on AI progress. The fund vs smash axis is about the only lever they have. They can’t even tell an AI project from a maths convention or a normal programming project if the project leaders are incentivized to obfuscate.

After ASI, governments are likely only relevant if the ASI was programmed to care about them. Neither paperclippers nor FAI will care about the law. The law might be relevant if we had a tasky ASI that was not trivial to leverage into a decisive strategic advantage (an AI that can put a strawberry on a plate without destroying the world, but that’s about the limit of its safe operation).

Such an AI embodies an understanding of intelligence and could easily be accidentally modified to destroy the world. Such scenarios might involve ASI and timescales long enough for the law to act.

I don’t know how the law can handle something that can easily destroy the world, has some economic value (if you want to flirt with danger) and, with further research, could grant supreme power. The discovery must be limited to a small group of people (law of large numbers of nonexperts: one will do something stupid). I don’t think the law could notice what it was; after all, the robot in front of the inspector only puts strawberries on plates. They can’t tell how powerful it would be with an unbounded utility function.

• Firstly, you are confusing dollars and utils.

If you buy this product for $100, you gain the use of it, at value U[30] to yourself. The workers who made it gain $80, at value U[80] to yourself, because of your utilitarian preferences. Total value: U[110].

If the alternative was a product costing $100, which you value the use of at U[105], but all the money goes to greedy rich people to be squandered, then you would choose the first. If the alternative was spending $100 to do something insanely morally important, U[3^^^3], you would do that.

If the alternative was a product costing $100 that was of value U[100] to yourself, and some of the money would go to people who weren’t that rich, U[15], you would do that.

If you could give the money to people twice as desperate as the workers, at U[160], you would do that.

There are also good reasons why you might want to discourage monopolies. Any desire to do so is not included in these expected value calculations. But the basic principle is that utilitarianism can never tell you whether some action is a good use of a resource unless you tell it what else that resource could have been used for.
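The comparisons above amount to ranking every use of the same $100 by the total utils it produces. A minimal sketch using the numbers from the text; the option labels are paraphrases, and the U[3^^^3] option is left out because no float can represent it.

```python
# Each option spends the same $100:
# (utils from using the product, utils from where the money goes)
OPTIONS = {
    "buy the product, workers get $80": (30, 80),
    "buy the alternative, money to greedy rich people": (105, 0),
    "buy the alternative, some money to people who aren't that rich": (100, 15),
    "give the money to people twice as desperate": (0, 160),
}

totals = {name: own + others for name, (own, others) in OPTIONS.items()}
best = max(totals, key=totals.get)
for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"U[{total}]  {name}")
print("best use of the $100:", best)
```

Deleting an entry from `OPTIONS` can change which option wins, which is the point of the last paragraph: the ranking only exists relative to the alternatives you list.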

• The information needed to describe our particular laws of physics < the information needed to describe the concept of a “habitable universe” in general < the information needed to describe a human-like mind.

The biggest slip is the equivocation on the word “intelligence”. The Kolmogorov complexity of AIXI-tl is quite small, so intelligences in that sense of the word are likely to exist in the universal prior.

Humanlike minds have not only the clear mark of evolution, but the mark of stone age tribal interactions across their psyche. An arbitrary mind will be bizarre and alien. Wondering if such a mind might be benevolent is hugely privileging the hypothesis. The most likely way to make a humanlike mind is the process that created humans. So in most of the universes with humanoid deities, those deities evolved. This becomes the simulation hypothesis.

The best hypothesis is still the laws of quantum physics or whatever.

• We don’t know what we are missing out on without superintelligence. There might be all sorts of amazing things that we would just never consider making, or would dismiss as obviously impossible, without superintelligence.

I am pointing out that being able to make an FAI that is a bit smarter than you (smartness is not really on a single scale across vastly different cognitive architectures: is Deep Blue smarter than a horse?) involves solving almost all the hard problems in alignment. When we have done all that hard work, we might as well tell it to make itself a trillion times smarter; the cost to us is negligible, and the benefit could be huge.

AI can also serve as a values repository. In most circumstances, values are going to drift over time, possibly due to evolutionary forces. If we don’t want to end up as hardscrapple frontier replicators, we need some kind of singleton. Most types of government or committee have their own forms of value drift, and couldn’t keep enough of an absolute grip on power to stop any rebellions for billions of years. I have no ideas other than Friendly ASI oversight for how to stop someone in a cosmically vast society from creating a UFASI. Sufficiently draconian banning of anything at all technological could stop anyone from creating UFASI long term, but would also stop most things since the industrial revolution.

The only reasonable scenario that I can see in which FAI is not created and the cosmic commons gets put to good use is if a small group of like-minded individuals, or a single person, gains exclusive access to self-replicating nanotech and mind uploading. They then use many copies of themselves to police the world. They do all the programming and only run code they can formally prove isn’t dangerous. No one is allowed to touch anything Turing complete.

• Both blanks are the identity function.

Here is some pseudocode:

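```python
class Prover:
    def __init__(self):
        # Trusted proof checkers, starting with just PA.
        # A checker is called as checker(statement, proof) and returns True or False.
        self.ps = [PA]

    def prove(self, p, s, b):
        # Check proof b of statement s, using only an already trusted checker p.
        assert p in self.ps
        return p(s, b)

    def trust(self, p1, p2, b):
        # Start trusting p2 once the trusted checker p1 accepts proof b
        # that anything p2 can prove, p1 can prove too.
        if self.prove(p1, "forall s: (exists b2: p2(s,b2)) => (exists b1: p1(s,b1))", b):
            self.ps.append(p2)


prover = Prover()
prover.trust(PA, nPA, proof)
```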

Where PA is a specific Peano arithmetic proof checker, nPA is another proof checker, and ‘proof’ is a proof that anything nPA can prove, PA can prove too.

# Allowing a formal proof system to self improve while avoiding Lobian obstacles.

23 Jan 2019 23:04 UTC
• I consider emotions to be data, not goals. From this point of view, deliberately maximizing happiness for its own sake is a lost purpose. It’s like writing extra numbers on your bank balance. If, however, your happiness was reliably too low, adjusting it upwards with drugs would be sensible. What’s the best level of happiness? The one that produces optimal behavior.

I also find my emotions to be quite weak, and I can consciously change them: just thinking “be happy” or “be sad” and feeling happy or sad. It actually feels similar to imagining a mental image, sound or smell.

• Writing random bits of code is a good hobby. It sounds like you prefer doing that to learning to play jazz, so forget the jazz and just code. I was having a hard time understanding quantum spin, and wrote some code to help. It was reasonably helpful. Then again, quantum spin is all about complex matrix multiplication, and numpy has functions for that, so I was basically using it as a matrix arithmetic calculator. Another example: I found that I kept getting distracted, so I wrote code that randomly beeped, asked what I was doing, and saved the results to a file. It worked quite well.
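A minimal sketch of the random-beep logger described above, not the original code: it waits a random, exponentially distributed time (so the next beep can’t be anticipated), rings the terminal bell, asks what you are doing and appends a timestamped line to a file. The 20-minute mean gap and the log file name are assumptions; `ask` and `sleep` are parameters only so the function can be tried without actually waiting.

```python
import datetime
import os
import random
import time

def sample_activity(n_samples, log_path=os.path.expanduser("~/activity_log.tsv"),
                    mean_gap_s=20 * 60, ask=input, sleep=time.sleep):
    """Beep at random times, ask what you're doing, append answers to log_path."""
    records = []
    for _ in range(n_samples):
        # Exponential gaps make the beeps memoryless: no moment is "safe" from a check.
        sleep(random.expovariate(1 / mean_gap_s))
        print("\a", end="", flush=True)  # terminal bell; some terminals ignore it
        answer = ask("What are you doing right now? ")
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{stamp}\t{answer}\n")
        records.append((stamp, answer))
    return records
```

Calling `sample_activity(24)` in a terminal collects about a working day of samples at the default rate; the tab-separated log is easy to tally afterwards.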

• Sure, that sounds interesting. I have a bunch of things that I’m confused about.

• What if it follows human norms with dangerously superhuman skill?

Suppose humans had a really strong norm that you were allowed to say whatever you like, and were encouraged to say things others will find interesting.

Among humans, the most we can exert is a small optimization for the not totally dull.

The AI produces a sequence that effectively hacks the human brain and sets interest to maximum.