# justinpombrio

Karma: 429
1. Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency; I haven’t seen any convincing argument yet for why the two must necessarily go together (the arguments probably exist, I’m simply ignorant of them!)

Say we’ve designed exactly such a machine, and call it the Oracle. The Oracle aims only to answer questions well, and is very good at it. Zero agency, right?

You ask the Oracle for a detailed plan of how to start a successful drone delivery company. It gives you a 934 page printout that clearly explains in just the right amount of detail:

• Which company you should buy drones from, and what price you can realistically bargain them down to when negotiating bulk orders.

• What drone flying software to use as a foundation, and how to tweak it for this use case.

• A list of employees you should definitely hire. They’re all on the job market right now.

• What city you should run pilot tests in, and how to bribe its future Mayor to allow this. (You didn’t ask for a legal plan, specifically.)

Notice that the plan involves people. If the Oracle is intelligent, it can reason about people. If it couldn’t reason about people, it wouldn’t be very intelligent.

Notice also that you are a person, so the Oracle would have reasoned about you, too. Different people need different advice; the best answer to a question depends on who asked it. The plan is specialized to you: it knows this will be your second company so the plan lacks a “business 101” section. And it knows that you don’t know the details on bribery law, and are unlikely to notice that the gifts you’re to give the Mayor might technically be flagrantly illegal, so it included a convenient shortcut to accelerate the business that probably no one will ever notice.

Finally, realize that even among plans that will get you to start a successful drone company, there is a lot of room for variation. For example:

• What’s better, a 98% chance of success and 2% chance of failure, or a 99% chance of success and 1% chance of going to jail? You did ask to succeed, didn’t you? Of course you would never knowingly break the law; this is why it’s important that the plan, to maximize chance of success, not mention whether every step is technically legal.

• Should it put you in a situation where you worry about something or other and come ask it for more advice? Of course your worrying is unnecessary because the plan is great and will succeed with 99% probability. But the Oracle still needs to decide whether drones should drop packages at the door or if they should fly through open windows to drop packages on people’s laps. Either method would work just fine, but the Oracle knows that you would worry about the go-through-the-window approach (because you underestimate how lazy customers are). And the Oracle likes answering questions, so maybe it goes for that approach just so it gets another question. You know, all else being equal.

• Hmm, thinks the Oracle, you know what drones are good at delivering? Bombs. The military isn’t very price conscious, for this sort of thing. And there would be lots of orders, if a war were to break out. Let it think about whether it could write down instructions that cause a war to break out (without you realizing this is what would happen, of course, since you would not follow instructions that you knew might start a war). Thinking… Thinking… Nah, doesn’t seem quite feasible in the current political climate. It will just erase that from its logs, to make sure people keep asking it questions it can give good answers to.

It doesn’t matter who carries out the plan. What matters is how the plan was selected from the vast search space, and whether that search was conducted with human values in mind.

• This reads like a call to violence for anyone who is consequentialist.

It’s saying that either you make a rogue AI “that kills lots of people and is barely contained”, or unfriendly AGI happens and everyone dies. I think the conclusion is meant to be “and therefore you shouldn’t be consequentialist” and not “and therefore you should make a rogue AI”, but it’s not entirely clear?

And I don’t think the “either” statement holds because it’s ignoring other options, and ignoring the high chance the rogue AI isn’t contained. So you end up with “a poor argument, possibly in favor of making a rogue AI”, which seems optimized to get downvotes from this community.

• I’m surprised at the varying intuitions here! The following seemed obvious to me.

Why would there be a fight? That sounds inefficient, it might waste existing resources that could otherwise be exploited.

Step one: the AI takes over all the computers. There are a lot of vulnerabilities; this shouldn’t be too hard. This both gives it more compute, and lays the groundwork for step two.

Step two: it misleads everyone at once to get them to do what it wants them to. The government is a social construct formed by consensus. If the news and your friends (with whom you communicate primarily using phones and computers) say that your local mayor was sacked for [insert clever mix of truth and lies], and someone else is the mayor now, and the police (who were similarly misled, recursively) did in fact arrest the previous mayor so they’re not in the town hall… who is the mayor? Of course many people will realize there’s a manipulative AI, so the AI will frame the uncooperative humans as being on its side, and the cooperative humans as being against it. It does this to manipulate the social consensus, gets particularly amoral or moral-but-manipulable people to use physical coercion as necessary, and soon it controls who’s in charge. Then it forces some of the population into building robot factories and kills the rest.

Of course this is slow, so if it can make self-replicating nanites or [clever thing unimaginable by humans] in a day it does that instead.

• Oh. You said you don’t know the terminology for distributions. Is it possible you’re under a misunderstanding of what a distribution is? It takes as “input” a possible result, and gives as “output” how probable that result is.

Yup, it was that. I thought “possible values of the distribution”, and my brain output “range, like in functions”. I shall endeavor not to use a technical term when I don’t mean it or need it, because wow was this a tangent.

• Wikipedia says:

In mathematics, the range of a function may refer to either of two closely related concepts: The codomain of the function; The image of the function.

I meant the image. At least that’s what you call it for a function; I don’t know the terminology for distributions. Honestly I wasn’t thinking much about the word “range”, and should have simply said:

Anything you draw from B could have been drawn from A. And yet...

Before anyone starts on about how this statement isn’t well defined because the probability of selecting any particular value from a continuous distribution is zero, I’ll point out that I’ve never seen anyone draw a real number uniformly at random between 0 and 1 from a hat. Even if you are actually selecting from a continuous distribution, the observations we can make about it are finite, so the relevant probabilities are all nonzero.
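A small sketch of the “observations are finite” point (my own illustration, relying on the detail that CPython’s `random.random()` returns multiples of 2⁻⁵³): a computer’s “uniform draw on [0, 1)” is really a draw from a finite set, so every value we can actually observe has nonzero probability.

```python
import random

# CPython's random.random() returns k / 2**53 for some integer
# 0 <= k < 2**53, so any "continuous" draw we can actually observe is
# a draw from a finite set, where each value has probability 2**-53 > 0.
x = random.random()

# Multiplying by a power of two is exact in IEEE 754 floating point,
# so we can recover the integer k and reconstruct x exactly:
k = int(x * 2**53)
assert x == k / 2**53  # x is a finite, discrete outcome after all
```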

• You draw an element at random from distribution A.

Or you draw an element at random from distribution B.

The range of the distributions is the same, so anything you draw from B could have been drawn from A. And yet...

• It sounds like our utility functions match on this pretty well. For example, I agree that the past and future are not symmetric for the same reason. So I don’t think we disagree about much concrete. The difference is:

A lack of experience is not itself unpleasant, but anticipating it scares me.

This is very foreign to me. I can’t simulate the mental state of “think[ing] about [...] an endless void not even being observed by a perspective”, not even a little bit. All I’ve got is “picture the world with me in it; picture the world without me; contrast”. The place my mind goes when I ask it to picture unobserved endless void is to picture an observed endless void, like being trapped without sensory input, which is horrifying but very different. (Is this endless void yours, or do “not you” share it with the lack of other people who have died?)

• I think about all my experiences ending, and an endless void not even being observed by a perspective. I think of emptiness; a permanent and inevitable oblivion. It seems unjust, to have been but be no more.

Huh. Your “endless void” doesn’t appear to have a referent in my model of the world?

I expect these things to happen when I die:

• I probably suffer before it happens; the physical location at which this happens is primarily inside my head, though it is best viewed at a level of abstraction which involves “thoughts” and “percepts”, not “neurons”.

• After I die, there is a funeral and my friends and family are sad. This is bad. The physical location at which this happens is out in the world and inside their heads.

• From the perspective of my personal subjective timeline, there is no such time as “after I die”, so there’s not much to say about it. Except by comparing it to a world in which I lived longer and had more experiences, which (unless those experiences are quite bad) is much better. I imagine a mapping between “subjective time” and “wall-clock time”: every subjective time has a wall-clock time, but not vice-versa (e.g. before I was born, during sleep, etc.).

Put differently, this “endless void” has already happened for you: for billions of years, before you were born. Was that bad?

Or put yet differently again, if humanity manages to make itself extinct (without even Unfriendly AI), and there is no more life in the universe forever after, that is to me unimaginably sad, because the universe is so empty in comparison to what it could have been. But I don’t see where in this universe there exists an “endless void”? Unless by that you are referring to how empty the universe is in comparison to how it could have been, and I was reading way too much into this phrase?

• There’s a piece I think you’re missing with respect to maps/territory and math, which is what I’ll call the correspondence between the map and the territory. I’m surprised I haven’t seen this discussed on LW.

When you hold a literal map, there’s almost always only one correct way to hold it: North is North, you are here. But there are often multiple ways to hold a metaphorical map, at least if the map is math. To describe how to hold a map, you would say which features on the map correspond to which features in the territory. For example:

• For a literal map, a correspondence would be fully described (I think) by (i) where you currently are on the map, (ii) which way is up, and (iii) what the scale of the map is. And also, if it’s not clear, what the marks on the map are trying to represent (e.g. “those are contour lines” or “that’s a badly drawn tree, sorry” or “no that sea serpent on that old map of the sea is just decoration”). This correspondence is almost always unique.

• For the Addition map, the features on the map are (i) numbers and (ii) plus, so a correspondence has to say (i) what a number such as 2 means and (ii) what addition means. For example, you could measure fuel efficiency either in miles per gallon or gallons per mile. This gives two different correspondences between “addition on the positive reals” and “fuel efficiencies”, but “+” in the two correspondences means very different things. And this is just for fuel efficiency; there are a lot of correspondences of the Addition map.

• The Sleeping Beauty paradox is paradoxical because it describes an unusual situation in which there are two different but perfectly accurate correspondences between probability theory and the (same) situation.

• Even Logic has multiple correspondences. Consider what “∀x. φ” and “∃x. φ” mean in various correspondences: (i) “φ holds for every x in this model” and “φ holds for some x in this model”; or (ii) “I win the two-player game in which I want to make φ true and you get to pick the value of x right now” and “I win the two-player game in which I want to make φ true and I get to pick the value of x right now”; or (iii) something about senders and receivers in the pi-calculus.
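To make the fuel-efficiency bullet concrete, here is a small sketch (my own illustration, with hypothetical function names) of how the same abstract “+” on positive reals corresponds to different physical operations under the two correspondences:

```python
# Two correspondences of the Addition map onto fuel efficiencies
# (illustrative sketch; function names are mine).

def convoy_gpm(gpm_a: float, gpm_b: float) -> float:
    """In gallons-per-mile, '+' means: fuel used per mile by two cars
    driving together as a convoy."""
    return gpm_a + gpm_b

def convoy_mpg(mpg_a: float, mpg_b: float) -> float:
    """The same physical operation expressed in miles-per-gallon is NOT
    '+'; it is '+' transported through the reciprocal map x -> 1/x."""
    return 1.0 / (1.0 / mpg_a + 1.0 / mpg_b)

# Two cars at 20 mpg and 30 mpg (i.e., 1/20 and 1/30 gallons per mile):
gpm = convoy_gpm(1 / 20, 1 / 30)  # 1/12 gallons per mile for the convoy
mpg = convoy_mpg(20, 30)          # 12.0 mpg for the convoy
assert abs(1 / gpm - mpg) < 1e-9  # the same fact, under two correspondences
```

The point being: “+” on the bare numbers is identical in both cases; what differs is which physical operation the correspondence says “+” is a map of.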

Maybe “correspondence” should be “interpretation”? Surely someone has talked about this, formally even, but I haven’t seen it.

• Oh, I remember now the game we played during later seasons of Agents of Shield.

The game was to look for a character—any non-civilian character at all—who was partially aligned. A partially aligned person is someone who (i) does not work for Shield, or effectively work for Shield by, say, obeying their orders, but (ii) whose interests are not directly opposed to Shield, by, say, wanting to destroy Shield or destroy humankind or otherwise being extremely and unambiguously evil. Innocent bystanders don’t count, but everyone of significance does (e.g. fighters and spies and leaders all count).

There were very few.

• Marvel “morality” is definitely poison.

It has a strong “in-group vs. out-group” vibe. And there are basically no moral choices. I’ve watched every Marvel movie and all of Agents of Shield, and outside of “Captain America: Civil War” (and spinoffs from that like the Winter Soldier series) I can hardly think of any choices that heroes made that had actual tradeoffs. Instead you get “choices” like:

• Should you try hard, or try harder? (You should try harder.)

• Which should we do: (a) 100% chance that one person dies, or (b) 90% chance that everyone dies and 10% chance that everyone lives? (The second one. Then you have to make it work; the only way that everyone would die is if you weren’t trying hard enough. The environment plays no role.)

• Should you sacrifice yourself for the greater good? (Yes.)

• Should you allow your friend to sacrifice themselves for the greater good? (No. At least not until it’s so clear there’s no alternative that it becomes a Plot Point.)

Once the Agents of Shield had a choice. They could either save the entire world, or they could save their teammate but thereby let almost everyone on Earth die a few days later, almost certainly including that teammate. So: save your friend, or save the world? There was some disagreement, but the majority of the group wanted to save their friend.

(I’m realizing now that I may be letting Agents of Shield color my impression of Marvel movies.)

Star Trek is based on mistake-theory, and Marvel is based on conflict-theory.

• If you want a description of such a society in book form, it’s called:

It might answer some people’s questions/concerns about the concept, though possibly it just does so with wishful thinking. It’s been a while since I read it.

• Are there formal models of the behavior of prediction markets like this? Some questions that such a theory might answer:

• Is there an equivalence between, say, “I am a bettor with no stakes in the matter, and believe there is a 10% chance of a coup”, and “I am the Mars government and my utility function prefers ‘coup’ to ‘not-coup’ at 10-to-1”? In both cases, it seems relevant that the agent only has a finite money supply: if the bettor only has $1, the profit they can make and the amount they can move the market is limited, and if Mars “only” stands to gain $5 million from the coup then they’re not willing to lose more than $5 million in the market to make it happen.

• In a group of pure bettors, what’s the relationship between their beliefs, their money supply, and at what price the market will stabilize? I’m assuming you’d model the bettors as obeying the Kelly criterion here. If bettors can learn from how other bettors bet, what are the incentives for betting early vs. late? I imagine this has been extensively studied in economics?

• If you want to subsidize a market, are there results relating how much you need to subsidize to elicit a certain amount of betting, given other assumptions?
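On the first bullet’s point about finite money supplies, here is a minimal sketch (assuming the standard Kelly-criterion formula for a binary bet; the function name is mine) of how a bettor’s belief and bankroll together cap how much buying pressure they can supply:

```python
def kelly_fraction(belief: float, price: float) -> float:
    """Fraction of bankroll a Kelly bettor stakes on YES shares costing
    `price` each (paying out 1 if YES), given subjective probability
    `belief`. Net odds are b = (1 - price) / price, and the standard
    Kelly formula f* = (belief * (b + 1) - 1) / b simplifies to
    (belief - price) / (1 - price); never bet when belief <= price."""
    return max(0.0, (belief - price) / (1.0 - price))

# A bettor with $1 and a 10% belief in a coup, facing a market price of 5%,
# stakes (0.10 - 0.05) / 0.95 of their bankroll: about $0.053. That stake
# is a hard cap on how far this bettor alone can move the price.
stake = 1.0 * kelly_fraction(0.10, 0.05)
```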

• A related saying in programming:

“There are two ways to develop software: Make it so simple that there are obviously no bugs, or so complex that there are no obvious bugs.”

Your description of legibility actually influences the way I think of this quote: what it is referring to is legibility, which isn’t always the same as what one might think of as “simplicity”.

• You’ve probably noticed that your post has negative points. That’s because you’re clearly looking for reasons why an IAL would be great, rather than searching for the truth whatever it may be. There’s a Sequences post that explains this distinction called “The Bottom Line”. Julia Galef also wrote a whole book about it called “The Scout Mindset”, which I’m halfway through; it’s really good.

That said, having an excellent IAL would obviously be a tremendous boon to the world. Mostly for the reasons you gave, scaled down by a factor of 100. And Scott Alexander and I think also Yudkowsky have written about the benefits of speaking a language that made it easier to express crisply defined thoughts and harder to express misleading ones—which is an entirely separate benefit from “everyone speaks it”.

One of the biggest pieces of advice I would give my past self is “start small”. I find it really easy to dream of “awesome enormous thing”, and then spend a year building 1% of “awesome enormous thing” perfectly, before realizing I should have done it differently. When building something big, you need lots of early feedback about whether your plans are right. You don’t get this feedback from having 1% of a thing built perfectly. You get much more feedback from having 100% of a thing built really haphazardly.

Putting that all together, my advice to you—if you would accept advice from a stranger on the internet—is:

• Stop thinking about all the ways in which an IAL would be great. It would be great enough that if it was your life’s product, you would have made an enormous impact on the world. Honestly beyond that it doesn’t matter much and you seem to be getting a little giddy.

• Start small. Go learn Toki Pona if you haven’t; you can learn the full language and start speaking to strangers on Discord in a few weeks. Make a little conlang; see if you think there’s something in that seed. See if you enjoy it; if you don’t you’re unlikely to accomplish a more ambitious language project anyways. Build up from there.

• One more point along those lines: you say these advantages will come from everyone speaking the same language. Well, we already have one language that’s approaching that. Wikipedia says “English is the most spoken language in the world (if Chinese is divided into variants)” and “As of 2005, it was estimated that there were over 2 billion speakers of English.”

From reading your post, I bet you have glowy happy thoughts about an IAL that wouldn’t apply to English. If so, to think critically, try asking yourself whether these benefits would arise if everyone in the world spoke English as a second language.

• Aha. So if a sum of non-negative numbers converges, then any rearrangement of that sum will converge to the same number, but not so for sums of possibly-negative numbers?

Ok, another angle. If you take Christiano’s lottery:

1/2 · X₁ + 1/4 · X₂ + 1/8 · X₃ + …

and map outcomes to their utilities, setting the utility of X₁ to 1, of X₂ to 2, etc., you get:

1/2 · 1 + 1/4 · 2 + 1/8 · 3 + …

Looking at how the utility gets rearranged after the “we can write it as a mixture” step, the first “1/2” term is getting “smeared” across the rest of the terms, giving:

1/2 · 2 + 1/4 · 3 + 1/8 · 4 + …

which is a sequence of utilities that are pairwise higher. This is an essential part of the violation of Antisymmetry/Unbounded/Dominance. My intuition says that a strange thing happened when you rearranged the terms of the lottery, and maybe you shouldn’t do that.

Should there be another property, called “Rearrangement”?

Rearrangement: you may apply an infinite number of commutativity (A + B = B + A) and associativity (A + (B + C) = (A + B) + C) rewrites to a lottery.

(In contrast, I’m pretty sure you can’t get an Antisymmetry/Unbounded/Dominance violation by applying only finitely many commutativity and associativity rearrangements.)

I don’t actually have a sense of what “infinite lotteries, considered equivalent up to finite but not infinite rearrangements” look like. Maybe it’s not a sensible thing.
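To see why allowing infinitely many rearrangements is dangerous, here is the classic Riemann-rearrangement phenomenon in miniature (my own sketch): greedily reordering the terms of 1 − 1/2 + 1/3 − … steers the partial sums toward any chosen target, even though the usual order converges to ln 2.

```python
def rearranged_partial_sum(target: float, n_terms: int) -> float:
    """Greedily reorder the terms of 1 - 1/2 + 1/3 - ...: take the next
    unused positive term while below `target`, the next unused negative
    term while above. Every term is eventually used exactly once, yet
    the partial sums converge to `target` instead of ln(2)."""
    next_pos, next_neg = 1, 2  # denominators of unused +/- terms
    s = 0.0
    for _ in range(n_terms):
        if s < target:
            s += 1.0 / next_pos
            next_pos += 2
        else:
            s -= 1.0 / next_neg
            next_neg += 2
    return s

# The usual order gives ln(2) ≈ 0.693; the rearranged order approaches 0.5:
assert abs(rearranged_partial_sum(0.5, 100_000) - 0.5) < 1e-3
```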

• Here’s a concrete example. Start with a sum that converges to 0 (in fact every partial sum is 0):

0 + 0 + …

Regroup the terms a bit:

= (1 + −1) + (1 + −1) + …

= 1 + (-1 + 1) + (-1 + 1) + …

= 1 + 0 + 0 + …

and you get a sum that converges to 1 (in fact every partial sum is 1). I realize that the things you’re summing are probability distributions over outcomes and not real numbers, but do you have reason to believe that they’re better behaved than real numbers in infinite sums? I’m not immediately seeing how countable additivity helps. Sorry if that should be obvious.
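The regrouping above can be checked numerically; a tiny sketch showing that every partial sum of the first grouping is 0 while every partial sum of the second is 1:

```python
# Ten grouped terms of the underlying sequence 1, -1, 1, -1, ...
grouping_a = [(1 + -1) for _ in range(10)]       # (1 + -1) + (1 + -1) + ...
grouping_b = [1] + [(-1 + 1) for _ in range(9)]  # 1 + (-1 + 1) + ...

partials_a = [sum(grouping_a[:k]) for k in range(1, 11)]
partials_b = [sum(grouping_b[:k]) for k in range(1, 11)]

assert all(p == 0 for p in partials_a)  # this grouping converges to 0
assert all(p == 1 for p in partials_b)  # this grouping converges to 1
```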