Former AI safety research engineer, now AI governance researcher at OpenAI. Blog: thinkingcomplete.com
Richard_Ngo
This is particularly weird because your indexical probability then depends on what kind of bet you’re offered. In other words, our marginal utility of money differs from our marginal utility of other things, and which one do you use to set your indexical probability? So this seems like a non-starter to me...
It seems pretty weird to me too, but to steelman: why shouldn’t it depend on the type of bet you’re offered? Your indexical probabilities can depend on any other type of observation you have when you open your eyes. E.g. maybe you see blue carpets, and you know that world A is 2x more likely to have blue carpets. And hearing someone say “and the bet is denominated in money not time” could maybe update you in an analogous way.
I mostly offer this in the spirit of “here’s the only way I can see to reconcile subjective anticipation with UDT at all”, not “here’s something which makes any sense mechanistically or which I can justify on intuitive grounds”.
My own interpretation of how UDT deals with anthropics (and I’m assuming ADT is similar) is “Don’t think about indexical probabilities or subjective anticipation. Just think about measures of things you (considered as an algorithm with certain inputs) have influence over.”
(Speculative paragraph, quite plausibly this is just nonsense.) Suppose you have copies A and B who are both offered the same bet on whether they’re A. One way you could make this decision is to assign measure to A and B, then figure out what the marginal utility of money is for each of A and B, then maximize measure-weighted utility. Another way you could make this decision, though, is just to say “the indexical probability I assign to ending up as each of A and B is proportional to their marginal utility of money” and then maximize your expected money. Intuitively this feels super weird and unjustified, but it does make the “prediction” that we’d find ourselves in a place with high marginal utility of money, as we currently do.
(Of course “money” is not crucial here, you could have the same bet with “time” or any other resource that can be compared across worlds.)
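To make the equivalence concrete, here's a minimal sketch (the measures, marginal utilities, and bet terms are made-up numbers purely for illustration) showing that the two procedures always agree about which bets to take:

```python
# A minimal sketch of the two procedures described above; all numbers
# (measures, marginal utilities, bet terms) are invented for illustration.

m = {"A": 0.7, "B": 0.3}    # measure assigned to each copy
mu = {"A": 1.0, "B": 4.0}   # marginal utility of money for each copy

def measure_weighted_delta(stake, payout):
    """Procedure 1: change in measure-weighted utility from taking the bet
    'I am A', approximating each copy's utility change as
    (marginal utility of money) * (change in money)."""
    return m["A"] * mu["A"] * (payout - stake) + m["B"] * mu["B"] * (-stake)

def expected_money_delta(stake, payout):
    """Procedure 2: set indexical probabilities proportional to
    measure * marginal utility of money, then maximize expected money."""
    z = m["A"] * mu["A"] + m["B"] * mu["B"]
    p_a, p_b = m["A"] * mu["A"] / z, m["B"] * mu["B"] / z
    return p_a * (payout - stake) + p_b * (-stake)

# The second quantity is just the first divided by a positive constant (z),
# so the two procedures always agree on whether a given bet is worth taking.
for stake, payout in [(1.0, 1.5), (1.0, 2.0), (1.0, 5.0)]:
    print(stake, payout,
          measure_weighted_delta(stake, payout) > 0,
          expected_money_delta(stake, payout) > 0)
```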
I would say that under UDASSA, it’s perhaps not super surprising to be when/where we are, because this seems likely to be a highly simulated time/scenario for a number of reasons (curiosity about ancestors, acausal games, getting philosophical ideas from other civilizations).
Fair point. By “acausal games” do you mean a generalization of acausal trade? (Acausal trade is the main reason I’d expect us to be simulated a lot.)
I don’t actually think proponents of anti-x-risk AI regulation have thought very much about the ways in which regulatory capture might in fact be harmful to reducing AI x-risk. At least, I haven’t seen much writing about this, nor has it come up in many of the discussions I’ve had (except insofar as I brought it up).
In general I am against arguments of the form “X is terrible but we have to try it because worlds that don’t do it are even more doomed”. I’ll steal Scott Garrabrant’s quote from here:
“If you think everything is doomed, you should try not to mess anything up. If your worldview is right, we probably lose, so our best out is the one where your worldview is somehow wrong. In that world, we don’t want mistaken people to take big unilateral risk-seeking actions.”
Until recently, people with P(doom) of, say, 10%, have been natural allies of people with P(doom) of >80%. But the regulation that the latter group thinks is sufficient to avoid x-risk with high confidence has, on my worldview, a significant chance of either causing x-risk from totalitarianism, or else causing x-risk via governments being worse at alignment than companies would have been. How high? Not sure, but plausibly enough to make these two groups no longer natural allies.
A tension that keeps recurring when I think about philosophy is between the “view from nowhere” and the “view from somewhere”, i.e. a third-person versus first-person perspective—especially when thinking about anthropics.
One version of the view from nowhere says that there’s some “objective” way of assigning measure to universes (or people within those universes, or person-moments). You should expect to end up in different possible situations in proportion to how much measure your instances in those situations have. For example, UDASSA ascribes measure based on the simplicity of the computation that outputs your experience.
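(Roughly, and as my own gloss rather than anything stated in this thread: UDASSA weights an experience $x$ by the algorithmic probability of generating it on some fixed universal machine $U$, i.e. $m(x) \propto \sum_{p:\,U(p)=x} 2^{-\ell(p)} \approx 2^{-K(x)}$, where $\ell(p)$ is the length of program $p$ and $K(x)$ is the Kolmogorov complexity of $x$; experiences with simpler descriptions get more measure.)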
One version of the view from somewhere says that the way you assign measure across different instances should depend on your values. You should act as if you expect to end up in different possible future situations in proportion to how much power to implement your values the instances in each of those situations have. I’ll call this the ADT approach, because that seems like the core insight of Anthropic Decision Theory. Wei Dai also discusses it here.

In some sense each of these views makes a prediction. UDASSA predicts that we live in a universe with laws of physics that are very simple to specify (even if they’re computationally expensive to run), which seems to be true. Meanwhile the ADT approach “predicts” that we find ourselves at an unusually pivotal point in history, which also seems true.
Intuitively I want to say “yeah, but if I keep predicting that I will end up in more and more pivotal places, eventually that will be falsified”. But… on a personal level, this hasn’t actually been falsified yet. And more generally, acting on those predictions can still be positive in expectation even if they almost surely end up being falsified. It’s a St Petersburg paradox, basically.
Very speculatively, then, maybe a way to reconcile the view from somewhere and the view from nowhere is via something like geometric rationality, which avoids St Petersburg paradoxes. And more generally, it feels like there’s some kind of multi-agent perspective which says I shouldn’t model all these copies of myself as acting in unison, but rather as optimizing for some compromise between all their different goals (which can differ even if they’re identical, because of indexicality). No strong conclusions here but I want to keep playing around with some of these ideas (which were inspired by a call with @zhukeepa).
This was all kinda rambly but I think I can summarize it as “Isn’t it weird that ADT tells us that we should act as if we’ll end up in unusually important places, and also we do seem to be in an incredibly unusually important place in the universe? I don’t have a story for why these things are related but it does seem like a suspicious coincidence.”
Suppose we replace “unconditional love” with “unconditional promise”. E.g. suppose Alice has promised Bob that she’ll make Bob dinner on Christmas no matter what. Now it would be clearly confused to say “Alice promised Bob Christmas dinner unconditionally, so presumably she promised everything else Christmas dinner as well, since it is only conditions that separate Bob from the worms”.
What’s gone wrong here? Well, the ontology humans use for coordinating with each other assumes the existence of persistent agents, and so when you say you unconditionally promise/love/etc a given agent, then this implicitly assumes that we have a way of deciding which agents are “the same agent”. No theory of personal identity is fully philosophically robust, of course, but if you object to that then you need to object not only to “I unconditionally love you” but also any sentence which contains the word “you”, since we don’t have a complete theory of what that refers to.
A woman who leaves a man because he grew plump and a woman who leaves a man because he committed treason both possessed ‘conditional love’.
This is not necessarily conditional love, this is conditional care or conditional fidelity. You can love someone and still leave them; they don’t have to outweigh everything else you care about.
But also: I think “I love you unconditionally” is best interpreted as a report of your current state, rather than a commitment to maintaining that state indefinitely.
The thing that distinguishes the coin case from the wind case is how hard it is to gather additional information, not how much more information could be gathered in principle. In theory you could run all sorts of simulations that would give you informative data about an individual flip of the coin, it’s just that it would be really hard to do so/very few people are able to do so. I don’t think the entropy of the posterior captures this dynamic.
The variance over time depends on how you gather information in the future, making it less general. For example, I may literally never learn enough about meteorology to update my credence about the winds from 0.5. Nevertheless, there’s still an important sense in which this credence is more fragile than my credence about coins, because I could update it.
I guess you could define it as something like “the variance if you investigated it further”. But defining what it means to investigate further seems about as complicated as defining the reference class of people you’re trading against. Also variance doesn’t give you the same directional information—e.g. OP would bet on doom at 2% or bet against it at 16%.
Overall though, as I said above, I don’t know a great way to formalize this, and would be very interested in attempts to do so.
I don’t think there’s a very good precise way to do so, but one useful concept is bid-ask spreads, which are a way of protecting yourself from adverse selection of bets. E.g. consider the following two credences, both of which are 0.5.
My credence that a fair coin will land heads.
My credence that the wind tomorrow in my neighborhood will be blowing more northwards than southwards (I know very little about meteorology and have no recollection of which direction previous winds have mostly blown).
Intuitively, however, the former is very difficult to change, whereas the latter might swing wildly given even a little bit of evidence (e.g. someone saying “I remember in high school my teacher mentioned that winds often blow towards the equator.”)
Suppose I have to decide on a policy that I’ll accept bets for or against each of these propositions at X:1 odds (i.e. my opponent puts up $X for every $1 I put up). For the first proposition, I might set X to be 1.05, because as long as I have a small edge I’m confident I won’t be exploited.
By contrast, if I set X=1.05 for the second proposition, then probably what will happen is that people will only decide to bet against me if they have more information than me (e.g. checking weather forecasts), and so they’ll end up winning a lot of money from me. And so I’d actually want X to be something more like 2 or maybe higher, depending on who I expect to be betting against, even though my credence right now is 0.5.
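To illustrate that adverse-selection logic numerically, here's a toy sketch (the counterparty accuracies and the specific odds are invented purely for illustration):

```python
# A toy sketch of the odds-setting logic above; the counterparty accuracies
# and odds are invented purely for illustration.

def counterparty_threshold(x):
    """Minimum probability of being right that a counterparty needs before
    taking my bet is positive expected value for them, when they stake $x
    against my $1 and the winner takes both stakes."""
    # Counterparty EV: p * 1 - (1 - p) * x > 0  <=>  p > x / (1 + x)
    return x / (1 + x)

def my_ev_against(p_counterparty, x):
    """My expected profit per bet against a counterparty who picks the
    winning side with probability p_counterparty."""
    return (1 - p_counterparty) * x - p_counterparty * 1

# Coin: nobody can beat 50%, so at X = 1.05 (threshold ~0.512) no informed
# bettor exists, and anyone who bets anyway is giving me free money.
print(counterparty_threshold(1.05))   # ~0.512

# Wind: someone who glanced at a forecast might be right ~65% of the time.
print(my_ev_against(0.65, 1.05))      # negative: they clear the 0.512 bar and exploit me
print(counterparty_threshold(2.0))    # ~0.667: at X = 2 they no longer have an edge over me
```

The wider the odds I demand, the more confident a counterparty has to be before trading against me is worth it for them, which is exactly the protection a bid-ask spread provides.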
In your case, you might formalize this by talking about your bid-ask spread when trading against people who know about these bottlenecks.
I think the two things that felt most unhealthy were:
The “no forgiveness is ever possible” thing, as you highlight. Almost all talk about ineradicable sin should, IMO, be seen as a powerful psychological attack.
The “our sins” thing feels like an unhealthy form of collective responsibility—you’re responsible even if you haven’t done anything. Again, very suspect on priors.
Maybe this is more intuitive for rationalists if you imagine an SJW writing a song about how, even millions of years in the future, anyone descended from westerners should still feel guilt about slavery: “Our sins can never be undone. No single death will be forgiven.” I think this is the psychological exploit that’s screwed up leftism so much over the last decade, and feels very analogous to what’s happening in this song.
Just read Bostrom’s Deep Utopia (though not too carefully). The book is structured with about half being transcripts of fictional lectures given by Bostrom at Oxford, about a quarter being stories about various woodland creatures striving to build a utopia, and another quarter being various other vignettes and framing stories.
Overall, I was a bit disappointed. The lecture transcripts touch on some interesting ideas, but Bostrom’s style is generally one which tries to classify and taxonomize, rather than characterize (e.g. he has a long section trying to analyze the nature of boredom). I think this doesn’t work very well when describing possible utopias, because they’ll be so different from today that it’s hard to extrapolate many of our concepts to that point, and also because the hard part is making it viscerally compelling.
The stories and vignettes are somewhat esoteric; it’s hard to extract straightforward lessons from them. My favorite was a story called The Exaltation of ThermoRex, about an industrialist who left his fortune to the benefit of his portable room heater, leading to a group of trustees spending many millions of dollars trying to figure out (and implement) what it means to “benefit” a room heater.
Fantastic work :)
Some thoughts on the songs:
I’m overall super impressed by how well the styles of the songs fit the content—e.g. the violins in FHI, the British accent for More Dakka, the whisper for We Do Not Wish, the Litany of Tarrrrski, etc.
My favorites to listen to are FHI at Oxford, Nihil Supernum, and Litany of Tarrrrski, because they have both messages that resonate a lot and great tunes.
IMO Answer to Job is the best-composed on artistic merits, and will have the most widespread appeal. Tune is great, style matches the lyrics really well (particular shout-out to the “or labor or lust” as a well-composed bar). Only change I’d make is changing “upon lotus thrones” to “on lotus thrones” to scan better.
Dath Ilan’s Song feels… pretty unhealthy, tbh.
I thought Prime Factorization was really great until the bit about the car and the number, which felt a bit jarring.
If it was the case that there was important public information attached to Scott’s full name, then this argument would make sense to me.
In general having someone’s actual name public makes it much easier to find out other public information attached to them. E.g. imagine if Scott were involved in shady business dealings under his real name. This is the sort of thing that the NYT wouldn’t necessarily discover just by writing the profile of him, but other people could subsequently discover after he was doxxed.
To be clear, btw, I’m not arguing that this doxxing policy is correct, all things considered. Personally I think the benefits of pseudonymity for a healthy ecosystem outweigh the public value of transparency about real names. I’m just arguing that there are policies consistent with the NYT’s actions which are fairly reasonable.
But it wasn’t a cancellation attempt. The issue at hand is whether a policy of doxxing influential people is a good idea. The benefits are transparency about who is influencing society, and in which ways; the harms include the ones you’ve listed above, about chilling effects.
It’s hard to weigh these against each other, but one way you might do so is by following a policy like “doxx people only if they’re influential enough that they’re probably robust to things like losing their job”. The correlation between “influential enough to be newsworthy” and “has many options open to them” isn’t perfect, but it’s strong enough that this policy seems pretty reasonable to me.
To flip this around, let’s consider individuals who are quietly influential in other spheres. For example, I expect there are people who many news editors listen to, when deciding how their editorial policies should work. I expect there are people who many Democrat/Republican staffers listen to, when considering how to shape policy. In general I think transparency about these people would be pretty good for the world. If those people happened to have day jobs which would suffer from that transparency, I would say “Look, you chose to have a bunch of influence, which the world should know about, and I expect you can leverage this influence to end up in a good position somehow even after I run some articles on you. Maybe you’re one of the few highly-influential people for whom this happens to not be true, but it seems like a reasonable policy to assume that if someone is actually pretty influential then they’ll land on their feet either way.” And the fact that this was true for Scott is some evidence that this would be a reasonable policy.
(I also think that taking someone influential who didn’t previously have a public profile, and giving them a public profile under their real name, is structurally pretty analogous to doxxing. Many of the costs are the same. In both cases one of the key benefits is allowing people to cross-reference information about that person to get a better picture of who is influencing the world, and how.)
I don’t think the NYT thing played much of a role in Scott being better off now. My guess is a small minority of people are subscribed to his Substack because of the NYT thing (the dominant factor is clearly the popularity of his writing).
What credence do you have that he would have started the substack at all without the NYT thing? I don’t have much information, but probably less than 80%. The timing sure seems pretty suggestive.
(I’m also curious about the likelihood that he would have started his startup without the NYT thing, but that’s less relevant since I don’t know whether the startup is actually going well.)
My guess is the NYT thing hurt him quite a bit and made the potential consequences of him saying controversial things a lot worse for him.
Presumably this is true of most previously-low-profile people that the NYT chooses to write about in not-maximally-positive ways, so it’s not a reasonable standard to hold them to. And so as a general rule I do think “the amount of adversity that you get when you used to be an influential yet unknown person but suddenly get a single media feature about you” is actually fine to inflict on people. In fact, I’d expect that many (or even most) people in this category will have a worse time of it than Scott—e.g. because they do things that are more politically controversial than Scott, have fewer avenues to make money, etc.
I mean, Scott seems to be in a pretty good situation now, in many ways better than before.
And yes, this is consistent with NYT hurting him in expectation.
But one difference between doxxing normal people versus doxxing “influential people” is that influential people typically have enough power to land on their feet when e.g. they lose a job. And so the fact that this has worked out well for Scott (and, seemingly, better than he expected) is some evidence that the NYT was better-calibrated about how influential Scott is than he was.
This seems like an example of the very very prevalent effect that Scott wrote about in “against bravery debates”, where everyone thinks their group is less powerful than they actually are. I don’t think there’s a widely-accepted name for it; I sometimes use underdog bias. My main diagnosis of the NYT/SSC incident is that rationalists were caught up by underdog bias, even as they leveraged thousands of influential tech people to attack the NYT.
Since there’s been some recent discussion of the SSC/NYT incident (in particular via Zack’s post), it seems worth copying over my twitter threads from that time about why I was disappointed by the rationalist community’s response to the situation.
I continue to stand by everything I said below.
Thread 1 (6/23/20):
Scott Alexander is the most politically charitable person I know. Him being driven off the internet is terrible. Separately, it is also terrible if we have totally failed to internalize his lessons, and immediately leap to the conclusion that the NYT is being evil or selfish.
Ours is a community built around the long-term value of telling the truth. Are we unable to imagine reasonable disagreement about when the benefits of revealing real names outweigh the harms? Yes, it goes against our norms, but different groups have different norms.
If the extended rationalist/SSC community could cancel the NYT, would we? For planning to doxx Scott? For actually doing so, as a dumb mistake? For doing so, but for principled reasons? Would we give those reasons fair hearing? From what I’ve seen so far, I suspect not.
I feel very sorry for Scott, and really hope the NYT doesn’t doxx him or anyone else. But if you claim to be charitable and openminded, except when confronted by a test that affects your own community, then you’re using those words as performative weapons, deliberately or not.
[One more tweet responding to tweets by @balajis and @webdevmason, omitted here.]
Thread 2 (1/21/21):
Scott Alexander is writing again, on a substack blog called Astral Codex Ten! Also, he doxxed himself in the first post. This post seems like solid evidence that many SSC fans dramatically overreacted to the NYT situation.
Scott: “I still think the most likely explanation for what happened was that there was a rule on the books, some departments and editors followed it more slavishly than others, and I had the bad luck to be assigned to a department and editor that followed it a lot. That’s all.” [I didn’t comment on this in the thread, but I intended to highlight the difference between this and the conspiratorial rhetoric that was floating around when he originally took his blog down.]
I am pretty unimpressed by his self-justification: “Suppose Power comes up to you and says hey, I’m gonna kick you in the balls. … Sometimes you have to be a crazy bastard so people won’t walk all over you.” Why is doxxing the one thing Scott won’t be charitable about?

[In response to @habryka asking what it would mean for Scott to be charitable about this]: Merely to continue applying the standards of most of his other posts, where he assumes both sides are reasonable and have useful perspectives. And not to turn this into a bravery debate.
[In response to @benskuhn saying that Scott’s response is understandable, since being doxxed nearly prevented him from going into medicine]: On one hand, yes, this seems reasonable. On the other hand, this is also a fully general excuse for unreasonable dialogue. It is always the case that important issues have had major impacts on individuals. Taking this excuse seriously undermines Scott’s key principles.
I would be less critical if it were just Scott, but a lot of people jumped on narratives similar to “NYT is going around kicking people in the balls for selfish reasons”, demonstrating an alarming amount of tribalism—and worse, lack of self-awareness about it.
+1, I agree with all of this, and generally consider the SSC/NYT incident to be an example of the rationalist community being highly tribalist.
(more on this in a twitter thread, which I’ve copied over to LW here)
I don’t disagree with this; when I say “thought very much” I mean e.g. to the point of writing papers about it, or even blog posts, or analyzing it in talks, or basically anything more than cursory brainstorming. Maybe I just haven’t seen that stuff, idk.