Richard_Ngo (former AI safety research engineer, now AI governance researcher at OpenAI; blog: thinkingcomplete.com)
But it wasn’t a cancellation attempt. The issue at hand is whether a policy of doxxing influential people is a good idea. The benefits are transparency about who is influencing society, and in which ways; the harms include the ones you’ve listed above, about chilling effects.
It’s hard to weigh these against each other, but one way you might do so is by following a policy like “doxx people only if they’re influential enough that they’re probably robust to things like losing their job”. The correlation between “influential enough to be newsworthy” and “has many options open to them” isn’t perfect, but it’s strong enough that this policy seems pretty reasonable to me.
To flip this around, let’s consider individuals who are quietly influential in other spheres. For example, I expect there are people who many news editors listen to, when deciding how their editorial policies should work. I expect there are people who many Democrat/Republican staffers listen to, when considering how to shape policy. In general I think transparency about these people would be pretty good for the world. If those people happened to have day jobs which would suffer from that transparency, I would say “Look, you chose to have a bunch of influence, which the world should know about, and I expect you can leverage this influence to end up in a good position somehow even after I run some articles on you. Maybe you’re one of the few highly-influential people for whom this happens to not be true, but it seems like a reasonable policy to assume that if someone is actually pretty influential then they’ll land on their feet either way.” And the fact that this was true for Scott is some evidence that this would be a reasonable policy.
(I also think that taking someone influential who didn’t previously have a public profile, and giving them a public profile under their real name, is structurally pretty analogous to doxxing. Many of the costs are the same. In both cases one of the key benefits is allowing people to cross-reference information about that person to get a better picture of who is influencing the world, and how.)
I don’t think the NYT thing played much of a role in Scott being better off now. My guess is a small minority of people are subscribed to his Substack because of the NYT thing (the dominant factor is clearly the popularity of his writing).
What credence do you have that he would have started the substack at all without the NYT thing? I don’t have much information, but probably less than 80%. The timing sure seems pretty suggestive.
(I’m also curious about the likelihood that he would have started his startup without the NYT thing, but that’s less relevant since I don’t know whether the startup is actually going well.)
My guess is the NYT thing hurt him quite a bit and made the potential consequences of him saying controversial things a lot worse for him.
Presumably this is true of most previously-low-profile people that the NYT chooses to write about in not-maximally-positive ways, so it’s not a reasonable standard to hold them to. And so as a general rule I do think “the amount of adversity that you get when you used to be an influential yet unknown person but suddenly get a single media feature about you” is actually fine to inflict on people. In fact, I’d expect that many (or even most) people in this category will have a worse time of it than Scott—e.g. because they do things that are more politically controversial than Scott, have fewer avenues to make money, etc.
I mean, Scott seems to be in a pretty good situation now, in many ways better than before.
And yes, this is consistent with NYT hurting him in expectation.
But one difference between doxxing normal people versus doxxing “influential people” is that influential people typically have enough power to land on their feet when e.g. they lose a job. And so the fact that this has worked out well for Scott (and, seemingly, better than he expected) is some evidence that the NYT was better-calibrated about how influential Scott is than he was.
This seems like an example of the very very prevalent effect that Scott wrote about in “against bravery debates”, where everyone thinks their group is less powerful than it actually is. I don’t think there’s a widely-accepted name for it; I sometimes call it underdog bias. My main diagnosis of the NYT/SSC incident is that rationalists were caught up in underdog bias, even as they leveraged thousands of influential tech people to attack the NYT.
Since there’s been some recent discussion of the SSC/NYT incident (in particular via Zack’s post), it seems worth copying over my twitter threads from that time about why I was disappointed by the rationalist community’s response to the situation.
I continue to stand by everything I said below.
Thread 1 (6/23/20):
Scott Alexander is the most politically charitable person I know. Him being driven off the internet is terrible. Separately, it is also terrible if we have totally failed to internalize his lessons, and immediately leap to the conclusion that the NYT is being evil or selfish.
Ours is a community built around the long-term value of telling the truth. Are we unable to imagine reasonable disagreement about when the benefits of revealing real names outweigh the harms? Yes, it goes against our norms, but different groups have different norms.
If the extended rationalist/SSC community could cancel the NYT, would we? For planning to doxx Scott? For actually doing so, as a dumb mistake? For doing so, but for principled reasons? Would we give those reasons fair hearing? From what I’ve seen so far, I suspect not.
I feel very sorry for Scott, and really hope the NYT doesn’t doxx him or anyone else. But if you claim to be charitable and openminded, except when confronted by a test that affects your own community, then you’re using those words as performative weapons, deliberately or not.
[One more tweet responding to tweets by @balajis and @webdevmason, omitted here.]
Thread 2 (1/21/21):
Scott Alexander is writing again, on a Substack blog called Astral Codex Ten! Also, he doxxed himself in the first post. This post seems like solid evidence that many SSC fans dramatically overreacted to the NYT situation.
Scott: “I still think the most likely explanation for what happened was that there was a rule on the books, some departments and editors followed it more slavishly than others, and I had the bad luck to be assigned to a department and editor that followed it a lot. That’s all.” [I didn’t comment on this in the thread, but I intended to highlight the difference between this and the conspiratorial rhetoric that was floating around when he originally took his blog down.]
I am pretty unimpressed by his self-justification: “Suppose Power comes up to you and says hey, I’m gonna kick you in the balls. … Sometimes you have to be a crazy bastard so people won’t walk all over you.” Why is doxxing the one thing Scott won’t be charitable about?

[In response to @habryka asking what it would mean for Scott to be charitable about this]: Merely to continue applying the standards of most of his other posts, where he assumes both sides are reasonable and have useful perspectives. And not to turn this into a bravery debate.
[In response to @benskuhn saying that Scott’s response is understandable, since being doxxed nearly prevented him from going into medicine]: On one hand, yes, this seems reasonable. On the other hand, this is also a fully general excuse for unreasonable dialogue. It is always the case that important issues have had major impacts on individuals. Taking this excuse seriously undermines Scott’s key principles.
I would be less critical if it were just Scott, but a lot of people jumped on narratives similar to “NYT is going around kicking people in the balls for selfish reasons”, demonstrating an alarming amount of tribalism—and worse, lack of self-awareness about it.
+1, I agree with all of this, and generally consider the SSC/NYT incident to be an example of the rationalist community being highly tribalist.
(more on this in a twitter thread, which I’ve copied over to LW here)
Very cool work! A couple of (perhaps-silly) questions:
Do these results have any practical implications for prediction markets?
Which of your results rely on there being a fixed pool of experts who have to forecast a question (as opposed to experts being free to pick and choose which questions they forecast)?
Do you know if your arbitrage-free contract function permits types of collusion that don’t leave all experts better off under every outcome, but do make each of them better off in expectation according to their own credences (i.e. types of collusion that they would agree to in advance, apart from just making side bets)?
What are the others?
Huh, I’d say the opposite. Green-according-to-black says “fuck all the people who are harming nature”, because black sees the world through an adversarial lens. But actual green is better at getting out of the adversarial/striving mindset.
My favorite section of this post was the “green according to non-green” section, which I felt captured really well the various ways that other colors see past green.
I don’t fully feel like the green part inside me resonated with any of your descriptions of it, though. So let me have a go at describing green, and seeing if that resonates with you.
Green is the idea that you don’t have to strive towards anything. Thinking that green is instrumentally useful towards some other goal misses the whole point of green, which is about getting out of a goal- or action-oriented mindset. When you do that, your perception expands from a tunnel-vision “how can I get what I want” to actually experiencing the world in its unfiltered glory—actually looking at the redwoods. If you do that, then you can’t help but feel awe. And when you step out of your self-oriented tunnel, suddenly the world has far more potential for harmony than you’d previously seen, because in fact the motivations that are causing the disharmony are… illusions, in some sense. Green looks at someone cutting down a redwood and sees someone who is hurting themself, by forcibly shutting off the parts of themselves that are capable of appreciation and awe. Knowing this doesn’t actually save the redwoods, necessarily, but it does make it far easier to be in a state of acceptance, because deep down nobody is actually your enemy.
More thoughts: what’s the difference between paying in a counterfactual mugging based on:
Whether the millionth digit of pi (5) is odd or even
Whether or not there are an infinite number of primes?
In the latter case knowing the truth is (near-)inextricably entangled with a bunch of other capabilities, like the ability to do advanced mathematics, whereas in the former it isn’t. Suppose that before you knew either fact you were told that one of them was entangled in this way—would you still want to commit to paying out in a mugging based on it?
Well… maybe? But it means that the counterlogical of “if there hadn’t been an infinite number of primes” is not very well-defined—it’s hard to modify your brain to add that belief without making a bunch of other modifications. So now Omega doesn’t just have to be (near-)omniscient, it also needs to have a clear definition of the counterlogical that’s “fair” according to your standards; without knowing that it has that, paying up becomes less tempting.
Yepp, as in Logical Induction, new traders get spawned over time (in some kind of simplicity-weighted ordering).
Artificial agents can be copied or rolled back (erase memories), which makes it possible to reverse the receipt of information if an assessor concludes with a price that the seller considers too low for a deal.
Yepp, very good point. Am working on a short story about this right now.
Absolutely, wireheading is a real phenomenon, so the question is how can real agents exist that mostly don’t fall to it. And I was asking for a story about how your model can be altered/expanded to make sense of that.
Ah, I see. In that case I think I disagree that it happens “by default” in this model. A few dynamics which prevent it:
If the wealthy trader makes reward easier to get, then the price of actions will go up accordingly (because other traders will notice that they can get a lot of reward by winning actions). So in order for the wealthy trader to keep making money, they need to reward outcomes which only they can achieve, which seems a lot harder.
I don’t yet know how traders would best aggregate votes into a reward function, but it should be something which has diminishing marginal returns to spending, i.e. you can’t just spend 100x as much to get 100x higher reward on your preferred outcome. (Maybe quadratic voting? See the sketch after this list.)
Other traders will still make money by predicting sensory observations. Now, perhaps the wealthy trader could avoid this by making observations as predictable as possible (e.g. going into a dark room where nothing happens—kinda like depression, maybe?) But this outcome would be assigned very low reward by most other traders, so it only works once a single trader already has a large proportion of the wealth.
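To make the diminishing-returns point concrete, here’s a minimal sketch of the quadratic-voting idea from the second item above. The cost rule (casting v votes costs v²) is standard quadratic voting, but the aggregation function and names are placeholders I’m making up, not part of the model:

```python
import math

def votes_purchasable(spend: float) -> float:
    # Quadratic voting: casting v votes costs v**2, so a budget of `spend`
    # buys sqrt(spend) votes; spending 100x as much only buys 10x the votes.
    return math.sqrt(spend)

def reward_for_outcome(spend_by_trader: dict[str, float]) -> float:
    # Toy aggregation: the reward assigned to an outcome is the total number
    # of votes cast for it, not the total money spent on it.
    return sum(votes_purchasable(s) for s in spend_by_trader.values())

# A wealthy trader spending 100 gets 10 votes; ten traders spending 1 each
# jointly get 10 votes too, so raw wealth can't simply buy the election.
print(reward_for_outcome({"whale": 100.0}))
print(reward_for_outcome({f"t{i}": 1.0 for i in range(10)}))
```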
Yep, that’s why I believe “in the limit your traders will already do this”. I just think it will be a dominant dynamic of efficient agents in the real world, so it’s better to represent it explicitly.
IMO the best way to explicitly represent this is via a bias towards simpler traders, who will in general pay attention to fewer things.
But actually I don’t think that this is a “dominant dynamic” because in fact we have a strong tendency to try to pull different ideas and beliefs together into a small set of worldviews. And so even if you start off with simple traders who pay attention to fewer things, you’ll end up with these big worldviews that have opinions on everything. (These are what I call frames here.)
Yep, but you can just treat it as another observation channel into UDT.
Hmm, I’m confused by this. Why should we treat it this way? There’s no actual observation channel, and in order to derive information about utilities from our experiences, we need to specify some value learning algorithm. That’s the role V is playing.
It’s just that, when we do that, something feels off (to us humans, maybe due to risk-aversion), and we go “hmm, probably this framework is not modelling everything we want, or missing some important robustness considerations, or whatever, because I don’t really feel like spending all my resources and creating a lot of disvalue just because in the world where 1 + 1 = 3 someone is offering me a good deal”.
Obviously I am not arguing that you should agree to all moral muggings. If a pain-maximizer came up to you and said “hey, looks like we’re in a world where pain is way easier to create than pleasure, give me all your resources”, it would be nuts to agree, just like it would be nuts to get mugged by “1+1=3”. I’m just saying that “sometimes you get mugged” is not a good argument against my position, and definitely doesn’t imply “you get mugged everywhere”.
I think real learning has some kind of ground-truth reward.
I’d actually represent this as “subsidizing” some traders. For example, humans have a social-status-detector which is hardwired to our reward systems. One way to implement this is just by taking a trader which is focused on social status and giving it a bunch of money. I think this is also realistic in the sense that our human hardcoded rewards can be seen as (fairly dumb) subagents.
I think this will by default lead to wireheading (a trader becomes wealthy and then sets reward to be very easy for it to get and then keeps getting it), and you’ll need a modification of this framework which explains why that’s not the case.
I think this happens in humans—e.g. we fall into cults, we then look for evidence that the cult is correct, etc etc. So I don’t think this is actually a problem that should be ruled out—it’s more a question of how you tweak the parameters to make this as unlikely as possible. (One reason it can’t be ruled out: it’s always possible for an agent to end up in a belief state where it expects that exploration will be very severely punished, which drives the probability of exploration arbitrarily low.)
they notice that topic A and topic B are unrelated enough, so you can have the traders thinking about these topics be pretty much separate, and you don’t lose much, and you waste less compute
I’m assuming that traders can choose to ignore whichever inputs/topics they like, though. They don’t need to make trades on everything if they don’t want to.
I do feel like real implementations of these mechanisms will need to have pretty different, way-more-local structure to be efficient at all
Yeah, this is why I’m interested in understanding how sub-markets can be aggregated into markets, sub-auctions into auctions, sub-elections into elections, etc.
Also, you can get rid of this problem by saying “you just want to maximize the variable U”. And the things you actually care about (dogs, apples) are just “instrumentally” useful in giving you U.
But you need some mechanism for actually updating your beliefs about U, because you can’t empirically observe U. That’s the role of V.
leads to getting Pascal’s mugged by the world in which you care a lot about easy things
I think this is fine. Consider two worlds:
In world L, lollipops are easy to make, and paperclips are hard to make.
In world P, it’s the reverse.
Suppose you’re a paperclip-maximizer in world L. And a lollipop-maximizer comes up to you and says “hey, before I found out whether we were in L or P, I committed to giving all my resources to paperclip-maximizers if we were in P, as long as they gave me all their resources if we were in L. Pay up.”
UDT says to pay here—but that seems basically equivalent to getting “mugged” by worlds where you care about easy things.
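To spell out the ex ante logic with a toy calculation (the 50/50 prior, starting resources, and conversion rates below are numbers I’m making up purely for illustration):

```python
# Toy expected-utility check of the L/P deal from the paperclip-maximizer's
# perspective, evaluated from before either agent knows which world is real.
P_EACH_WORLD = 0.5             # assumed prior on each world
RESOURCES = 100                # assumed starting resources per agent
EASY_RATE, HARD_RATE = 10, 1   # paperclips per resource when easy vs hard

# Without the deal: keep your own resources in both worlds.
u_no_deal = (P_EACH_WORLD * RESOURCES * HARD_RATE      # world L: paperclips hard
             + P_EACH_WORLD * RESOURCES * EASY_RATE)   # world P: paperclips easy

# With the deal: hand over everything in L, receive the lollipop-maximizer's
# resources in P (where paperclips are cheap to make).
u_deal = (P_EACH_WORLD * 0 * HARD_RATE
          + P_EACH_WORLD * (2 * RESOURCES) * EASY_RATE)

print(u_no_deal, u_deal)  # 550.0 vs 1000.0: ex ante the deal is great for you,
# which is why UDT pays up even after learning you're in world L.
```

So the sense in which you’re “mugged” in world L is just the flip side of the subsidy you’d have received in world P.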
Some more thoughts: we can portray the process of choosing a successor policy as the iterative process of making more and more commitments over time. But what does it actually look like to make a commitment? Well, consider an agent that is made of multiple subagents, that each get to vote on its decisions. You can think of a commitment as basically saying “this subagent still gets to vote, but no longer gets updated”—i.e. it’s a kind of stop-gradient.
Two interesting implications of this perspective:
The “cost” of a commitment can be measured both in terms of “how often does the subagent vote in stupid ways?”, and also “how much space does it require to continue storing this subagent?” But since we’re assuming that agents get much smarter over time, probably the latter is pretty small.
There’s a striking similarity to the problem of trapped priors in human psychology. Parts of our brains basically are subagents that still get to vote but no longer get updated. And I don’t think this is just a bug—it’s also a feature. This is true on the level of biological evolution (you need to have a strong fear of death in order to actually survive) and also on the level of cultural evolution (if you can indoctrinate kids in a way that sticks, then your culture is much more likely to persist).
The (somewhat provocative) way of phrasing this is that trauma is evolution’s approach to implementing UDT. Someone who’s been traumatized into conformity by society when they were young will then (in theory) continue obeying society’s dictates even when they later have more options. Someone who gets very angry if mistreated in a certain way is much harder to mistreat in that way. And of course trauma is deeply suboptimal in a bunch of ways, but so too are UDT commitments, because they’re made before the agent is capable of figuring out better alternatives.
This is clearly only a small component of the story but the analogy is definitely a very interesting one.
UDT specifically enables agents to consider the updated-away possibilities in a way relevant to decision making, while an updated agent (that’s not using something UDT-like) wouldn’t be able to do that in any circumstance
Agreed; apologies for the sloppy phrasing.
Historically it was overwhelmingly the frame until recently, so it’s the correct frame for interpreting the intended meaning of texts from that time.
I agree, that’s why I’m trying to outline an alternative frame for thinking about it.
Here is the best toy model I currently have for rational agents. Alas, it is super messy and hacky, but better than nothing. I’ll call it the BAVM model; the one-sentence summary is “internal traders concurrently bet on beliefs, auction actions, vote on values, and merge minds”. There’s little novel here, I’m just throwing together a bunch of ideas from other people (especially Scott Garrabrant and Abram Demski).
In more detail, the three main components are:
A prediction market
An action auction
A value election
You also have some set of traders, who can simultaneously trade on any combination of these three. Traders earn money in two ways:
Making accurate predictions about future sensory experiences on the market.
Taking actions which lead to reward or increase the agent’s expected future value.
They spend money in three ways:
Bidding to control the agent’s actions for the next N timesteps.
Voting on what actions get reward and what states are assigned value.
Running the computations required to figure out all these trades.
Values are therefore dominated by whichever traders earn money from predictions or actions, who will disproportionately vote for values that are formulated in the same ontologies they use for prediction/action, since that’s simpler than using different ontologies. (Note that this does require the assumption that simpler traders start off with more money.)
The last component is that it costs traders money to do computation. The way they can reduce this is by finding other traders who do similar computations as them, and then merging into a single trader. I am very interested in better understanding what a merging process like this might look like, though it seems pretty intractable in general because it will depend a lot on the internal details of the traders. (So perhaps a more principled approach here is to instead work top-down, figuring out what sub-markets or sub-auctions look like?)
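For concreteness, here’s a heavily simplified sketch of what a single step of this loop might look like in code. All the specific mechanisms below (the scoring rule, paying the winning bidder the voted reward, the flat compute fee) are placeholder choices of mine rather than part of the model, and vote pricing and trader merging are omitted entirely:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trader:
    name: str
    budget: float
    predict: Callable[[str], float]  # claimed probability that the next observation is 1
    bid: Callable[[str], float]      # money offered to control the agent's next action
    vote: Callable[[str], float]     # preferred reward for the current state

def bavm_step(traders, state, obs, compute_cost=0.1):
    # 1. Prediction market: shift budget toward traders whose predictions scored well.
    for t in traders:
        p = t.predict(state)
        t.budget += (p if obs == 1 else 1 - p) - 0.5   # crude stand-in for a proper scoring rule

    # 2. Action auction: the highest bidder pays its bid and controls the next action.
    winner = max(traders, key=lambda t: t.bid(state))
    winner.budget -= winner.bid(state)

    # 3. Value election: reward is a budget-weighted average of traders' votes.
    #    (Voting should itself cost money and have diminishing returns to spending,
    #    as discussed elsewhere; that's elided here.)
    total = sum(t.budget for t in traders)
    reward = sum((t.budget / total) * t.vote(state) for t in traders)
    winner.budget += reward          # the controlling trader collects the voted reward

    # 4. Compute costs: every trader pays for the computation it ran this step.
    #    (Merging of similar traders, which would reduce this cost, is omitted too.)
    for t in traders:
        t.budget -= compute_cost
    return winner.name, reward

# Example usage with two hard-coded traders:
traders = [
    Trader("optimist", 10.0, predict=lambda s: 0.9, bid=lambda s: 1.0, vote=lambda s: 5.0),
    Trader("pessimist", 10.0, predict=lambda s: 0.2, bid=lambda s: 0.5, vote=lambda s: 1.0),
]
print(bavm_step(traders, state="start", obs=1))
```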
In general having someone’s actual name public makes it much easier to find out other public information attached to them. E.g. imagine if Scott were involved in shady business dealings under his real name. This is the sort of thing that the NYT wouldn’t necessarily discover just by writing the profile of him, but other people could subsequently discover after he was doxxed.
To be clear, btw, I’m not arguing that this doxxing policy is correct, all things considered. Personally I think the benefits of pseudonymity for a healthy ecosystem outweigh the public value of transparency about real names. I’m just arguing that there are policies consistent with the NYT’s actions which are fairly reasonable.