I’m a 3rd year PhD student at Columbia. My academic interests lie in mechanism design and algorithms related to the acquisition of knowledge. I write a blog on stuff I’m interested in (such as math, philosophy, puzzles, statistics, and elections): https://ericneyman.wordpress.com/
Eric Neyman
(Note: I work with Paul at ARC theory. These views are my own and Paul did not ask me to write this comment.)
I think the following norm of civil discourse is super important: do not accuse someone of acting in bad faith, unless you have really strong evidence. An accusation of bad faith makes it basically impossible to proceed with discussion and seek truth together, because if you’re treating someone’s words as a calculated move in furtherance of their personal agenda, then you can’t take those words at face value.
I believe that this post violates this norm pretty egregiously. It begins by saying that hiding your beliefs “is lying”. I’m pretty confident that the sort of belief-hiding being discussed in the post is not something most people would label “lying” (see Ryan’s comment), and it definitely isn’t a central example of lying. (And so in effect it labels a particular behavior “lying” in an attempt to associate it with behaviors generally considered worse.)
The post then confidently asserts that Paul Christiano hides his beliefs in order to promote RSPs. The post presents very little evidence that this is what’s going on, and Paul’s account seems consistent with the facts (and I believe him).
So in effect, it accuses Paul and others of lying, cowardice, and bad faith on what I consider to be very little evidence.
Edited to add: What should the authors have done instead? I think they should have engaged in a public dialogue with one or more of the people they call out / believe to be acting dishonestly. The first line of the dialogue should maybe have been: “I believe you have been hiding your beliefs, for [reasons]. I think this is really bad, for [reasons]. I’d like to hear your perspective.”
Hi! I just wanted to mention that I really appreciate this sequence. I’ve been having lots of related thoughts, and it’s great to see a solid theoretical grounding for them. I find the notion that bargaining can happen across lots of different domains—different people or subagents, different states of the world, maybe different epistemic states—particularly useful. And this particular post presents the only argument for rejecting a VNM axiom I’ve ever found compelling. I think there’s a decent chance that this sequence will become really foundational to my thinking.
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I’d be interested in hearing people’s answers to this question. Or, if you want more specific questions:
By your values, do you think a misaligned AI creates a world that “rounds to zero”, or still has substantial positive value?
A common story for why aligned AI goes well goes something like: “If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way.” To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we build an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world’s values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that’s only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
Puzzle 3 thoughts: I believe I can do it with 1 coin, as follows.
First, I claim that for any prime q, it is possible to choose one of q + 1 outcomes with just one coin. I do this as follows:
Let p be a probability such that $p^q + (1 - p)^q = \frac{1}{q + 1}$. (Such a p exists by the intermediate value theorem, since p = 0 gives a value that’s too large and p = 1⁄2 gives a value that’s too small.)
Flip a coin that has probability p of coming up heads exactly q times. If all flips are the same, that corresponds to outcome 1. (This has probability 1/(q + 1) by construction.)
For each k between 1 and q − 1, there are $\binom{q}{k}$ ways of getting exactly k heads out of q flips, all equally likely. Note that this quantity is divisible by q (since none of 1, …, k are divisible by q; this is where we use that q is prime). Thus, we can subdivide the particular sequences of getting k heads out of q flips into q equally-sized classes, for each k. Each class corresponds to an outcome (2 through q + 1). The probability of each of these outcomes is $\frac{1}{q}\left(1 - \frac{1}{q + 1}\right) = \frac{1}{q + 1}$, which is what we wanted.
Now, note that 2021*12 − 1 = 24251 is prime. (I found this by guessing and checking.) So do the above for q = 24251. This lets you flip a coin 24251 times to get 24252 equally likely outcomes. Now, since 24252 = 2021*12, just assign 12 of the outcomes to each person. Then each person will have a 1/2021 chance of being selected.
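As a sanity check on the construction above, here is a minimal Python sketch. It is not part of the original argument: the function names are hypothetical, the bias p is found by plain bisection in floating point, and the round-robin split of sequences into classes is just one arbitrary way of forming q equally-sized classes. It brute-forces a small prime (q = 5) and only spot-checks the large case.

```python
from itertools import product
from math import comb, isqrt


def is_prime(n):
    """Trial division; fine for numbers this small."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def solve_bias(q, iters=200):
    """Bisect for p in (0, 1/2) with p^q + (1-p)^q = 1/(q+1) (assumes q >= 5)."""
    target = 1 / (q + 1)
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        # The left-hand side is decreasing in p on [0, 1/2].
        if mid**q + (1 - mid) ** q > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def outcome_probabilities(q, p):
    """Enumerate all 2^q flip sequences and total the probability of each outcome.

    Index 0 is "all flips the same"; indices 1..q are the q classes.
    """
    probs = [0.0] * (q + 1)
    seen = [0] * (q + 1)  # how many sequences with k heads have been assigned
    for flips in product((0, 1), repeat=q):
        k = sum(flips)
        weight = p**k * (1 - p) ** (q - k)
        if k in (0, q):
            probs[0] += weight
        else:
            # Round-robin assignment: since C(q, k) is divisible by q,
            # every class receives the same number of k-head sequences.
            probs[1 + seen[k] % q] += weight
            seen[k] += 1
    return probs


# Small case, checked exhaustively: every entry should be close to 1/6.
print(outcome_probabilities(5, solve_bias(5)))

# Large case from the comment, spot-checked only.
q = 2021 * 12 - 1
print(q, is_prime(q))
print(all(comb(q, k) % q == 0 for k in (1, 2, 3, 1000, q - 1)))
print(solve_bias(q))  # the coin's heads probability, up to float precision
```

The exhaustive enumeration is only feasible for small q; for q = 24251 the sketch checks primality, a few binomial coefficients, and the existence of p, which are the three facts the argument relies on.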
Conjecture (maybe 50% chance of being true?):
If you’re only allowed to use one coin, it is impossible to do this with fewer than 24251 flips in the worst case.
Question:
What if you can only use coins with rational probabilities?
Thanks! I’ve changed the title to “Great minds might not think alike”.
Interestingly, when I asked my Twitter followers, they liked “Alike minds think great”. I think LessWrong might be a different population. So I decided to change the title on LessWrong, but not on my blog.
The aggregation method you suggest is called logarithmic pooling. Another way to phrase it is: take the geometric mean of the odds given by the probability distribution (or the arithmetic mean of the log-odds). There’s a natural way to associate every proper scoring rule (for eliciting probability distributions) with an aggregation method, and logarithmic pooling is the aggregation method that gets associated with the log scoring rule (which Scott wrote about in an earlier post). (Here’s a paper I wrote about this connection: https://arxiv.org/pdf/2102.07081.pdf)
I’m also excited to see where this sequence goes!
Thanks for the post; I’ve been having thoughts in this general direction and found this post helpful. I’m somewhat drawn to geometric rationality because it gives more intuitive answers in thought experiments involving low probabilities of extreme outcomes, such as Pascal’s mugging. I also agree with your claim that “humans are evolved to be naturally inclined towards geometric rationality over arithmetic rationality.”
On the other hand, it seems like geometric rationality only makes sense in the context of natural features that cannot take on negative values. Most of the things I might want to maximize (e.g. utility) can be negative. Do you have thoughts on the extent to which we can salvage geometric rationality from this problem?
Thanks for the feedback. Just so I can get an approximate idea if this is the consensus: could people upvote this comment if you like the title as is (and upvote mingyuan’s comment if you think it should be changed)? Thanks!
Also, if anyone has a good title suggestion, I’d love to hear it!
(Edit: I may have been misinterpreting what you meant by “geometric mean of probabilities.” If you mean “take the geometric mean of probabilities of all events and then scale them proportionally to add to 1” then I think that’s a pretty good method of aggregating probabilities. The point I make below is that the scaling is important.)
I think taking the geometric mean of odds makes more sense than taking the geometric mean of probabilities, because of an asymmetry arising from how the latter deals with probabilities near 0 versus probabilities near 1.
Concretely, suppose Alice forecasts an 80% chance of rain and Bob forecasts a 99% chance of rain. Those are 4:1 and 99:1 odds respectively, and if you take the geometric mean you’ll get an aggregate 95.2% chance of rain.
Equivalently, Alice and Bob are forecasting a 20% chance and a 1% chance of no rain—i.e. 1:4 and 1:99 odds. Taking the geometric mean of odds gives you a 4.8% chance of no rain—checks out.
Now suppose we instead take a geometric mean of probabilities. The geometric mean of 80% and 99% is roughly 89.0%, so aggregating Alice’s and Bob’s probabilities of rain in this way will give 89.0%.
On the other hand, aggregating Alice’s and Bob’s probabilities of no rain, i.e. taking a geometric mean of 20% and 1%, gives roughly 4.5%.
This means that there’s an inconsistency with this method of aggregation: you get an 89% chance of rain and a 4.5% chance of no rain.
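The arithmetic in this example can be reproduced with a short Python sketch. The helper names (`geo_mean`, `pool_odds`, `pool_probs`) are made up for illustration; the last line also checks the point from the edit note, that rescaling the geometric means of probabilities to sum to 1 recovers the geometric-mean-of-odds answer in this two-outcome case.

```python
from math import prod


def geo_mean(values):
    """Geometric mean of a list of positive numbers."""
    return prod(values) ** (1 / len(values))


def pool_odds(probs):
    """Aggregate by taking the geometric mean of odds, then convert back."""
    odds = geo_mean([p / (1 - p) for p in probs])
    return odds / (1 + odds)


def pool_probs(probs):
    """Aggregate by taking the geometric mean of the raw probabilities."""
    return geo_mean(probs)


rain = [0.80, 0.99]              # Alice and Bob
no_rain = [1 - p for p in rain]

# Geometric mean of odds: the two answers are complementary.
print(pool_odds(rain), pool_odds(no_rain), pool_odds(rain) + pool_odds(no_rain))

# Geometric mean of probabilities: the two answers do not sum to 1.
print(pool_probs(rain), pool_probs(no_rain), pool_probs(rain) + pool_probs(no_rain))

# Rescaling the second pair to sum to 1 gives the same rain probability as pooling odds.
print(pool_probs(rain) / (pool_probs(rain) + pool_probs(no_rain)))
```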
Thanks! Here are some brief responses:
From the high level summary here it sounds like you’re offloading the task of aggregation to the forecasters themselves. It’s odd to me that you’re describing this as arbitrage.
Here’s what I say about this anticipated objection in the thesis:
For many reasons, the expert may wish to make arbitrage impossible. First, the principal may wish to know whether the experts are in agreement: if they are not, for instance, the principal may want to elicit opinions from more experts. If the experts collude to report an aggregate value (as in our example), the principal does not find out whether they originally agreed. Second, even if the principal only seeks to act based on some aggregate of the experts’ opinions, their method of aggregation may be different from the one that experts use to collude. For instance, the principal may have a private opinion on the trustworthiness of each expert and wishes to average the experts’ opinions with corresponding weights. Collusion among the experts denies the principal this opportunity. Third, a principal may wish to track the accuracy of each individual expert (to figure out which experts to trust more in the future, for instance), and collusion makes this impossible. Fourth, the space of collusion strategies that constitute arbitrage is large. In our example above, any report in [0.546, 0.637] would guarantee a profit; and this does not even mention strategies in which experts report different probabilities. As such, the principal may not even be able to recover basic information about the experts’ beliefs from their reports.
For example, when I worked with IARPA on geopolitical forecasting, our forecasters would get financial rewards depending on what percentile they were in relative to other forecasters.
This would indeed be arbitrage-free, but likely not proper: it wouldn’t necessarily incentivize each expert to report their true belief; instead, an expert’s optimal report is going to be some sort of function of the expert’s belief about the joint probability distribution over the experts’ beliefs. (I’m not sure how much this matters in practice—I defer to you on that.)
It’s surprising to me that you could disincentivize forecasters from reporting the aggregate as their individual forecast.
In Chapter 4, we are thinking of experts as having immutable beliefs, rather than beliefs that change upon hearing other experts’ beliefs. Is this a silly model? If you want, you can think of these beliefs as each expert’s belief after talking to the other experts a bunch. In theory(?) the experts’ beliefs should converge (though I’m not actually clear what happens if the experts are computationally bounded); but in practice, experts often don’t converge (see e.g. the FRI adversarial collaboration on AI risk).
It seems to me that under sufficiently pessimistic conditions, there would be no good way to aggregate those two forecasts.
Yup: in my summary I described “robust aggregation” as “finding an aggregation strategy that works as well as possible in the worst case over a broad class of possible information structures.” In fact, you can’t do anything interesting in the worst case over all information structures. The assumption I make in the chapter in order to get interesting results is, roughly, that experts’ information is substitutable rather than complementary (on average over the information structure). The sort of scenario you describe in your example is the type of example where Alice and Bob’s information might be complementary.
Great questions!
I didn’t work directly on prediction markets. The one place that my thesis touches on prediction markets (outside of general background) is in Chapter 5, page 106, where I give an interpretation of QA pooling in terms of a particular kind of prediction market called a cost function market. This is a type of prediction market where participants trade with a centralized market maker, rather than having an order book. QA pooling might have implications in terms of the right way to structure these markets if you want to allow multiple experts to place trades at the same time, without having the market update in between. (Maybe this is useful in blockchain contexts if market prices can only update every time a new block is created? I’m just spitballing; I don’t really understand how blockchains work.)
I think that for most contexts, this question doesn’t quite make sense, because there’s only one question being forecast. The one exception is where I talk about learning weights for experts over the course of multiple questions (in Chapter 5 and especially 6). Since I talk about competing with the best weighted combination of experts in hindsight, the problem doesn’t immediately make sense if some experts don’t answer some questions. However, if you specify a “default thing to do” if some expert doesn’t participate (e.g. take all the other experts’ weights and renormalize them to add to 1), then you can get the question to make sense again. I didn’t explore this, but my guess is that there are some nice generalizations in this direction.
I don’t! This is Question 4.5.2, on page 94 :) Unfortunately, I would conjecture (70%) that no such contract function exists.
Social graces are not only about polite lies but about social decision procedures on maintaining game theoretic equilibria to maintain cooperation favoring payoff structures.
This sounds interesting. For the sake of concreteness, could you give a couple of central examples of this?
For what it’s worth, the top three finishers were three of the four most calibrated contestants! With this many strings, I think being intentionally overconfident is a bad strategy. (I agree it would make sense if there were like 10 or 20 strings.)
To elaborate on my feelings about the truck:
If it is meant as an attack on Paul, then it feels pretty bad/norm-violating to me. I don’t know what general principle I endorse that makes it not okay: maybe something like “don’t attack people in a really public and flashy way unless they’re super high-profile or hold an important public office”? If you’d like I can poke at the feeling more. Seems like some people in the Twitter thread (Alex Lawsen, Neel Nanda) share the feeling.
If I’m wrong and it’s not an attack, I still think they should have gotten Paul’s consent, and I think the fact that it might be interpreted as an attack (by people seeing the truck) is also relevant.
(Obviously, I think the events “this is at least partially an attack on Paul” and “at least one of the authors of this post is connected to Control AI” are positively correlated, since this post is an attack on Paul. My probabilities are roughly 85% and 97%*, respectively.)
*For a broad-ish definition of “connected to”
I don’t particularly see a reason to dox the people behind the truck, though I am not totally sure. My bar against doxxing is pretty high, though I do care about people being held accountable for large scale actions they take.
That’s fair. I think that it would be better for the world if Control AI were not anonymous, and I judge the group negatively for being anonymous. On the other hand, I don’t think I endorse them being doxxed. So perhaps my request to Connor and Gabriel is: please share what connection you have to Control AI, if any, and share what more information you have permission to share.
There were 14 -- but they did so well that it’s unlikely to have been by chance: the p-value is 0.0002 (i.e. the probability of IQ >150 people having gotten such a large percentile conditioned on their true skill levels being distributed like the entire population is only 0.02%).
For personal reasons it made sense for me to calculate the percentage of Londoners who will have COVID this Thursday, the 16th. The number I got was much higher than I intuitively expected: 10%. Please point out any errors you see!
Among specimens collected in London 5 days ago, about 8000 were positive. This is relative to 4000 before the recent rise in cases, suggesting about 4000 are Omicron. Source
Omicron doubles at a rate of 2.5 days in the UK. Source
So among specimens collected Monday, we’d expect ~16k Omicron cases. Among specimens collected Thursday the 16th that should be ~35k.
As a ballpark guess, we might guess that about half of cases are caught, so that’s ~70k.
The typical time period between someone catching COVID and getting tested is 5 days. So the number of Londoners who will catch COVID on Thursday is ~280k, since they’ll typically get tested 5 days (two doublings) after that. That’s about 3% of the population of London.
Omicron grows by a factor of ~1.3 per day, so (3/1.3)% will catch COVID on Wednesday, and so on. The total percentage of Londoners who will have COVID on Thursday is thus ~10% (summing the appropriate geometric series).
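Here is a rough Python sketch of the chain of estimates above, for anyone who wants to vary the inputs. Several things in it are assumptions rather than figures from the calculation: the London population of 9 million (inferred from “280k is about 3%”), the 5-day and 3-day gaps between specimen dates (inferred from the 4k → 16k → ~35k steps), and especially `days_infected`, the number of days of new infections summed in the geometric series, which the calculation does not state. The function names are hypothetical.

```python
DOUBLING_DAYS = 2.5
DAILY_GROWTH = 2 ** (1 / DOUBLING_DAYS)   # about 1.32 per day
LONDON_POPULATION = 9_000_000             # assumption, not from the sources


def scale(cases, days, doubling_days=DOUBLING_DAYS):
    """Project a case count forward by `days` at the given doubling time."""
    return cases * 2 ** (days / doubling_days)


def prevalence(new_infection_share, daily_growth, days_infected):
    """Sum the shares of people infected today, yesterday, ... (finite geometric series)."""
    return sum(new_infection_share / daily_growth**d for d in range(days_infected))


positives_monday = scale(4_000, 5)                        # two doublings
positives_thursday = scale(positives_monday, 3)           # three more days of growth
true_cases_thursday = 2 * positives_thursday              # "about half of cases are caught"
new_infections_thursday = scale(true_cases_thursday, 5)   # tested ~5 days after infection
share_thursday = new_infections_thursday / LONDON_POPULATION

print(f"positives Monday:      {positives_monday:,.0f}")
print(f"positives Thursday:    {positives_thursday:,.0f}")
print(f"infections Thursday:   {new_infections_thursday:,.0f} ({share_thursday:.1%})")

# The window length is a guess; try a few values to see how sensitive the total is.
for days_infected in (3, 5, 7):
    total = prevalence(share_thursday, DAILY_GROWTH, days_infected)
    print(f"{days_infected}-day window: {total:.1%}")
```

The final figure is fairly sensitive to the window length, so that parameter is probably the first thing to scrutinize.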
Thoughts?
Good point! You might be interested in how I closed off an earlier draft of this post (which makes some points I didn’t make above, but which I think ended up having too high of a rhetoric to insight ratio):
“I don’t endorse tribalism in general, or think it’s a net positive. Tribalism strikes me as a symmetric weapon, equally wieldable by good and evil. This alone would make tribalism net neutral, but in fact tribalism corrupts, turning scouts into soldiers, making people defend their side irrespective of who’s right. And the more tribal a group becomes, the more fiercely they fight. Tribalism is a soldier of Moloch, the god of defecting in prisoner’s dilemmas.
This is somewhat in tension with my earlier claim that my tribalism is a net positive. If I claim that my tribalism is net positive, but tribalism as a whole is net negative, then I’m saying that I’m special. But everyone feels special from the inside, so you’d be right to call me out for claiming that most people who feel that their tribalism is good are wrong, but I happen to be right. I would respond by saying that among people who think carefully about tribalism, many probably have a good relationship with it. I totally understand if you don’t buy that — or if you think that I haven’t thought carefully enough about my tribalism.
But the other thing is, tribalism’s relationship with Moloch isn’t so straightforward. While on the inter-group level it breeds discord, within a tribe it fosters trust and cooperation. An American identity, and a British identity, and a Soviet identity helped fight the Nazis — just as my EA identity helps fight malaria.
So my advice on tribalism might be summarized thus: first, think carefully and critically about who the good guys are. And once you’ve done that — once you’ve joined them — a little tribalism can go a long way. Not a gallon of tribalism — beyond a certain point, sacrificing clear thinking for social cohesion becomes negative even if you’re on the good side — but a teaspoon.”
I’m curious what disagree votes mean here. Are people disagreeing with my first sentence? Or that the particular questions I asked are useful to consider? Or, like, the vibes of the post?
(Edit: I wrote this when the agree-disagree score was −15 or so.)
(Conflict of interest note: I work at ARC, Paul Christiano’s org. Paul did not ask me to write this comment. I first heard about the truck (below) from him, though I later ran into it independently online.)
There is an anonymous group of people called Control AI, whose goal is to convince people to be against responsible scaling policies because they insufficiently constrain AI labs’ actions. See their Twitter account and website (previously also anonymous; Edit: now identifies Andrea Miotti of Conjecture as the director). (I first ran into Control AI via this tweet, which uses color-distorting visual effects to portray Anthropic CEO Dario Amodei in an unflattering light, in a way that’s reminiscent of political attack ads.)
Control AI has rented a truck that has been circling London’s Parliament Square. The truck plays a video of “Dr. Paul Christiano (Made ChatGPT Possible; Government AI adviser)” saying that there’s a 10-20% chance of an AI takeover and an overall 50% chance of doom, and of Sam Altman saying that the “bad case” of AGI is “lights out for all of us”. The back of the truck says “Responsible Scaling: No checks, No limits, No control”. The video of Paul seems to me to be an attack on Paul (but see Twitter discussion here).
I currently strongly believe that the authors of this post are either in part responsible for Control AI, or at least have been working with or in contact with Control AI. That’s because of the focus on RSPs and because both Connor Leahy and Gabriel Alfour have retweeted Control AI (which has a relatively small following).
Connor/Gabriel—if you are connected with Control AI, I think it’s important to make this clear, for a few reasons. First, if you’re trying to drive policy change, people should know who you are, at minimum so they can engage with you. Second, I think this is particularly true if the policy campaign involves attacks on people who disagree with you. And third, because I think it’s useful context for understanding this post.
Could you clarify if you have any connection (even informal) with Control AI? If you are affiliated with them, could you describe how you’re affiliated and who else is involved?
EDIT: This Guardian article confirms that Connor is (among others) responsible for Control AI.