Blogs at thebsdetector.substack.com and news.manifold.markets
Other things at benshindel.com
I guess I didn’t bother to make a case for why something like Wikipedia provides value because I thought it was pretty obvious: the societal surplus it provides through its existence clearly dwarfs the costs of running it, irrespective of Wikimedia Foundation stuff (this isn’t an argument for why one should donate money to it now, but rather for why whatever money it initially took to get it up and running was clearly very well spent). Same for Manifold. IIRC they probably got investments of somewhere in the ballpark of $2 million (I could be off by a lot)? This seems like a great amount to have paid to create the societal surplus that Manifold now provides!
Sure, but like… if you invested in Kalshi or Polymarket early on, then you’d have 100x-ed your investment, just on a purely financial level, so that’s clearly a “good investment” from the perspective of EA orgs, since now you can turn around and put that money towards EA goals.
… why would growing the base of people making charitable donations to EA causes be “negative ROI” from the perspective of EA orgs?
25 is a lot of ppl!
Of course impossible to directly quantify but that doesn’t mean it doesn’t exist! Aren’t you a forecaster :P
I wrote a whole response to this :-)
Here’s an excerpt:
As I understand Marcus’s argument, his central thesis is that we haven’t seen the benefits of this past forecasting funding, but I think the opposite is true! Here are just a few examples:
It’s hard to measure the value of “epistemic infrastructure,” not just for forecasting sites but also things like Wikipedia and OurWorldInData. That doesn’t mean that value isn’t there. Has Wikipedia been a good return on investment? Obviously! Manifold is far less impactful than Wikipedia, but Wikipedia gets about $200 million per year between returns on its endowment and donations. The return on investment in Manifold is probably still way higher than Marcus seems to believe. Hundreds of thousands of people have made incrementally better decisions; hundreds of thousands of people have learned to think about the world a little more concretely and quantitatively. I’m one active user of thousands on Manifold and I’d personally value its impact on my life quite highly, as I’d wager Marcus might too.
Giant companies like Kalshi and Polymarket have grown in part because of research around how to best leverage crowdsourced forecasting. Inasmuch as they themselves have been funded, which Marcus claimed but I’m not sure is true, that probably has strictly provided an incredible ROI as these companies are now valued in the billions. OTOH, there’s a pretty clear through-line between early forecasting research and the rise in popularity of these sites. You can have your own opinion on whether these companies are net-good for the world (the jury’s definitely still out), but this is a very significant impact you have to reckon with.
A lot of people get into the world of rationality and EA through forecasting. This was my entrance into the community. I found competitive forecasting fun, and only later did this give me exposure to many of the other things this community cares about, which I now care about as well! Again, hard to quantify the impact of growing the EA/rationality community by ~5%. I’d guess that a few dozen people have taken the Giving Pledge that counterfactually wouldn’t have (I know of at least one). Just this alone is an ROI of many millions of dollars.
AI… not gonna get into this too much, but it’s pretty clear that different politicians, policymakers, and influential figures find different arguments appealing. High-level forecasting work is one of several ways of convincing people that AI is something they should take seriously or worry about. Again, quite hard to quantify how much less influence the various sectors of the AI safety lobby would wield right now without the backing of evidence-from-forecasting. Would the AI safety community be worse off without the support of research titans like Hinton and Bengio? Probably. Would they be worse off without a recent popular NYT bestseller? Probably. Would they be worse off without dozens of expert surveys forecasting high chances of negative outcomes? Also, yes, probably they would be.
Forecasting has been a really good way of getting people who are good at thinking clearly about the future noticed and into good roles! Recruiting is useful.
Better forecasting infrastructure will help the Dems allocate resources in the 2028 election. In fact, it likely helped the Dems keep the House close in 2024, which has provided an important check on Republican power over the last year or two. Betting markets have outperformed polling aggregators like 538 or the NYT since they took off in popularity and will continue to do so. This will help Democrats allocate funding to tipping point congressional races and is probably worth millions of dollars alone, if not far more (see recent EA focus on democracy).
Forecasting platforms provide a check on bullshit. It’s hard to continually lie when crowdsourced forecasts or prediction markets show a very different story. I think the epistemic environment these days would be even worse than it already is without this. This is similar to the value proposition of Pangram and other AI-detection software in pointing out AI slop. Hard to quantify but certainly valuable.
I agree that the bar I set was not as high as it could have been, and in fact, Joshua on Manifold ran an identical experiment but with the preface that he would be much harder to persuade.
But there will never be some precise, well-defined threshold for when persuasion becomes “superhuman”. I’m a strong believer in the wisdom of crowds, and similarly, I think a crowd of people is far more persuasive than an individual. I know I can’t prove this, but at the beginning of the market, I’d have probably given 80% odds to myself resolving NO. That is to say, I had the desire to put up a strong front and not be persuaded, but I also didn’t want to just be completely unpersuadable because then it would have been a pointless experiment. Like, I theoretically could have just turned off Manifold notifications, not replied to anyone, and then resolved the market NO at the end of the month.
For 1) I think the issue is that the people who wanted me to resolve NO were also attempting to persuade me, and they did a pretty good job of it for a while. If the YES persuaders had never really put in an effort, neither would the NO persuaders have. If one side bribes me, but the other side also has an interest in the outcome, they might bribe me as well.
For 2) The issue here is that if you give $10k to Opus to bribe me, is it Opus doing the persuasion or is it your hard cash doing the persuasion? To whom do we attribute that persuasion?
But I think that’s what makes this a challenging concept. Bribery is surely a very persuasive thing, but it incurs a much larger cost than pure text generation, for example. Perhaps the relevant question is “how persuasive is an AI system on a $-per-persuasive-unit basis?” The challenging parts, of course, are assigning proper time values for $ and then operationalizing the “persuasive unit”. That latter one ummm… seems quite daunting and imprecise by nature.
Perhaps a meaningful takeaway is that I was persuaded to resolve YES on a market where I thought the crowd would only have a 20% chance of persuading me to do so… and I was persuaded at the expense of $4k to charity (which I’m not sure I’m saintly enough to value the same as $4k given directly to me as a bribe, by the way), a month’s worth of hundreds of interesting comments, some nagging feelings of guilt as a result of these comments and interactions, and some narrative bits that made my brain feel nice.
If an AI can persuade me to do the same for $30 in API tokens and a cute piece of string mailed to my door, perhaps that’s some medium evidence that it’s superhuman in persuasion.
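To make the "$ per persuasive unit" comparison concrete, here is a rough sketch of the arithmetic. It counts only the dollar figures mentioned above ($4k to charity versus a hypothetical $30 in API tokens) and treats one successful persuasion as one "unit"; the non-monetary inputs (the comments, the guilt, the narrative bits, the piece of string) are deliberately left out, which is exactly the part that is hard to operationalize. The function name and the one-unit simplification are illustrative assumptions, not a real metric.

```python
def dollars_per_persuasion(total_cost_usd: float, successful_persuasions: int) -> float:
    """Crude cost-effectiveness figure: dollars spent per successful persuasion."""
    if successful_persuasions <= 0:
        raise ValueError("need at least one successful persuasion to compute a rate")
    return total_cost_usd / successful_persuasions


# Figures taken from the comment above; each scenario counts as one persuasion.
crowd_cost = dollars_per_persuasion(4_000, 1)  # crowd: $4k donated to charity
ai_cost = dollars_per_persuasion(30, 1)        # hypothetical AI: $30 in API tokens

# How many times cheaper the hypothetical AI would be, in dollars only.
cost_ratio = crowd_cost / ai_cost
print(f"Crowd: ${crowd_cost:,.0f}, AI: ${ai_cost:,.0f}, ratio: ~{cost_ratio:.0f}x")
```

On dollars alone the hypothetical AI would be two orders of magnitude cheaper, but that number says nothing about the time and attention the crowd also spent.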
I think a month-long experiment of this nature, followed by a comprehensive qualitative analysis, tells us far more about persuasion than a dozen narrow, over-simplified studies that attempt to be generalizable (of the nature of the one Mollick references, for example). Perhaps that has to do with my epistemic framework, but I generally reject a positivist approach to these kinds of complex problems.
I’m not really sure what your argument is, here? This surely generalizes about as well as, if not better than, any argument around persuasion could.
Yes it does. I’ve set a bar that humans can absolutely meet (because they did). Do you think that an independently operating AI system (of today’s capabilities, that is… I make no claims as to future AI systems and in general if I wasn’t concerned about superhuman persuasion, I wouldn’t have written this article) could have been as persuasive in this experiment as a market full of hundreds of humans ended up being?
You argue that my price was really low, but I don’t think it was. Persuasion is a complex phenomenon, and I’m not really sure what you mean that I “sold out my integrity”. I think that’s generally just a mean thing to say, and I’m not sure why you would strike such an aggressive tone. The point of this experiment was for people to persuade me, and they succeeded in that. I was persuaded! What do you think is an acceptable threshold for being persuaded by a crowd? $50k? $500k? Someone coming into my bedroom and credibly threatening to hurt me? At what threshold would I have kept my integrity? C’mon now.
You must not have read to the end of this article.
In response to your criticism of the strict validity of my experiment: in one sense I completely agree. It was mostly performed for fun, not for practical purposes, and I don’t think it should be interpreted as some rigorous metric:
Obviously this suggestion was given in jest, is highly imperfect, and I’m sure if you think about it for a second, you can find dozens of holes to poke… ah who cares.
That being said, I do think it yields some qualitative insights that more formalized, social science-type experiments would be woefully inadequate in generating.
Something like “superhuman persuasion” is loosely defined, resists explicit classification by its own nature, and means different things for different people. On top of that, any strict benchmark for measuring it would be rapidly Goodharted out of existence. So some contrived study like “how well does this AI persuade a judge in a debate, when facing a human,” or “which AI can persuade the other first,” or something of this nature, is likely to be completely meaningless at determining the superhuman persuasion capabilities of a model.
As to whether AIs inducing trances/psychosis in people is representative of superhuman persuasion, I’m not sure I agree. As Scott Alexander has noted, these kinds of things are happening relatively rarely, and forums like LessWrong likely exhibit extremely strong selection effects for the kind of people that become psychotic due to AI. Moreover, I don’t think that other psychosis-producing technologies, such as the written word, radio, or colonoscopies, are necessarily “persuading” in a meaningful sense. Even if AI is a much stronger psychosis-generator than previous things that generate psychosis in people prone to it, I still think that’s a different class of problem than superhuman persuasion.
As an aside, some things, like social media, clearly can induce psychosis through the transmission of information that is persuasive, but I think that’s also meaningfully different than being persuasive in and of itself, although I didn’t get into that whole can of worms in the article.
I guess I’d just like to point out that reducing the likelihood of any interstate conflict by 0.1% is probably valued at about $1 billion / year. I think the forecasting ecosystem as it stands probably does that. Unfortunately you can’t measure this in an RCT so we’ll never know I guess.
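For what it's worth, the figure above can be unpacked with one line of arithmetic. This sketch assumes the 0.1% is a reduction applied to the expected annual cost of interstate conflict, and backs out what that baseline would have to be for the reduction to be worth about $1 billion per year. The baseline is implied by the two numbers in the comment, not an independent estimate, and the function name is made up for illustration.

```python
from fractions import Fraction


def implied_baseline_cost(value_of_reduction_usd: int, reduction: Fraction) -> Fraction:
    """Baseline expected annual cost such that `reduction` of it equals the stated value."""
    if reduction <= 0:
        raise ValueError("reduction must be positive")
    return Fraction(value_of_reduction_usd) / reduction


value_per_year = 1_000_000_000    # "about $1 billion / year" from the comment
risk_reduction = Fraction(1, 1000)  # 0.1%, kept exact to avoid float rounding

baseline = implied_baseline_cost(value_per_year, risk_reduction)
print(f"Implied expected annual cost of conflict: ${int(baseline):,}")
```

So the claim amounts to assuming interstate conflict carries an expected cost on the order of $1 trillion per year; anyone who thinks that baseline is higher or lower can scale the $1 billion accordingly.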