I was referring to the fact that you set LessWrong posts with karma thresholds as target metrics. This kind of thing generally has the negative side effect of incentivizing exploitation of loopholes in the LessWrong moderation protocol, karma system, and community norms to increase the karma of one’s own posts. See Goodhart’s law.
I do not think this is currently a problem. My superficial impression of your experiment is that it is good. However, this kind of thing could become a problem down the line if it becomes more common. The cost would manifest as a mix of lower forum quality and increased moderation work.
Ahh I see what you mean now, thank you for the clarification.
I agree that in general people trying to exploit and Goodhart LW karma would be bad, though I hope the experiment will not contribute to this. Here, post karma is only being used as a measure, not as a target. The mentors and mentees gain nothing beyond what anyone would normally gain from their research project resulting in a highly-upvoted LW post. Predicted future post karma is just being used to optimise over research ideas, and the space of ideas itself is very small (in this experiment), so I doubt we’ll get any serious Goodharting via selection of ideas that are not very good research but are likely to produce particularly memetic LW posts (and even then, this is part of the motivation for having several metrics, so that no single one gets optimised for too specifically).
There is perhaps an argument that those who predicted a post would get high karma might want to manipulate it upwards to make their prediction come true, but those who predicted lower karma have the opposite incentive. Regardless, that kind of manipulation is, I think, strictly prohibited by both LW and Manifold guidelines, and anyone caught doing it in a serious way would likely be severely reprimanded. In the worst case, if any of the metrics are seriously and obviously manipulated in a way that cannot be rectified, the relevant markets will be resolved N/A, though I think this is extremely unlikely to happen.
All that said, I think it is important to think about what more suitable metrics would be, if research futarchy were to become more common. I can certainly imagine a world where widespread use of LW post karma as a proxy for research success could have negative impacts on the LW ecosystem, though I hope by then there will have been more development and testing of robust measures beyond our starting point (which, for the record, I think is already somewhat robust).