“we show how to get agents whose long-term plans follow strategies that humans can predict”. But maybe no single human actually understands the strategy. Or maybe the traders are correctly guessing that the model’s steps will somehow lead to whatever is defined as a “good outcome”, even if they don’t understand how, which runs into problems similar to those of the RL-reward-from-the-future setup you’re trying to avoid.
Discussed in the paper in Section 6.3, bullet point 3. Agreed that if you’re using a prediction market it’s no longer accurate to say that individual humans understand the strategy.
nit: I wouldn’t use a prediction market as an overseer because markets are often uninterpretable to humans, which would miss some of the point[1].