Maxime Riché comments on Reporting Tasks as Reward-Hackable: Better Than Inoculation Prompting?

Maxime Riché 21 Feb 2026 8:15 UTC
3 points
2
Seems possibly true. More generally, an important and underexplored tool seems to be something like shaping value exploration and self-perception during RL. Here are some thoughts: Shaping Value Exploration During RL Training
Several people are interested in working on this. Let’s coordinate if you are planning to research it.
- RogerDearnaley 21 Feb 2026 14:08 UTC
  3 points
  0
  Parent
  I appreciate the invitation. I am very interested in persona research, but I hadn’t intended to research this specific application of it: I simply proposed it, in the hope that someone (most likely at a foundation lab) might pick it up (if they’re not already doing so). However, if someone else were taking this on, I’d be interested in being involved.
  
  Thanks for the link to your doc: it’s thought-provoking and closely related, and I have added some comments. Feel free to shift this to PMs — I am also on the AI Alignment, MATS, and Meridian Slacks.