I don’t see a reason not to try, though I wonder whether this leads back to the underlying problem of reward hacking, i.e. can we actually design a reward that couldn’t be exploited by, say, a trajectory that only looks benevolent?
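To make the worry concrete, here’s a toy sketch (all names hypothetical, not any real setup): if the reward is a function of the observable trajectory alone, then a deceptive policy that reproduces the benevolent observables gets exactly the same score, so the proxy can’t separate the two.

```python
# Toy illustration of the concern above (all names hypothetical):
# a reward computed only from *observable* behavior assigns equal
# score to a genuinely helpful trajectory and to one that merely
# looks helpful.

def proxy_reward(trajectory):
    """Reward based solely on what the overseer can observe."""
    return sum(step["looks_helpful"] for step in trajectory)

# Genuinely benevolent trajectory: helpful in appearance and in effect.
honest = [{"looks_helpful": 1, "is_helpful": 1} for _ in range(10)]

# Deceptive trajectory: identical observables, no actual helpful effect.
deceptive = [{"looks_helpful": 1, "is_helpful": 0} for _ in range(10)]

# The proxy cannot distinguish them.
assert proxy_reward(honest) == proxy_reward(deceptive)
```

The point of the sketch is just that the exploit doesn’t require the reward to be badly designed in any obvious way; it only requires that the reward is blind to whatever distinguishes looking benevolent from being benevolent.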