> seeking reward because it is reward and that is somehow good
I do think there is an important distinction between “highly situationally aware, intentional training gaming” and “specification gaming.” The former seems more dangerous.
I don’t think this necessarily looks like “pursuing the terminal goal of maximizing some number on some machine” though.
It seems more likely that the model develops a goal like “try to increase the probability my goals survive training, which means I need to do well in training.”
So the reward seeking is more likely to be instrumental than terminal. Carlsmith explains this better in "Scheming AIs":
https://arxiv.org/abs/2311.08379
Thanks for the reply! I thought that you were saying the reward seeking was likely to be terminal. This makes a lot more sense.