Nadav Brandes comments on AGI with RL is Bad News for Safety

Nadav Brandes 21 Dec 2024 21:37 UTC
3 points
0
Thank you, Seth, for the thoughtful reply. I agree with most of your points.

I agree that RL trained to accomplish things in the real world is far more dangerous than RL trained to just solve difficult mathematical problems (which in turn is more dangerous than vanilla language modeling). I worry that the real-world part will soon become commonplace, judging from current trends.

But even without the real-world part, models could still be incentivized to develop superhuman abilities and complex strategic thinking (which could be useful for solving mathematical and coding problems).

Regarding the chances of stopping/banning open-ended RL, I agree it’s a very tall order, but my impression of the advocacy/policy landscape is that people might be open to it under the right conditions. At any rate, I wasn’t trying to reason about what’s reasonable to ask for, only about the implications of different paths. I think the discussion should start there, and then we can consider what’s wise to advocate for.

For all of these reasons, I fully agree with you that work on demonstrating these risks in a rigorous and credible way is one of the most important efforts for AI safety.