Not Steven Byrnes, but I think one area where I differ on expectations for MONA is that I expect T to increase a lot insofar as AI companies succeed at making AI progress. A big part of the reason is that I think many of the most valuable tasks for AI implicitly rely on long-term memory/insane context lengths, where T could easily be 1 year, 10 years, or more, and depending on what the world looks like, those are ranges where AI would be more able and willing to power-seek than it is currently.
Note that this isn't necessarily an argument that MONA will certainly fail, but one area where I've changed my mind is that I now think many AI tasks are unlikely to be easily broken down into smaller chunks in a way that lets us limit the use of outcome-based RL over long time horizons.
Edit: Deleted the factored cognition comment.