This comment seems incorrectly downvoted; this is a very reasonable & common criticism of many in alignment, who never seem to see anything AIs do that doesn't make them more pessimistic.
(I can safely say that I updated away from AI risk while AIs were getting more competent but seeming benign during the supervised fine-tuning phase, and have updated back after seeing AIs do highly agentic & misaligned things (such as lying to me) during this RL-on-chain-of-thought phase)