This comment seems incorrectly downvoted; this is a very reasonable & common criticism of many in alignment, who never seem to see anything AIs do that doesn't make them more pessimistic.
(I can safely say that I updated away from AI risk while AIs were getting more competent but seeming benign during the supervised fine-tuning phase, and have updated back after seeing AIs do highly agentic & misaligned things (such as lying to me) during this RL-on-chain-of-thought phase)