Davidmanheim comments on “The Urgency of Interpretability” (Dario Amodei)

Davidmanheim 28 Apr 2025 21:30 UTC
3 points
0
Because they are all planning to build agents that will be subject to optimization pressures, and RL-type failures apply whenever you build RL systems, even when they're built on top of LLMs.