prosaic alignment is clearly not scalable to the types of systems they are actively planning to build
Why do you believe this?
(FWIW I think it’s foolish that all (?) frontier companies are all-in on prosaic alignment, but I am not convinced that it “clearly” won’t work.)
Because they are all planning to build agents that are subject to optimization pressure, and RL-type failure modes apply whenever you build RL systems, even if they're built on top of LLMs.