Or because they are not optimizers at all.
I don’t agree: they do, in a sense, optimize the goal of being an HHH assistant. We could almost say they optimize the goal of being aligned. As nostalgebraist reminds us, Anthropic’s HHH paper was alignment work in the first place. It’s not that surprising that such optimizers turn out to be more aligned than the canonical optimizers envisioned by Yudkowsky.
Edit: to clarify: by “they” I mean the base models trying to predict the answers of an HHH assistant as well as possible (“as well as possible” being clearly an optimization process, or I don’t know what else to call it). And in my opinion, a sufficiently good prediction is effectively, or practically, a simulation. Maybe not a bit-perfect simulation, but a lossy one, a heuristic approximation of a simulation.
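To make the “prediction is optimization” point a bit more precise (my own sketch in standard notation, not something from the paper or the post): pretraining selects parameters that minimize the next-token cross-entropy,

$$\theta^{*} = \arg\min_{\theta}\; \mathbb{E}_{x \sim \mathcal{D}}\Big[\sum_{t} -\log p_{\theta}(x_{t} \mid x_{<t})\Big],$$

and the ideal minimizer of this objective is the true conditional distribution of the data, $p_{\theta^{*}}(\cdot \mid x_{<t}) = p_{\mathcal{D}}(\cdot \mid x_{<t})$. So when the context $x_{<t}$ frames an HHH assistant, the better the predictor, the closer its output distribution is to that assistant’s answer distribution. That is the sense in which a sufficiently good predictor is a lossy simulation of the assistant.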