I just don’t think the analogy to software bugs and user input goes very far. There’s a lot more going on in alignment theory.
It seems like “seeing the story out to the end” involves all sorts of vague, hard-to-define things, very much like “human happiness” and “human intent”.
It’s super easy to define a variety of alignment goals; the problem is that we wouldn’t like the result of most of them.
Fair enough, you have a lot more experience, and I could be totally wrong on this point.
At this point, if I’m going to do anything, it should probably be getting hands-on and actually trying to build an aligned system with RLHF or some other method.
Thank you for engaging on this and my previous posts, Seth!