I just don’t think the analogy to software bugs and user input goes very far. There’s a lot more going on in alignment theory.
It seems like “seeing the story out to the end” involves all sorts of vague, hard-to-define things, very much like “human happiness” and “human intent”.
It’s super easy to define a variety of alignment goals; the problem is that we wouldn’t like the result of most of them.
Fair enough, you have a lot more experience, and I could be totally wrong on this point.
At this point, if I’m going to do anything, it should probably be getting hands-on and actually trying to build an aligned system with RLHF or some other method.
Thank you for engaging on this and my previous posts, Seth!