I’m a big fan of heuristics bridging, as I think it is to some extent a way to describe a very compressed action policy based on an existing reward function that has been tested in the past.
So we can think about what you’re saying here as, to some extent, a way to learn values. By bridging local heuristics we can find better meta-heuristics and also look at when each of these heuristics would be optimal. This is why I really like the Meaning Alignment Institute’s work on this, because they have a way of doing it at scale: https://arxiv.org/pdf/2404.10636
I also think that part of the “third wave” of AI Safety, which is more focused on sociotechnical stuff, kind of gets around the totalitarian and control heuristics, as it’s saying the problem can be solved in a pro-social way? I really enjoyed this post, thanks for writing it!