Yup, I more or less agree with all that. The name thing was just a joke about giving things we like better priority in namespace.
I think quantilization is safe when it’s a slightly “lucky” human-imitation (also if it’s a slightly “lucky” version of some simpler base distribution, but then it won’t be as smart). But push too hard (which might not take much at all if you’re iterating quantilization steps rather than quantilizing once over a long-term policy) and you instead get an unaligned intelligence that happens to interact with the world by picking human-like behaviors that serve its purposes. (Vanessa pointed out to me that timeline-based DRL gets around the iteration problem because it relies on the human as an oracle for expected utility.)
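For concreteness, here’s a minimal sketch of a single quantilization step, where you sample an action from the top q-fraction of base-distribution draws rather than taking the argmax. The function name, the toy utility, and the sampling-by-draws approximation are all my own illustration, not anything from the discussion:

```python
import random

def quantilize(base_samples, utility, q):
    """One quantilization step: rank i.i.d. draws from the base
    distribution (e.g. a human-imitation policy) by utility, then
    sample uniformly from the top q-fraction instead of argmaxing."""
    ranked = sorted(base_samples, key=utility, reverse=True)
    cutoff = max(1, int(q * len(ranked)))  # keep at least one action
    return random.choice(ranked[:cutoff])
```

The iteration worry above corresponds to calling a step like this at every timestep: even if each step only mildly tilts the base distribution, the optimization pressure compounds across steps in a way that quantilizing once over whole long-term policies avoids.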