paulfchristiano comments on Can we efficiently distinguish different mechanisms?

paulfchristiano 29 May 2023 16:28 UTC
2 points
I agree that it seems very bad if we build AI systems that would “prefer” to tamper with sensors (including killing humans if necessary) but are prevented from doing so by physical constraints.
I currently don’t see how to approach value learning (in the worst case) without solving something like ELK. If you want to take a value learning perspective, you could view ELK as a subproblem of the easy goal inference problem. If there’s some value learning approach that routes around this problem, I’m interested in it, but I haven’t seen any candidates despite spending a long time talking with people about it.