An alternative to AI being able to do moral philosophy correctly is developing an AI/human ecosystem that somehow preserves our collective ability to eventually discover our values and optimize for them, while not having a clear specification of what our values are or how to do moral philosophy in the meantime.
That’s what I hope the various low-impact ideas will do.
[...] actually understand these difficulties
I think they do, partially. CIRL is actually a decent step forward, but I think they thought it was more of a step forward than it was.
Or maybe they thought that a little bit of extra work (a bit of meta-preferences, for instance) would be enough to make CIRL work.