For the goal of eventually creating FAI, it seems the work can be roughly divided into making the first AGI (1) have humane values and (2) keep those values. Current attention seems to be focused on the second category of problems. The work I’ve seen in the first category: CEV (9 years old!), Paul Christiano’s man-in-a-box indirect normativity, Luke’s decision neuroscience, Daniel Dewey’s value learning… I really like these approaches, but they are only very early starting points compared to what will eventually be required.
Do you have any plans to tackle the humane values problem? Do MIRI-folk have strong opinions on which direction is most promising? My worry is that if this problem really is as intractable as it seems, then working on problem (2) is not helpful, and our only option might be to prevent AGI from being developed through global regulation and other very difficult means.
Yes. The next open problem description in Eliezer’s writing queue is in this category.