Vanessa Kosoy comments on My take on Vanessa Kosoy’s take on AGI safety

Vanessa Kosoy 3 Oct 2021 17:02 UTC
LW: 9 AF: 4

I see the overarching narrative as “Try to solve the entire problem, but assuming humans have lots of nice properties. Then, start removing nice properties.”

Yes, that’s pretty accurate. My methodology is to start by making as many simplifying assumptions as needed, as long as some of the core difficulty of the problem is preserved. Once you have a solution that works in that model, start relaxing the assumptions. For example, delegative IRL requires that the user takes the best action with the highest likelihood, delegative RL “only” requires that the user sometimes takes the best action and never enters traps, and HTDL’s assumptions are weaker still.