I think it would be good if you did a dialogue with some AF researcher who thinks that [the sort of AF-ish research, which you compare to “mathematicians screwing around”] is more promising on the current margin for [tackling the difficult core problems of AGI alignment] than you think it is. At the very least, it would be good to try.[1]
E.g. John? Abram? Sam? I think the closest thing to this that you’ve had on LW was the discussion with Steven in the comments under the koan post.
I think it’s good for the world that your timelines debate with Abram is out on LW, which also makes me think a similar debate on this topic would be good for the world.
I would be up for it. I truly don’t know if we actually disagree though; many of them might just say “yeah it’s hard to tell whether this will get anywhere any time soon, and it could take a really long time, but these seem like natural next steps of investigation, with some degree of canonicalness”. Or maybe many would actually say “yes this is on the mainline for alignment research and could work in a small number of decades”, I don’t know. I guess my strongest position would be “there’s some other type of thing which would still be really hard and might not work, but which has a better shot”. We could debate that, though it would also be frustrating, because my position is basically just a guess about methodology about theory, so it’s doubly/triply hard to find cruxes about.