Wei Dai comments on A broad basin of attraction around human values?

Wei Dai 12 Feb 2022 16:12 UTC
LW: 10 AF: 6

My inclination is to guess that there is a broad basin of attraction if we’re appropriately careful in some sense (and the same seems true for corrigibility).

In other words, the attractor basin is very thin along some dimensions, but very thick along some other dimensions.

What do you think the chances are of humanity being collectively careful enough, given that (in addition to the bad metapreferences I cited in the OP) it’s devoting approximately 0.0000001% of its resources (3 FTEs, to give a generous overestimate) to studying either metaphilosophy or metapreferences in relation to AI risk, just years or decades before transformative AI will plausibly arrive?

One reason some people cited ~10 years ago for being optimistic about AI risk was that they expected that, as AI gets closer, human civilization would start paying more attention to AI risk and quickly ramp up its efforts on that front. That seems to be happening on some technical aspects of AI safety/alignment, but not on metaphilosophy/metapreferences. I am puzzled why almost no one is as (visibly) worried about it as I am, as my update (to the lack of ramp-up) is that (unless something changes soon) we’re screwed unless we’re (logically) lucky and the attractor basin just happens to be thick along all dimensions.