The current most intelligent and aligned beings should always be supervising their successor, using more total resources at first, such that they can’t effectively be tricked/subverted.
Obviously it helps to do this, but I think it is far from sufficient.
Sufficient for what? I’d agree it’s clearly insufficient for getting p(doom) < 1%, but plausibly fine for under 25%. [1]
(assuming the best available plan I mentioned in an earlier response to Vladimir_Nesov: “My implied best available plan for humanity is to create each successive superintelligence with sufficiently fewer resources that it could not takeover despite its mild efficiency advantage at using resources strategically. Thus, you can create and deploy misaligned superintelligence and not end up in the doom scenarios but get to try again.”
This is rather sparse and vague, and I’d like to write more on it in the future, but it roughly assumes that the Redwood agenda is implemented at all top labs, at least semi-competently.)
Of course I’d prefer that we lived in a world where we could get p(doom) << 1%; here I’m just trying to disambiguate what goes wrong under a given plan.
I don’t think it’s sufficient for getting p(doom) < 25%. It’s obviously hard to put numbers on p(doom), but I’d be curious to see you sketch out in more detail what your scheme looks like: what is it that we get the companies of the world to agree to, exactly, how do we enforce that, and, supposing we get all that enforced, why would it work?
It sounds like you are proposing we do AI control, at a high level, but we also make sure that progress is very continuous so that the gaps between AI model capability levels are small. Are you also ensuring that the AIs doing the monitoring etc. are from different lineages created by different companies? Etc.