So if it’s difficult to get amazing trustworthy work out of a machine actress playing an Eliezer-level intelligence doing a thousand years’ worth of thinking, your proposal to have AIs do our AI alignment homework fails on the first step, it sounds like?
I do not think that the initial humans at the start of the chain can “control” the Eliezers doing thousands of years of work in this manner (if you use “control” to mean “restrict the options of an AI system in such a way that it is incapable of acting in an unsafe manner”).
That’s because each step in the chain requires trust.
For the N-month Eliezer to scale to a 4N-month Eliezer, it first controls the 2N-month Eliezer while that system does 2N-month tasks, but it then trusts the 2N-month Eliezer to create the 4N-month Eliezer.
So the control property is not maintained. But my argument is that the trust property is. The humans at the start can indeed trust the Eliezers at the end to do thousands of years of useful work—even though the Eliezers at the end are fully capable of doing something else instead.
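To make that doubling structure concrete, here is a rough sketch in Python. The names here (Eliezer, evaluate_on_checkable_tasks, build_successor) are hypothetical placeholders, and the “control” check is just a stand-in for whatever bounded verification the weaker system can actually do; this is an illustration of the shape of the chain, not a real protocol.

```python
# Illustrative sketch of the doubling chain: at each step the currently trusted
# system controls its successor on checkable tasks, then trusts that successor
# to build the next, unverifiable, step. All names are placeholders.

from dataclasses import dataclass


@dataclass
class Eliezer:
    """Stand-in for an AI system whose effective thinking horizon is `months`."""
    months: int


def evaluate_on_checkable_tasks(supervisor: Eliezer, candidate: Eliezer) -> bool:
    """Control phase: the supervisor restricts and checks the candidate on bounded
    tasks it can verify itself. Placeholder check, not a real evaluation."""
    return candidate.months <= 2 * supervisor.months


def build_successor(builder: Eliezer) -> Eliezer:
    """Trust phase: the builder constructs a system with twice its own horizon.
    The previous supervisor cannot verify this output; it has to trust it."""
    return Eliezer(months=2 * builder.months)


def bootstrap(start_months: int, target_months: int) -> Eliezer:
    current = Eliezer(months=start_months)      # trusted by the humans at the start
    while current.months < target_months:
        candidate = build_successor(current)    # the 2N-month Eliezer
        if not evaluate_on_checkable_tasks(current, candidate):
            raise RuntimeError("control check failed")
        current = candidate                     # trust passes one step down the chain
    return current


# Over a thousand years of thinking, reached by repeated doubling from one month.
print(bootstrap(start_months=1, target_months=12_000).months)
```

The point the sketch is meant to surface: the control check only ever covers the bounded tasks, while the step that actually extends the chain (build_successor) is taken on trust.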