This is fixable by using them in careful, narrow ways, but people often talk about a “full handover” without sounding like they believe in all the constraints I believe in.
I think it’s more like “if you’ve done your control work well, you have trustworthy AIs to hand off to by the time you’re doing a handoff.”
I’m not sure how you’re contrasting this with the point I was making.
It sounds like you’re talking about imposing a bunch of constraints on the AIs that you’re doing the handoff to, as opposed to the AIs that you’re using to do (most of) the work of building the AIs that you’re handing off to. According to the plan as I’ve understood it, the control comes earlier in the process.
Both? My impression was that they (Redwood in particular, but presumably also OpenAI and Anthropic) expected to be using a lot of AI assistance along the way.
But when I said “constraints,” I meant “solving the problem requires satisfying some set of criteria,” not “applying constraints to the AI” (although I’d also want that).
Where the constraints would be things like “alignment is hard in a way that specifically resists full handoff, and requires a philosophically competent human in the loop until pretty close to the end” (and then, more operationally, constraints like “therefore, you need a pretty good map of which tasks can be delegated”).