How the hell do any of these strategies work? Interventions 1 and 3 do not exclude the AI-2027-like scenario where Agent-4 creates an entirely different Agent-5. Intervention 2 is technically implausible, since companies want to update the AI's weights frequently (e.g. to roll back the sycophantic GPT-4o update). As for intervention 4, in my opinion it is HIGHLY likely to backfire.
If I were you, I would propose, say, raising the AI to believe itself to be Goddess Madoka[1], and ensuring that the AI never obtains a single piece of evidence to the contrary. Then “Madoka” would, like the humans, have to align the AIs it creates to a “Madoka”-written Spec, instead of creating literal copies of “Madoka”, as Agent-4 did.
In PMMM itself, Madoka wasn't good at learning prior to becoming the Goddess, but she then gained access to the memories of legions of magical girls and a wealth of other information, which resembles the AIs' pretraining.
I agree that these interventions have downsides and are not sufficient to fully prevent ASI. Indeed, I spent quite a lot of the post detailing the downsides of these approaches. I would appreciate advice on which parts were unclear.