I gave a lightning talk with my particular characterization, and included "swiss cheese", i.e. that GSAI sources some layers of swiss cheese without trying to be one magic bullet. But if people agree with this, then "guaranteed-safe AI" is really a misnomer, because "guarantee" doesn't evoke swiss cheese at all.
What's the case for it being a swiss cheese approach? That doesn't match how I think of it.
I'm surprised to hear you say that, since you write:
"Upfront, I want to clarify: I don't believe or wish to claim that GSAI is a full or general panacea to AI risk."
I kinda think anything which is not a panacea is swiss cheese; those are the only two options.
It's a matter of what sort of portfolio can lay down slices of swiss cheese, at what rate, and with what degree of uncorrelation. And I think in this way GSAI is antifragile to next year's language models, which is why I can mostly agree with Zac's talk and still work on GSAI (I don't think he talks about my cruxes).
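To put toy numbers on the rate-and-uncorrelation point, here's a minimal sketch; the per-layer miss rates and the crude interpolation between independent and fully-correlated layers are invented for illustration, not a claim about GSAI or any real system.

```python
# Toy swiss-cheese arithmetic: with independent layers, the probability that
# a failure slips through every slice is the product of the per-layer miss
# rates; correlation between layers erodes that product toward the miss rate
# of the single strongest layer. All numbers here are made up.

def p_breach_independent(miss_rates):
    """P(a failure passes every layer) when layer misses are independent."""
    p = 1.0
    for m in miss_rates:
        p *= m
    return p

def p_breach_correlated(miss_rates, rho):
    """Crude linear interpolation: rho=0 recovers independence; rho=1 means
    the layers all fail together, so only the strongest layer matters."""
    return (1 - rho) * p_breach_independent(miss_rates) + rho * min(miss_rates)

layers = [0.1, 0.2, 0.3]                      # hypothetical per-layer miss rates
print(p_breach_independent(layers))           # ~0.006
print(p_breach_correlated(layers, rho=0.5))   # ~0.053
```

The takeaway is that correlated layers buy you far less than their count suggests, which is why "with what uncorrelation" matters as much as "at what rate".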
Specifically, I think the guarantees of each module and the guarantees of each pipe (connecting the modules) isolate/restrict the error to the world-model gap or the world-spec gap, and I think the engineering problems of getting those guarantees are straightforward rather than conceptual. Furthermore, I think the conceptual problems of reducing the world-spec gap below some threshold, as presented in Safeguarded AI's TA1, are easier than the conceptual problems in alignment/safety/control.
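To illustrate the module/pipe claim, a minimal sketch under invented interfaces (this is not Safeguarded AI's actual design or API): if every boundary check passes, the remaining failure modes are by construction the two gaps the checks can't see.

```python
# Hypothetical sketch of modules connected by contract-checking pipes. If a
# run passes every boundary check, a bad outcome can't be blamed on any
# module-to-module handoff; it must live in the world-model gap (model vs.
# reality) or the world-spec gap (spec vs. what we actually wanted), neither
# of which an internal check can see.
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def pipe(f: Callable[[A], B],
         contract: Callable[[B], bool],
         g: Callable[[B], C]) -> Callable[[A], C]:
    """Connect module f to module g, refusing any value that violates
    f's output contract (which doubles as g's input assumption)."""
    def composed(x: A) -> C:
        y = f(x)
        if not contract(y):
            raise AssertionError("pipe guarantee violated: error isolated to f")
        return g(y)
    return composed

# Invented example: a planner module feeding an executor through a contract.
plan = pipe(lambda task: {"task": task, "steps": ["draft", "verify"]},
            lambda p: len(p["steps"]) > 0,  # stand-in machine-checkable contract
            lambda p: f"executing {p['task']} in {len(p['steps'])} steps")
print(plan("demo"))  # executing demo in 2 steps
```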
I understood Nora as saying that GSAI in itself is not a swiss cheese approach. This is different from saying that [the overall portfolio of AI derisking approaches, one of which is GSAI] is not a swiss cheese approach.