If you have a pivotal act you can do via following some procedure that only the AI was smart enough to generate, yet humans are smart enough to verify and smart enough to not be reliably fooled about, NAME THAT ACTUAL WEAK PIVOTAL ACT.
I don’t claim to have a solution where every detail is filled in, or where I have watertight arguments showing that it’s guaranteed to work (if executed faithfully).
But I think I have something, and that it could be built upon. The outlines of a potential solution.
And by “solution”, I mean a pivotal strategy (consisting of many acts that could be done over a short amount of time), where we can verify output extensively and hopefully (probably?) avoid being fooled/manipulated/tricked/“hacked”.
I’m writing a series about this here. Only 2 parts finished so far (current plan is to write 4).