After skimming, I’m still confused. How do you actually want us to use these? GPT-4 is undoubtedly useful, including for people trying to save the world, but it’s not clear what we should want to do with it. So too for goal agnostic systems.
I intentionally left out the details of “what do we do with it” because it’s conceptually orthogonal to goal agnosticism and is a huge topic of its own. It comes down to the class of solutions enabled by having extreme capability that you can actually use without it immediately backfiring.
For example, I think this has a real shot at leading to a strong and intuitively corrigible system. I say “intuitively” here because the corrigibility doesn’t arise from a concise mathematical statement that solves the original formulation. Instead, goal agnosticism lets us aim the system at an incredibly broad and complex indirect specification that captures all the human messiness we want.