The First Filter

Consistently optimizing for solving alignment (or any other difficult problem) is incredibly hard.

The first and most obvious obstacle is that you need to actually care about alignment and feel responsible for solving it. You cannot just ignore it or pass the buck; you need to aim for it.

If you do care, you now have to go beyond the traditions you were raised in. Be willing to go beyond the tools you were given, and to use them in unintended and weird ways. This is where most people who care about alignment tend to fail: they tackle it as if it were a normal problem from a classical field of science, rather than an incredibly hard and epistemologically fraught one.

If you manage to transcend your methodological upbringing, you might come up with a different, fitter approach for attacking the problem: your own weird inside view. Yet beware of becoming a slave to your own insight, a prisoner of your own frame; it’s far too easy to never look back and just settle into your new tradition.

If you clear all these obstacles, then whatever you do, even if it is not enough, you will be one of the few who adapt, who update, who course-correct again and again. Whatever the critics say, you’ll actually be doing your best.

This is the first filter. This is the first hard and crucial step toward solving alignment: actually optimizing for solving the problem.

When we criticize each other’s approaches to alignment in good faith, we are acknowledging that neither of us is wedded to any approach or tradition, and that we’re both optimizing to solve the problem. This is a mutual acknowledgement that we have both passed the first filter.

Such criticism should thus be taken as a strong compliment: your interlocutor recognizes that you are actually trying to solve alignment and are open to changing your ways.