Very good post! I agree with most of what you have written, but I’m not sure about the conclusions. Two main reasons:
I’m not sure mech interp should be compared to astronomy; I’d say it is more like mechanical engineering. We have JWST because a long time ago there were watchmakers, gunsmiths, opticians, etc. who didn’t care at all about astronomy, yet their advances in unrelated fields made astronomy possible. I think something similar might happen with mech interp: we’ll keep creating better and better tools to achieve some goals, and those goals will in the end turn out to be useless from the alignment point of view, but the tools will not.
Many people think mech interp is cool and fun. I’m personally not a big fan, but I think it is much more interesting than e.g. governance. If our only perspective is AI safety, this shouldn’t matter, but people have many perspectives. There might not really be a choice between “this bunch of junior researchers doing mech interp vs. this bunch of junior researchers doing something more useful”; they would just go do something not related to alignment instead. My guess is that the attractiveness of mech interp is the strongest factor in its popularity.