people have repeatedly made the argument on this forum that it contributes more to capabilities, and so far it hasn’t seemed to convince that many interpretability researchers. I personally suspect this is largely because they’re motivated by capabilities curiosity and don’t want to admit it, whether that’s in public or even to themselves.
Thanks—any good examples spring to mind off the top of your head?
I’m not sure my desire to do interpretability comes from capabilities curiosity, but it certainly comes in part from interpretability curiosity; I’d really like to know what the hell is going on in there...