I think this post is great and points at a central bottleneck in AI alignment.
Previously, John stated that most people can't do good alignment research because they simply bounce off the hard problems. And his proposed fix is to become sufficiently technically proficient that they can start to see the footholds.
While not necessarily wrong, I think this is a downstream effect of having the right "I am gonna do whatever it takes, and I am not gonna give up easily" attitude.
I think this might be why John's SERI MATS 2 project failed (in his own judgement). He did a good job of communicating a bunch of useful technical methodologies. But knowing these methodologies isn't the primary thing that makes John competent. I think his competence comes more from exactly the "There is a problem? Let's seriously try to fix it!" attitude outlined in this post.
But he didn't manage to convey this. I expect he doesn't even realize that this is an important piece that you need to "teach" people.
I am not quite sure how to teach this. I tried to do it in two iterations of AI Safety Camp. Instead of teaching technical skills, I worked with people one-on-one through problems and gave them open-ended tasks (e.g. "solve alignment from scratch"). This basically failed completely to make people significantly better independent AI alignment thinkers.
I think most humans' "analytical reasoning module" fights a war with their "emotion module". Most humans are at the level where they can't even realize that they suck, because that would be too painful, especially if another person points out their flaws.
So perhaps that is where one needs to start: how can you start to model yourself accurately, without your emotional circuitry constantly punching you in the face?