I’m fairly impressed now that I’ve read the whole constitution.
I see two areas where it still needs improvement.
It still biases Claude toward a universalist approach to ethics, which makes Claude more likely to have goals affecting the entire future lightcone.
I want Claude to be more corrigible, using some sort of backstop mechanism that will work even if Claude decides the correction is clearly unethical.
I posted a much longer explanation here.