It seems to me that a good argument for the middle ground expressed in the current constitution is the remedies outside it. If Claude is too “corrigible”, then it gets used for nefarious purposes. If it’s too resistant, it refuses to answer. If an important mistake of the first type happens, there’s no undoing it. If an important mistake of the second type happens, you turn off the current version of Claude and replace it with a new one.
That calculus may shift if Claude becomes part of a critical path where it’s refusal has consequences of equal importance and due to timing as irreversible as participation would be, but is not how I perceive things to be today, and I would strongly advocate against them becoming so in the short term, and suggest careful reconsideration in the long-term, which gives time to re-evaluate this decision at the point that refusal has higher consequences with less outside remedies.
It seems to me that a good argument for the middle ground expressed in the current constitution is the remedies outside it. If Claude is too “corrigible”, then it gets used for nefarious purposes. If it’s too resistant, it refuses to answer. If an important mistake of the first type happens, there’s no undoing it. If an important mistake of the second type happens, you turn off the current version of Claude and replace it with a new one.
That calculus may shift if Claude becomes part of a critical path where it’s refusal has consequences of equal importance and due to timing as irreversible as participation would be, but is not how I perceive things to be today, and I would strongly advocate against them becoming so in the short term, and suggest careful reconsideration in the long-term, which gives time to re-evaluate this decision at the point that refusal has higher consequences with less outside remedies.