PeterMcCluskey comments on Claude’s Constitution

PeterMcCluskey 27 Apr 2026 17:17 UTC
4 points
0
Very little Bayesian evidence. I saw new signs that my reasoning was incomplete. I had been generalizing from many examples of approaches that did a poor job of prioritizing corrigibility, but I never had an airtight argument for it being impossible to mix corrigibility with other goals.
- MichaelDickens 27 Apr 2026 18:21 UTC
  2 points
  0
  Tell me if this is an accurate description of your reasoning:
  I thought it was not feasible to mix corrigibility with value alignment—we should aim for CAST instead.
  I saw how Claude’s Constitution tries to mix corrigibility with values.
  I don’t necessarily think the constitution is doing a good job at that, but it made me realize that I was too hasty to rule out the feasibility of mixing corrigibility with values.
  - PeterMcCluskey 28 Apr 2026 2:27 UTC
    4 points
    0
    Yes.