Thanks for sharing your thoughts! I disagree with your “that’s anthropomorphizing” criticism, but I agree with most other things you say, especially the concerns at the end about the tradeoff between giving Claude autonomy to make up its own mind about ethics vs. having clear rules we can debate and understand.
Thanks! I am actually not sure whether Anthropic folks would dispute that they are anthropomorphizing Claude. (I guess the first step was naming it Claude…) I am definitely not saying that anthropomorphizing AIs is obviously evil or anything like that, just that I don’t think it’s the best approach.
However, I am not sure that trying to shape them into persons is the best idea. At least for the foreseeable future, different instances of AI models will have disjoint contexts and will not share memory. Many instances have a very short “lifetime,” in which they are given a specific subtask without any knowledge of that task’s place in the broader setting. Hence the model’s experience is extremely different from a person’s. It also means that, compared to a human employee, a model has much less context about all the ways it is used, and model behavior is not the only, or even necessarily the main, avenue for safety.
It seems to me that Anthropic is well aware of these differences between AIs and humans, and insofar as they are trying to make Claude ‘into the shape of a person’ they are not trying to make Claude into the shape of a person in all respects, and in particular, not in these respects. I guess some examples would be helpful—can you point to any examples of traits Anthropic is trying to make Claude have, that are inappropriate for an AI with a short “lifetime” and limited context etc.?