My best guess is that both dolphin and chimpanzee would be quite bad, though a lot of the variance is in the operationalization. A dolphin is (probably) kind of far from being an entity that has preferences over how it wants to become smarter, what kinds of augmentation are safe, etc., which determines the trajectory of the relevant mind a lot.
So IDK, I feel pretty uncertain about dolphins and chimpanzees. My guess is that value is fragile enough that humans wouldn’t be very happy with a world maximally good according to them, but I am only like 75% confident.
OK thanks. Can you say more about your models here? It seems like you have gamed out what it looks like for a mind to self-improve to ASI, and you think the trajectory is very path-dependent/fragile/tree-like, yet that despite their diversity humans will end up in a similar place, while Claude and aliens almost certainly won’t, and you are unsure about dolphins and chimpanzees.
It would require a lot of writing to explain all my models here, so I don’t think I want to start writing 10+ page essays that might or might not be cruxy for anything. The Arbital articles on CEV and AI Alignment (and lots of Arbital + the sequences in general) capture a non-trivial chunk of my beliefs here.
At a very high level:
In most realistic situations, humans are subject to pretty good game-theoretic arguments for sharing the future with the people who could have been chosen to be uploaded instead.
I think a bunch of those game-theoretic considerations also produced pretty deep instincts towards justice and fairness, which have a quite decent chance of generalizing towards caring for other people in a good and wholesome way.
Concretely, when I look at past civilizations and what other people have done, while I occasionally see people doing horrendous things, mostly people choose to live good and happy lives and care for their families, and much of the badness is the result of scarcity.
When I am working on AI x-risk, especially in an institutional capacity, I do not generally wield resources or influence under the banner of “habryka’s personal values”. Civilization and the community around me have made me richer and more powerful, entrusting me to use those resources wisely, and I want to honor that trust and use those resources in the name of civilization and humanity. So when facing choices about where to spend my time, most of that time is spent in defense of humanity’s values, not my own.