i think you’re right that the sohl-dickstein post+survey also conflates different notions, and i might even have added more notions into the mix with my list of questions trying to get at some notion(s).[1]
a monograph untangling this coherence mess some more would be valuable. it could do the following things:
- specifying a bunch of a priori different properties that could be called “coherence”
- discussing which ones are equivalent, which ones are correlated, and which ones seem pretty independent
- giving good names to the notions or notion-clusters
- discussing which kinds of coherence generically increase/decrease with capabilities, which ones probably increase/decrease with capabilities in practice, and which ones can either increase or decrease with capabilities depending on the development/learning process, both around human level and later/eventually, and in human-like minds and more generally[2]
- discussing how this relates to AI x-risk: which kinds of coherence should play a role in a case for AI x-risk? what does that look like? or maybe the picture should make one optimistic about some approach to de-AGI-x-risk-ing, or about AGI in general?[3]
[1] i didn’t re-read that post before writing my comment above
[2] the answers to some of these questions might depend on some partly “metaphysical” facts like whether math is genuinely infinite or whether technological maturity is a thing
[3] i think the optimistic conclusions are unlikely, but i wouldn’t want to pre-write that conclusion for the monograph, especially if i’m not writing it
Yeah.
Probably not a full-monograph-length monograph, because I don’t think that (1) the coherence-related confusions are isolated from other confused concepts in this line of inquiry, or that (2) the descendants of the concept of “coherence” will be related in some “nature-at-joint-carving” way that would justify discussing them jointly. (Those are the two reasons I see why we might want a full-monograph-length monograph untangling the mess of some specific confused concept.)
But an investigation (of TBD length) covering at least the first three of your bullet points seems good. I’m less sure about the latter two, probably because I expect that after the first three steps a lot of new salient questions will appear, whereas the then-available answers about the relationship with capabilities will be rather scant (plausibly because the concept of capabilities itself would need to be refactored before more answers become available), and that the results of this single-concept-deconfusing investigation alone will have rather few implications for AGI x-risk (though they might be a fruitful input to future investigation, which is the point).