I got Claude to read this text and explain the proposed solution to me[1], which doesn’t actually sound like a clean technical solution to issues regarding self-prediction. Did Claude misexplain, or is this an idiosyncratic mental technique and not a technical solution to that agent foundations problem?
Cf. Steam (Abram Demski, 2022); Proper scoring rules don’t guarantee predicting fixed points (Caspar Oesterheld, Johannes Treutlein & Rubi J. Hudson, 2022) and the follow-up paper; Fixed-Point Solutions to the Regress Problem in Normative Uncertainty (Philip Trammell, 2018); and active inference, which simply bundles the prediction and the utility goal together in one (I find this ugly; I didn’t read these two comments before writing this one, so my distaste for active inference was developed independently).
I guess this was also discussed in Embedded Agency (Abram Demski & Scott Garrabrant, 2020) under the terms “action counterfactuals” and “observation counterfactuals”?
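For context, here is a hedged sketch of the self-prediction issue those references circle around, in my own paraphrase rather than a formula taken from any of the cited papers: once a report influences the outcome it is about, a strictly proper scoring rule no longer pushes the predictor toward a fixed point.

```latex
\documentclass{article}
\begin{document}
% Toy setup (my paraphrase): a predictor reports p for a binary event whose
% actual probability depends on the report via some reaction function f.
A predictor reports $p \in [0,1]$ for a binary event whose true probability
is $q = f(p)$. Under a scoring rule $S$, the expected score of report $p$ is
\[
  V(p) = f(p)\,S(p,1) + \bigl(1 - f(p)\bigr)\,S(p,0).
\]
Strict propriety only says that, for a \emph{fixed} $q$, reporting $p = q$
maximizes expected score. Once $q = f(p)$ moves with the report, the maximizer
of $V$ need not satisfy $p = f(p)$, so the predictor is not automatically
incentivized to report a self-fulfilling (fixed-point) prediction.
\end{document}
```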
Claude 4.5 Sonnet explanation
Your brain has a system that generates things that feel like predictions but actually function as action plans/motor output. These pseudo-predictions are a muddled type in the brain’s type system.
You can directly edit them without lying to yourself because they’re not epistemic beliefs — they’re controllers. Looking at the place in your mind where your action plan is stored and loading a new image there feels like predicting/expecting, but treating it as a plan you’re altering (not a belief you’re adopting) lets you bypass the self-prediction problem entirely.
So: “I will stay sane” isn’t an epistemic prediction that would create a self-fulfilling prophecy loop or violate the belief-action firewall. It’s writing a different script into the pseudo-model that connects to motor output — recognizing that the thing-that-feels-like-a-prediction is actually the controller, and you get to edit controllers.
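A minimal sketch of the belief-vs-controller type distinction described in that summary. This is my own illustration of the framing, not anything from the original post; the class names, method names, and the example scripts are all hypothetical.

```python
# Sketch of the "muddled type" idea: the same mental content can be typed as an
# epistemic belief (only updated on evidence) or as a controller/plan (freely
# rewritable). The described technique is a re-typing, not a belief update.

from dataclasses import dataclass


@dataclass
class Belief:
    """Epistemic prediction: editing this without evidence is self-deception."""
    proposition: str
    credence: float  # should track evidence, not wishes

    def update(self, new_credence: float, evidence: str) -> None:
        # The only legitimate way to change a belief: condition on evidence.
        self.credence = new_credence


@dataclass
class Controller:
    """Action plan wired toward motor output: editable by design."""
    script: str

    def rewrite(self, new_script: str) -> None:
        # No epistemic sin here: a plan is something you choose, not infer.
        self.script = new_script


# The felt experience "I will stay sane" type-checks as a Belief,
# but on this account it is actually a Controller, so rewriting it
# is planning rather than wishful thinking.
felt_prediction = Controller(script="panic when things look bad")
felt_prediction.rewrite("notice the panic trigger, breathe, keep working")
print(felt_prediction.script)
```

The point the sketch tries to make concrete is that `rewrite` carries no epistemic obligation while `update` does; the technique is noticing which of the two types you are actually holding.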
I didn’t want to read a bunch of unrelated text from Yudkowsky about a problem I don’t really have.
It is an idiosyncratic mental technique. Look up trigger-action plans (TAPs), say. What you’re doing there is a variant of what EY describes.
I fortunately know of TAPs :-) (I don’t feel much apocalypse panic, so I don’t need this post.)
I guess I was hoping there’d be some more teaching from on high about this agent foundations problem that’s been bugging me for so long, but it seems I’ll have to think for myself. Fine.
Yeah, I’m pretty sure it’s an idiosyncratic mental technique / human psychology observation; there isn’t technical agent foundations progress here.