“What determines a mind’s effects” does not seem likely to have a clean answer in general.
I can quibble with this, and of course it’s an open question (e.g. cf. https://www.lesswrong.com/posts/NvwjExA7FcPDoo3L7/are-there-cognitive-realms). But I certainly take your point that you maybe don’t need to understand all or most minds. (I may have written something contradicting that in the “fundamental question” post; if I did, then I was mistaken or wrote unclearly or something.)
I am not sure that there is a finite set of concepts for understanding agency. The problem with a theory of agents is that agents invent theory, so agent theory keeps pulling more stuff into itself.
I guess that the point of agent foundations is to focus on only the core theory of agency. But what characterizes the core theory?
I think I agree with this. Cf. https://www.lesswrong.com/posts/nkeYxjdrWBJvwbnTr/an-advent-of-thought
I would say that alignment is pretty close to the core theory; let’s say, 1⁄4 of core theory is alignment, and 1⁄2 of alignment is core theory (numbers made up, but just to give a sense of relative sizes). Or IDK haha. But what I mean is, core theory would tend to be:
Elements that an individual mind converges to over time. (Yes, there are many elements X that may not converge, e.g. because the mind is permanently deranged, or because X is entangled with ongoing / infinite creativity; and in some sense all X are entangled with ongoing creativity, and therefore don’t fully converge in terms of their full meaning for the mind… but like, still, there’s obvious senses in which things do converge.) (Some of these elements are mind-general things like Bayesian reasoning; others are mind-specific / value-laden / cognitive-realm-specific. The latter are converged to by choice (choice of what to be).)
Elements that determine the mind’s ultimate effects.
Elements that are stable under reflective self-modification. Cf. https://www.lesswrong.com/posts/Ht4JZtxngKwuQ7cDC?commentId=koeti9ygXB9wPLnnF
If something tends to get self-modified away, it’s probably not core.
You might say that nothing is stable, that everything gets self-modified away? We could debate that. But I have a perhaps not very accountable sense that it is possible for me to decide some things about myself permanently, even though I’m fairly gung-ho (in the long run) about radically changing / growing / transcending myself. E.g. I think I can decide to never pointlessly torture a person—presumably someone could fairly feasibly mess with my head a bunch to change that property, but I mean that left to my own RSI (recursive self-improvement) devices, I would never change my mind on that. Do you agree that some things can be stable like that?