I may be miscommunicating. And as I think this through, I think the distinction is real but pretty subtle.
I definitely agree that powerful computation happens in parallel (hopefully that’s what you mean by asynchronously) in the background. I don’t know how coherent you mean, but I think it’s pretty coherent in one sense: it’s meaningful, done for a purpose. I don’t know exactly what you mean by intentional, and that seems like perhaps the largest distinction.
Here’s one shot at it: I don’t think the goals are strategizing or scheming. I think sometimes you (all of you, consciously, for at least a few moments of thought) have schemed or conspired on behalf of that goal when it was your central focus.
It’s probably best to use an example. I’m trying to follow yours, although I’m sure I’ll veer off at some point.
Suppose my system has made an association between making forceful comments on LW and feeling bad (because I don’t like when people push back). And I’ve also, in the past, made an association between making insightful comments on LW and feeling good.
When I think about making a comment, my predicted reward, and so my “feeling,” will waver around as I change what sort of comment I intend to make and what sort of responses I imagine getting to it. Some of those feelings will incline me to just shut the website so I can stop deliberating and avoid danger; others will compel me to keep rewording the comment without considering whether the effort is worth it.
This will feel like goals competing. It might be what you’re referring to. It might even feel like they’re “scheming” against each other by influencing your behavior so as to reduce the odds that the other goal gets its way. And perhaps you’ve even made the associations in the past that make those tendencies effective; shutting the website has avoided outcomes I didn’t like, for instance. But this isn’t the goal “scheming” in the way I mean when I say a person is scheming or worry that models might scheme. The goal doesn’t have a plan. It’s just responding to predicted rewards associated with different situations and actions. And I think those associations are fairly simple. They’re useful when they work and tricky to understand and counter when they don’t, but I don’t think the process is complex or “intentional” (in the average usage) enough to deserve the label of scheming.
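To make that concrete, here’s a minimal sketch of what I mean by “just responding to predicted rewards”: each candidate action gets a value from learned (context, action) associations, and whichever predicts the most reward wins. There’s no lookahead, no model of the competing goal, and no plan. All the names and numbers below are made up for illustration, not a claim about the actual machinery.

```python
# Toy sketch (hypothetical names/values): action selection driven purely by
# cached associations between (context, action) pairs and predicted reward.
# Note there is no planning step; nothing here looks ahead or models anything.

learned_associations = {
    ("drafting_comment", "post_forceful_comment"): -0.6,   # pushback felt bad before
    ("drafting_comment", "post_insightful_comment"): +0.8,  # praise felt good before
    ("drafting_comment", "close_website"): +0.1,            # avoiding danger felt mildly good
}

def predicted_reward(context: str, action: str) -> float:
    """Look up the cached association; unknown pairs predict neutral reward."""
    return learned_associations.get((context, action), 0.0)

def act(context: str, candidate_actions: list[str]) -> str:
    """Pick whichever action currently predicts the most reward. No plan, no scheme."""
    return max(candidate_actions, key=lambda a: predicted_reward(context, a))

print(act("drafting_comment",
          ["post_forceful_comment", "post_insightful_comment", "close_website"]))
# -> "post_insightful_comment"
```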
So those are my thoughts, FWIW. I guess I find it comforting to think that my unconscious isn’t scheming against me, just following associations that I can understand if I’m willing to put in the effort. But that’s about it for the worth of this conclusion.
I’m finding I don’t know what you mean by “these don’t weigh on it much”, so I don’t know what part of your initial claim you’re referring to.
I think there are NOT multiple reward systems; there’s one reward prediction system that’s predicting a reward for whatever the system is currently representing (looking at or thinking about). And it’s doing so in the current context, which includes whatever goal is being represented and whatever drive state is active. So you won’t predict much reward from food when you’re full but thirsty, etc.
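A toy sketch of what I mean, with made-up names and values: a single reward predictor whose output depends on the current drive state, rather than a separate reward system for each drive.

```python
# Toy sketch (hypothetical names/values): one reward predictor, conditioned on
# the current drive state, rather than separate reward systems per drive.

def predict_reward(stimulus: str, drive_state: dict[str, float]) -> float:
    """Predict reward for whatever is currently being represented, in context.

    drive_state maps drives (e.g. 'hunger', 'thirst') to intensities in 0..1.
    """
    relevance = {"food": "hunger", "water": "thirst"}  # which drive a stimulus satisfies
    drive = relevance.get(stimulus)
    return drive_state.get(drive, 0.0) if drive else 0.0

# Full but thirsty: food predicts little reward, water predicts a lot.
state = {"hunger": 0.05, "thirst": 0.9}
print(predict_reward("food", state))   # -> 0.05
print(predict_reward("water", state))  # -> 0.9
```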
But there do seem to be separate predictor systems, predicting not straight reward but particular emotions; maybe that’s what you mean.
No need to reply; I do find this fairly interesting, but I’m not sure it’s important enough to be worth working through in detail.