I don’t see this question as very well defined. Words mean whatever we want them to mean. What do you want it to mean in this context?
Presumably twitching requires sending a signal to a motor controller, and that connection can be broken.
POMDP is an abstraction. Real agents can be interfered with.
Why do you think it’s fallen down to 10?
How many people are at Event Horizon?
Oh that’s interesting, so you’ve chosen a discount rate such that twitching now is always more important than twitching for the rest of time. And presumably it can’t both twitch AND take other actions in the world in the same time-step, as that’d make it an immediate threat.
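To make the discount-rate point concrete, here is a minimal sketch. It assumes (none of this is specified in the thread) that the agent gets 1 utility for each time step in which it twitches and that utility is discounted geometrically by a factor `gamma` per step; the function names are hypothetical. Under those assumptions, twitching at every step from the next one onward is worth `gamma / (1 - gamma)`, which is less than the 1 utility from twitching now exactly when `gamma < 1/2`.

```python
def future_twitch_value(gamma: float) -> float:
    """Discounted value of twitching at every step t = 1, 2, 3, ...

    Assumes 1 utility per twitch and geometric discounting, so the value is
    the geometric series gamma + gamma**2 + ... = gamma / (1 - gamma).
    """
    if not 0 <= gamma < 1:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return gamma / (1 - gamma)


def twitch_now_dominates(gamma: float) -> bool:
    """True if one twitch now (worth 1) beats twitching for the rest of time."""
    return 1.0 > future_twitch_value(gamma)
```

With `gamma = 0.4` the whole future is worth about 0.67, so a single twitch now wins; with `gamma = 0.6` the future is worth 1.5 and the agent would trade the current twitch for it.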
Such a utility maximiser might become dangerous if it were broken in such a way that it couldn’t take the twitch action for a long period of time, including the current time step. In that case it would take whatever actions would allow it to twitch again as soon as possible. I wonder how dangerous such a robot would be?
On the one hand, the goal of resuming twitching as soon as possible would seem to require accumulating only a limited amount of power. On the other hand, any resources accumulated in this process would then be deployed to maximising its utility. For example, it might have gained control of a repair drone, which could then operate independently even if the original robot could only twitch and nothing else. Even then, it’d likely be less of a threat: if the repair drone left to do anything else, there would be a chance that the original robot would break down and the repair would be delayed. Then again, perhaps the repair drone could hack other systems without moving, which might result in resource accumulation.
“Having people report them wouldn’t help us address them any faster.”—Perhaps you could allow high karma users to cause these posts to immediately be hidden? That would be an intermediate step short of allowing them to delete the posts (although you’d also need a way to remove this capacity if it were being abused).
An agent that constantly twitches could still be a threat if it were trying to maximise the probability that it would actually twitch in the future. For example, if it were to break down, it wouldn’t be able to twitch, so it might want to gain control of resources.
Could you clarify exactly how this twitching agent is defined? In particular, how does its utility accumulate over time? Do you get 1 utility for each point in time where you twitch, and is your total utility the undiscounted sum of these utilities?
“I find this concept most useful when thinking about the problem of inner optimizers, where in the course of optimization through a rich space you stumble across a member of the space that is itself doing optimization, but for a related but still misspecified metric.”—Could you clarify what kind of algorithm you are imagining being run?
Not quite. I don’t think there’s a unique canonical bijection; I embrace there truly being multiple countable infinities, although I do want to insist on some regularity. Computability is relevant here, as it makes it much easier to show that certain consistent labellings exist.
Well, you could try to talk about proportions, but you’d need some kind of non-standard infinities to make that work, or else you’d have to give up on the idea of an aggregative utility function.
I added a link to an image for those who can’t read it:
Subreddits could achieve this without a schism
An arbitrarily small chance of an infinite outcome is sufficient to cause your expected utility to be infinity and cause these kinds of issues.
“Are the glories of heaven worth exactly ω utility? How do we know it’s that rather than √ω or 3ω1/ω or something?”—We don’t know unless it is specified. However, it’s not a bug, but a feature.
“But there’s no obvious way to choose the ordering, and what do we do if that action that makes a million unhappy people happy also rearranges them to make the second order more natural somehow when the first was more natural before?”—Yep, this is exactly the issue I’m currently working on. But my ideas aren’t quite ready to share yet.
Yeah, you’re right. That breaks the proof. I don’t know how to deal with it yet.
On my approach:
I constructed a large triangle around the convex shape, with the center somewhere in the interior. I then projected each point in the convex shape from the center towards the edge of the triangle in a proportional manner, i.e. the center stays where it is, the points on the edge of the convex shape are projected to the edge of the triangle, and a point 1/x of the distance from the center to the edge of the convex shape is sent to the point 1/x of the distance from the center to the edge of the triangle.
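The proportional projection can be sketched in code. This is only an illustration under assumptions of my own: the convex shape is given as a polygon (a list of vertices in order), the center lies strictly inside both the shape and the triangle, and the names `ray_exit_scale` and `project_to_triangle` are hypothetical helpers, not anything from the thread.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(u: Point, v: Point) -> float:
    """2D cross product (z-component)."""
    return u[0] * v[1] - u[1] * v[0]


def ray_exit_scale(center: Point, direction: Point, polygon: List[Point]) -> float:
    """Smallest t > 0 such that center + t * direction lies on the polygon boundary."""
    best = None
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        edge = (b[0] - a[0], b[1] - a[1])
        denom = _cross(direction, edge)
        if abs(denom) < 1e-12:
            continue  # ray is parallel to this edge
        to_a = (a[0] - center[0], a[1] - center[1])
        t = _cross(to_a, edge) / denom       # position along the ray
        s = _cross(to_a, direction) / denom  # position along the edge, 0..1
        if t > 1e-12 and -1e-9 <= s <= 1 + 1e-9:
            best = t if best is None else min(best, t)
    if best is None:
        raise ValueError("ray never hits the boundary; is the center inside?")
    return best


def project_to_triangle(p: Point, center: Point,
                        shape: List[Point], triangle: List[Point]) -> Point:
    """Move p along its ray from the center so that its fractional distance
    to the shape's edge becomes the same fraction of the way to the triangle's edge."""
    d = (p[0] - center[0], p[1] - center[1])
    if d[0] == 0 and d[1] == 0:
        return center  # the center stays where it is
    scale = ray_exit_scale(center, d, triangle) / ray_exit_scale(center, d, shape)
    return (center[0] + scale * d[0], center[1] + scale * d[1])
```

For example, taking the square with corners (±1, ±1) as the convex shape, the origin as the center, and the triangle with vertices (-4, -2), (4, -2), (0, 6), the point (0.5, 0) is halfway to the square's edge at (1, 0), so it is sent halfway to the triangle's edge at (3, 0).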