Value Impact

Being on Earth when this happens is a big deal, no matter your objectives – you can’t hoard pebbles if you’re dead! People would feel the loss from anywhere in the cosmos. However, Pebblehoarders wouldn’t mind if they weren’t in harm’s way.

Appendix: Contrived Objectives

A natural definitional objection is that a few agents aren’t affected by objectively impactful events. If you think every outcome is equally good, then who cares if the meteor hits?

Obviously, our values aren’t like this, and any agent we encounter or build is unlikely to be like this (since these agents wouldn’t do much). Furthermore, these agents seem contrived in a technical sense (low measure under reasonable distributions in a reasonable formalization), as we’ll see later. That is, “most” agents aren’t like this.
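
To gesture at the measure claim (a toy sketch of one possible formalization, which I'm assuming here – the formalization later in the sequence may differ): fix a finite outcome space of size $n \ge 2$ and identify a utility function with a vector $u \in \mathbb{R}^n$. The agents indifferent between all outcomes are exactly those on the diagonal

$$\{u \in \mathbb{R}^n : u_1 = u_2 = \cdots = u_n\},$$

a one-dimensional subspace of $\mathbb{R}^n$, which has Lebesgue measure zero. Any distribution over utility functions with a density therefore samples such an agent with probability $0$.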

From now on, assume we aren’t talking about this kind of agent.


Notes

  • Eliezer introduced Pebblesorters in the Sequences; I made them robots here to better highlight how pointless the pebble transformation is to humans.

  • In informal parts of the sequence, I’ll often use “values”, “goals”, and “objectives” interchangeably, depending on what flows best.

  • We’re going to lean quite a bit on thought experiments and otherwise speculate on mental processes. While I’ve taken the obvious step of beta-testing the sequence and randomly peppering my friends with strange questions to check their intuitions, maybe some of the conclusions only hold for people like me. I mean, some people don’t have mental imagery – who would’ve guessed? Even if so, I think we’ll be fine; the goal is to find a workable impact measure – deducing human universals would just be a bonus.

  • Objective impact is objective with respect to the agent’s values – it is not the case that an objective impact affects you anywhere and anywhen in the universe! If someone finds $100, that matters for agents at that point in space and time (no matter their goals), but it doesn’t mean that everyone in the universe is objectively impacted by one person finding some cash!

  • If you think about it, the phenomenon of objective impact is surprising. See, in AI alignment, we’re used to no-free-lunch this, no-universal-argument that; the possibility of something objectively important to agents hints that our perspective has been incomplete. It hints that maybe this “impact” thing underlies a key facet of what it means to interact with the world. It hints that even if we saw specific instances of this before, we didn’t know what we were looking at, and we didn’t stop to ask.