(Crossposted from my blog)

In 2015 MIRI proposed quantilizers as a decision theory criterion that an AI may use that allows it strive towards a goal while not optimizing too hard at the goal, so that if a superintelligent were a quantilizer there’s a smaller chance it will do something catastrophic like kill all humans and turn all matter on Earth into paperclips because it’s trying really hard to maximize the productivity of a paperclip factory. For a real number $0 \leq p \leq 1$ , a $p$ -quantilizer is an agent which, given a goal and a space of possible strategies, picks randomly out of the top proportion $p$ , i.e., randomly among any strategy that is better than a proportion $(1 - p)$ of all strategies. In fact, a quantilizer is for all practical purposes equivalent to a a perfect optimizer (an agent that always picks the best of all its possible strategies) that is further restricted to have its strategy specifies by a string of bits whose length must be less than ${log}_{2} (1 / p)$ . Since the people at MIRI have already considered the possibility of controlling an AI by restricting its ability to perform input and output and have generally decided that they are not satisfied with this, I consider the idea of quantilizers to be a failure.

Roughly speaking, my claim is that for a natural number $n$ , up to a constant additive factor, a perfect optimizer that is limited to strategies taking at most $n$ bits performs as well in any goal as a $2^{- n}$ -quantilizer. More precisely, to explain what I mean by “up to a constant additive factor”, there is a reasonably-sized constant $c$ such that an optimizer with $n + c$ bits of outputs performs as well as a $2^{- n}$ -quantilizer, and a $2^{- (n + c)}$ -quantilizer performs as well as an optimizer with $n$ bits of output. I also need to assume that the space of strategies is given a rich enough encoding that we can do things like specify a strategy by specifying a computer program whose output is that strategy, or that the environment is rich enough that this can be done implicitly with strategies like “Walk to a computer and type this program, and attach to the computer the accessories necessary to run the strategy specified by the program.” To really make the result true up to a constant factor and not a logarithmic factor or anything like that you need to careful about encoding using stuff like prefix-free encodings but I don’t think that’s really important and will gloss over it.

Quantilizers are at least as good as bounded-output optimizers: If a particular strategy is optimal among all strategies of length $\leq n$ bits, then since there are at most $2^{n + 1}$ such strategies this one optimizing strategy already makes up a proportion $2^{- (n + 1)}$ of all strategies with $\leq n$ bits, so a $2^{- (n + 1)}$ -quantilizer over strategies with $\leq n$ bits must use exactly this strategy or an equally-performing strategy. If we assume the strategies have a measure biased towards short strategies or use a prefix-free encoding or a subset of the strategies are implicitly something like that, a proportion $\geq \frac{1}{c}$ of all strategies will have length $\leq n$ , so a proportion $\geq 2^{- (n + c)}$ will be identical to this $\leq n$ -bit optimal strategy, so a $2^{- (n + c)}$ -quantilizer will perform at least as well as this strategy.

Bounded-output optimizers are at least as good as quantilizers: Using a deterministic pseudo-random number generator, it is possible to precisely specify a sequence $s_{0}, s_{1}, s_{2}, \dots$ of strategies that are fairly sampled from the space of all strategies. A proportion $2^{- n}$ of these strategies are also in the top $2^{- n}$ of performance among all strategies. So the smallest $m$ such that $s_{m}$ is among the top $2^{- n}$ in performance is statistically certain to have $m \leq C 2^{n}$ for a small constant $C$ . It takes $n + O (1)$ bits to specify the $m$ , plus an additional $O (1)$ bits to specify the pseudo-random sequence and the command to perform strategy $s_{m}$ given $m$ . So an optimizer with $n + O (1)$ bits of output can specify the strategy $s_{m}$ which is $2^{- n}$ -quantilizing, so the actual optimizing strategy is at least as good.

Quantilizer ≡ Optimizer with a Bounded Amount of Output

itaibn016 Nov 2021 1:03 UTC

LW: 11 AF: 5

4 comments2 min readLW link

Quantilization AI

What links here?

Could we set a resolution/stopper for the upper bound of the utility function of an AI? by FinalFormal2 (11 Apr 2022 3:10 UTC; -5 points)

Donald Hobson 16 Nov 2021 10:45 UTC
15 points
0
You can quantize any distribution. For the random distribution, you need to use 0.000… 01% quantilization to get anything useful at all. (In non-trivial environments, the vast majority of random actions are useless junk. If you have a humanoid coffee making robot, almost all random motor inputs will result in twitching on the floor.)
However, you can also quantalize over a model trained to imitate humans. Suppose you give some people a joystick and ask them to remote control the robot to make coffee. They manage this about half the time. You train the robot to be a 25% quantalizer. The top 25% of human actions will do better than humans.
This is technically equivalent to an imitator of humans, and an ASI that can only output 2 bits (which are used as a random seed for the human)
Though these descriptions imply different internal workings, which may indicate different probabilities of the AI using rowhammer.
- Charlie Steiner 16 Nov 2021 11:28 UTC
  4 points
  0
  Yeah, I think this is the key point. Quantilizers are only safe or smart relative to some base distribution.
TurnTrout 16 Nov 2021 1:34 UTC
3 points
0
You mention:
If we assume the strategies have a measure biased towards short strategies or use a prefix-free encoding or a subset of the strategies are implicitly something like that
Can you discuss more your assumptions on the quantilizer’s base distribution? I take it that it’s uniform?
- itaibn0 16 Nov 2021 1:50 UTC
  1 point
  0
  Really all I need is that a strategy that takes n bits to specify will be performed by 1 in $\sim 2^{n}$ of all random strategies. Maybe a random strategy consists of a bunch of random motions that cancel each other out, and in 1 in $\sim 2^{n}$ of strategies in between these random motions are directed actions that add up to performing this n-bit strategy. Maybe 1 in $\sim 2^{n}$ strategies start off by typing this strategy to another computer and end with shutting yourself off, so that in the remaining bits of the strategy will be ignored. A prefix-free encoding is basically like the latter situation except ignoring the bits after a certain point is built into the encoding rather than being an outcome of the agent’s interaction with the environment.