Consequentialism is in the Stars not Ourselves

Polished from my shortform.

Epistemic Status

Thinking out loud.


Introduction

I’ve argued that system wide/​total optimisation for an objective function in the real world is so computationally intractable as to be prohibited by the laws of physics of our universe[1]. Yet it’s clearly the case that e.g., evolution is optimising for inclusive genetic fitness (or perhaps patterns that more successfully propagate themselves if you’re taking a broader view) in such a totalising manner. I think examining why evolution is able to successfully totally optimise for its objective function would be enlightening.

Using the learned optimisation ontology, we have an outer selection process (evolution, stochastic gradient descent, etc.) that selects intelligent systems according to their performance on a given metric (inclusive genetic fitness and loss respectively).


Optimisation

Behavioural (Descriptive) Optimisation

I think of behavioural optimisation as something along the general lines of:

Navigating through a state space to improbable regions in which some compactly specifiable (non-trivial) objective function takes extremal values[2].

Mechanistic (Prescriptive) Optimisation

I think of mechanistic optimisation as something along the general lines of:

A procedure that internally searches through an appropriate space for elements that maximise or minimise the value of some objective function defined on that space.

“Direct” optimisation in the ontology introduced by @beren.

Notably, the procedure must actually evaluate[3] the objective function (or the expected value thereof) on elements of the search space.

Mechanistic optimisation is implementing an optimisation algorithm.
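A minimal sketch of what I mean (the function names and the toy objective here are mine, purely for illustration): the defining feature of mechanistic optimisation is that the objective function is explicitly evaluated on elements of the search space.

```python
# Minimal sketch of mechanistic ("direct") optimisation: the procedure
# explicitly evaluates the objective function on elements of the search
# space, rather than producing good outputs via cached heuristics.

def mechanistic_optimise(search_space, objective):
    """Return the element of `search_space` that maximises `objective`."""
    best, best_value = None, float("-inf")
    for candidate in search_space:
        value = objective(candidate)  # the explicit evaluation step
        if value > best_value:
            best, best_value = candidate, value
    return best

# Toy usage: find the integer in [-10, 10] maximising -(x - 3)^2.
best = mechanistic_optimise(range(-10, 11), lambda x: -(x - 3) ** 2)
```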

For the rest of this post — unless otherwise stated — I’ll be using “optimisation”/​”optimising” to refer to “mechanistic optimisation”.


“Scope” of Optimisation[4]

I want to distinguish optimising systems according to the “scope” of the optimisation procedure(s) in the system’s policy[5].

“Partial” (Task Specific) Optimisation

  • Involves deploying optimisation (search, planning, etc.) to accomplish specific tasks (e.g., making a good next move in chess, winning a chess game, planning a trip, solving a puzzle).

  • The choice of particular tasks is not determined as part of this framework; tasks could be subproblems of another optimisation problem (e.g., picking a good next move as part of winning a chess game), generated via heuristics, etc.

  • The system’s policy is not a coherent optimiser, but contains optimisation subprocedures that are applied to specific tasks the system encounters

    • Optimisation is but one tool in the system’s toolbox

“Total” Optimisation

  • Entails consistently employing optimisation throughout a system’s active lifetime to achieve fixed terminal goals.

  • All actions/​outputs flow from their expected consequences on realising the terminal goals (e.g., if a terminal goal is to maximise the number of lives saved, every activity—eating, sleeping, playing, working—is performed because it is the most tractable way to maximise the expected number of future lives saved at that point in time).

  • The system’s entire policy is in effect an optimisation algorithm with a set of objectives it is coherently optimising for.
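The difference between the two scopes can be caricatured in code (the names and the dispatch structure are illustrative assumptions of mine, not a claim about how any real system is implemented): a partial optimiser invokes search only for particular tasks, while a total optimiser routes every output through expected-consequence evaluation against fixed goals.

```python
# Toy contrast between "partial" and "total" optimisation. `plan` stands
# in for any search/planning subroutine; what differs between the two
# policies is how much of the system's behaviour routes through it.

def plan(options, evaluate):
    """Generic optimisation subroutine: pick the option with the best value."""
    return max(options, key=evaluate)

def partial_policy(situation):
    """Optimisation is one tool among many: only some inputs invoke search."""
    if situation["kind"] == "chess_move":
        return plan(situation["options"], situation["evaluate"])
    # Everything else is handled by cheap, non-optimising heuristics.
    return "default_habitual_response"

def total_policy(situation, expected_goal_value):
    """Every action flows from expected consequences for fixed terminal goals."""
    return plan(situation["options"], expected_goal_value)
```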


Outer Optimisation Processes as Total Optimisers

As best as I can tell, there are some distinctive features of outer optimisation processes that facilitate total optimisation:

Access to more compute power

  • ML algorithms are trained with significantly (often orders of magnitude) more compute than is used for running inference due in part to economic incentives

    • Centralisation of ML training allows training ML models on bespoke hardware in massive data centres, but the models need to be cheap enough to run profitably

    • In some cases, trained models are intended to run on consumer hardware or edge computing devices, so there is a gap of many orders of magnitude between the compute available for training and the compute available for inference

  • Evolutionary processes have access to the cumulative compute power of the entire population under selection, and they play out across many generations of the population

  • This (much) greater compute allows outer optimisation processes to apply (many?) more bits of selection towards their objective functions

Relaxation of time constraints

  • Real-time inference imposes a strict bound on how much computation can be performed in a single time step

    • Robotics, self-driving cars, game AIs, etc. must make actions within fractions of a second

      • Sometimes hundreds of actions in a second

    • User-facing cognitive models (e.g., LLMs) are also subject to latency constraints

      • Though people may be willing to wait longer for responses if the models’ outputs are sufficiently better

  • In contrast, the outer selection process just has a lot more time to perform optimisation

    • ML training runs already last several months, and the only bound on the length of training runs seems to be hardware obsolescence

      • For sufficiently long training runs, it becomes better to wait for the next hardware generation before starting training

      • Training runs exceeding a year seem possible eventually, especially if loss keeps going down with scale

    • Evolution occurs over timescales of hundreds to thousands of generations of an organism

Solving a (much) simpler optimisation problem

  • Outer optimisation processes evaluate the objective function by using actual consequences along particular state-action trajectories for selection, as opposed to modeling expected consequences across multiple future trajectories and searching for trajectories with better expected consequences.

    • Evaluating future consequences of actions is difficult

      • E.g., what is the expected value of writing this LessWrong shortform on the number of future lives saved?

      • Alternatively, if humans were totally optimising for inclusive genetic fitness, children would need to model the consequences of stubbing their toes on their future reproductive viability and avoid it on that basis, rather than avoiding stubbed toes simply because they’re painful.

  • Chaos sharply limits how far into the future we can meaningfully predict (regardless of how many computational resources one has), which is not an issue when using actual consequences for selection

    • In a sense, outer optimisation processes get the “evaluate the consequences of this trajectory on the objective” computation for free, and that’s just a very difficult (and in some cases outright intractable) computational problem

  • It’s hard to effectively search across future trajectories for the best next action if you can’t readily/​accurately model the consequences of said trajectories, or are sharply limited in the length of trajectories you can consider

  • The usage of actual consequences applies over longer time horizons

    • Evolution has a potentially indefinite/​unbounded horizon

      • And has been optimising for much longer than the lifespan of any organism

    • Current ML training generally operates with fixed-length horizons but uses actual/​exact consequences of trajectories over said horizons.

  • Outer optimisation processes select for a policy that performs well according to the objective function on the training distribution, rather than selecting actions that optimise an objective function directly in deployment.

  • Access to vastly more data further facilitates learning a suitable policy

    • Evolution for example has access to many orders of magnitude more data points with which to select a suitable policy than any given organism observes in its lifetime

    • Current ML models learn in a largely offline fashion, but to the extent that in-context learning for LLMs is analogous to within-lifetime learning in animals, LLMs are pretrained on many orders of magnitude more tokens than they ever see in a given context window (trillions of tokens for pretraining vs a few thousand tokens for the context window)
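To make the contrast concrete, here is a toy sketch (all names, parameters, and the one-dimensional environment are mine) of selection on actual consequences: the outer process never models the future; it simply runs candidate policies in the environment and keeps the best performers.

```python
import random

# Toy sketch: an outer selection process needs no world model. It scores
# the *actual* outcome of each candidate policy (a rollout) and keeps
# the best performers; an inner planner would instead have to predict
# those outcomes in advance.

def rollout_fitness(policy_param, environment):
    """Run the policy in the environment and return its realised fitness."""
    return environment(policy_param)

def outer_select(population, environment, n_survivors):
    """Selection on actual consequences: evaluate by rollout, keep the best."""
    scored = sorted(population, key=lambda p: rollout_fitness(p, environment),
                    reverse=True)
    return scored[:n_survivors]

def mutate(parents, spread=0.1):
    """Produce the next generation by perturbing the survivors."""
    return [p + random.gauss(0, spread) for p in parents for _ in range(2)]

# Toy environment: realised fitness peaks at parameter value 1.0.
env = lambda p: -(p - 1.0) ** 2

random.seed(0)
population = [random.uniform(-5, 5) for _ in range(20)]
for _ in range(30):
    population = mutate(outer_select(population, env, n_survivors=10))
```

After a few dozen generations the population concentrates near the fitness peak, even though no individual ever computed where the peak was.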

Summary

Outer optimisation processes are more capable of total optimisation due to their access to more compute power, relaxed time constraints, and generally facing a much simpler optimisation problem (evaluations of exact consequences are provided for free [and over longer time horizons], amortisation of optimisation costs, etc.).

These factors enable outer optimisation processes to totally optimise for an objective function (their selection metric) in a way that is infeasible for the intelligent systems they select for.


Tentative Conclusions

I’m updating towards “powerful optimisation process” being the wrong way to think about intelligent systems.

While it is true that intelligent systems do some direct optimisation as part of their cognition, reasoning about them as purely direct optimisers seems like it would lead to (importantly) wrong inferences about their out-of-distribution/​generalisation behaviour and what they “converge” to as they are amplified or subjected to further selection pressure.

Most human cognition isn’t mechanistically consequentialist, and that isn’t coincidence or happenstance; mechanistic consequentialism is just an incredibly expensive way to do inference, and the time/​compute constraints prohibit it in most circumstances. In particular, it’s just much easier to delegate the work of evaluating long term consequences of trajectories to the outer selection process (e.g. stubbed toes lower future reproductive potential), which can select contextual heuristics that are performant in the environment a system was adapted to (e.g. avoid things that are painful).
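A toy sketch of that delegation (the names and structure are mine): the heuristic is a constant-time lookup whose “consequentialist work” was already done by the selection process that installed it, while mechanistic consequentialism pays for simulated rollouts at every single decision.

```python
# The same behaviour -- avoiding stubbed toes -- produced two ways. The
# heuristic was "paid for" once by the outer selection process; the
# consequentialist route pays the cost of simulating futures every time.

def heuristic_policy(observation):
    """Cheap contextual heuristic selected for by the outer process."""
    return "withdraw" if observation["painful"] else "proceed"

def consequentialist_policy(observation, simulate, horizon):
    """Explicitly roll out each action's modelled future and compare."""
    def expected_goal_value(action):
        state, value = observation, 0.0
        for _ in range(horizon):  # cost grows with the modelled horizon
            state, reward = simulate(state, action)
            value += reward
        return value
    return max(["withdraw", "proceed"], key=expected_goal_value)
```

Both policies can output the same action; the point is the wildly different inference cost of producing it.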

A lot of the “work” done[6] by (mechanistic) consequentialism happens in the outer selection process that produced a system, not in the system so selected.

And it can’t really be any other way[7].

Cc: @beren, @tailcalled, @Chris_Leong, @JustisMills.

  1. ^

    Note that total optimisation in simple environments (e.g. tic-tac-toe, chess, go) is more computationally tractable (albeit to varying degrees)

  2. ^

    For a compactly specifiable nontrivial objective function.

    “Compactly specifiable”: without this restriction, any system counts as behaviourally optimising for the utility function that assigns positive utility to whatever action the system actually takes at each time step and negative utility to every other action.

    “Nontrivial”: likewise, without this restriction, any system counts as behaviourally optimising for the compactly specifiable objective function that assigns equal utility to every state.

    Motivation: if your definition of (behavioural) optimisation considers a rock an optimising system, then it’s useless/​unhelpful.

    Speculation: all (behavioural) optimisers are either mechanistic optimisers (including partial optimisers) or otherwise products of an optimisation process (e.g. tabular Q-learning).

  3. ^

    Or approximate, though I’m not sure whether I prefer to consider evaluating approximations of a particular objective function as mechanistically optimising for the approximator instead of the “true” function.

    Suggestions welcome!

    Model-free RL algorithms that determine their policy by argmaxing over actions wrt their value function seem best considered as mechanistically optimising their internal value function, not the return (discounted cumulative future reward), even though the learned value function is an approximation of the return.

    Reasoning about a system as optimising for the return seems liable to lead to wrong inferences about its out of distribution/​generalisation behaviour (e.g. hypothesising that it will wirehead because wireheading attains maximal return in the new environment).

  4. ^

    Suggestions for alternative names/​terminology welcome.

  5. ^

    In an RL setting, the system’s policy is a mapping from agent state to (a probability distribution over) actions. As far as I’m aware, any learning task can be recast in an RL setting.

    But you could consider a generalised policy as a mapping from “inputs” (e.g., “the prompt” for an autoregressive language model) to a probability distribution over outputs.

  6. ^

    Especially re: modelling long-term consequences of trajectories and selecting heuristics that perform well on the selection metric in a given environment.

  7. ^

    This is less true of humans given that we’ve moved out of our environment of evolutionary adaptedness.

    But in a sense, the cultural/​memetic selection happening on our species/​civilisation is still itself an outer selection process.

    Regardless, I’m not convinced the statement is sufficiently true of humans/​our civilisation to try and generalise it to arbitrary intelligent systems.