We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.
I appreciate the difficulty of actually defining optimizers, and so don’t want to quibble with this definition, but am interested in whether you think humans are a central example of optimizers under this definition, and if so whether you think that most mesa-optimizers will “explicitly represent” their objective functions to a similar degree that humans do.
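(For concreteness, here’s a minimal sketch, entirely my own illustration rather than anything from the paper, of a system that clearly satisfies the definition: it internally searches a space of candidate outputs and keeps whichever scores highest on an explicitly represented objective function. The open question is whether human cognition contains anything as cleanly separable as the `objective` below.)

```python
import random

def objective(candidate):
    # The "explicitly represented" objective: a concrete function the
    # system itself evaluates. (Toy example: prefer outputs near 42.)
    return -abs(candidate - 42)

def optimize(num_steps=1000):
    # Internal search through a space of possible outputs, keeping
    # whichever element scores highest according to the objective.
    best, best_score = None, float("-inf")
    for _ in range(num_steps):
        candidate = random.uniform(0, 100)
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

print(optimize())  # approaches 42 as num_steps grows
```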
Agreed that this points in the right direction. I think there’s more to it than that though. Consider, for example, a three-body problem under Newtonian mechanics. There’s a sense in which specifying the bodies’ masses, initial positions, and initial velocities, along with Newton’s laws of motion, is the best way to compress the information about their chaotic trajectories.
But there’s still an open question here, which is why are three-body systems chaotic? Two-body systems aren’t. What makes the difference? Finding an explanation probably doesn’t allow you to compress any data any more, but it still seems important and interesting.
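To make the divergence concrete, here’s a toy sketch of my own (the units, initial conditions, and softening constant are all arbitrary choices): integrate the same three-body system twice, with one initial coordinate perturbed by 1e-9, and compare the outcomes. With three bodies the final separation is typically many orders of magnitude larger than the perturbation; repeat the experiment with two bodies and it stays of the same order.

```python
import numpy as np

G = 1.0    # gravitational constant (toy units)
EPS = 1e-3 # softening term to avoid numerical blow-up at close encounters

def accelerations(pos, masses):
    # Newtonian pairwise gravitational accelerations for n bodies in 2D.
    acc = np.zeros_like(pos)
    for i in range(len(masses)):
        for j in range(len(masses)):
            if i != j:
                diff = pos[j] - pos[i]
                r2 = diff @ diff + EPS
                acc[i] += G * masses[j] * diff / r2**1.5
    return acc

def simulate(pos, vel, masses, dt=1e-3, steps=20000):
    # Leapfrog integration; returns the final positions.
    pos, vel = pos.copy(), vel.copy()
    acc = accelerations(pos, masses)
    for _ in range(steps):
        vel += 0.5 * dt * acc
        pos += dt * vel
        acc = accelerations(pos, masses)
        vel += 0.5 * dt * acc
    return pos

masses = np.array([1.0, 1.0, 1.0])
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vel = np.array([[0.0, 0.5], [-0.5, 0.0], [0.5, -0.5]])

# Rerun with one coordinate perturbed by 1e-9.
pos_perturbed = pos.copy()
pos_perturbed[0, 0] += 1e-9

drift = np.linalg.norm(simulate(pos, vel, masses)
                       - simulate(pos_perturbed, vel, masses))
print(drift)  # typically many orders of magnitude larger than 1e-9
```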
(This seems related to a potential modification of your data compression standard: that good explanations compress data in a way that minimises not just storage space, but also the computation required to unpack the data. I’m a little confused about this though.)
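One existing formalisation along these lines (my pointer, not something from the original post) is Levin’s Kt complexity, which charges a program for its running time as well as its length:

$$Kt(x) = \min_{p} \{\, |p| + \log t(p) \;:\; U(p) = x \,\}$$

where $U$ is a universal machine, $|p|$ is the program’s length, and $t(p)$ its running time. Stated in these terms, the compression standard would penalise explanations that are short but computationally expensive to unpack.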
Thanks for the kind words. I agree that refactoring would be useful, but don’t have the time now. I have added some headings though.
A relevant book recommendation: The Enigma of Reason argues that thinking of high-level human reasoning as a tool for attacking other people’s beliefs and defending our own (regardless of their actual veracity) helps explain a lot of weird asymmetries in cognitive biases we’re susceptible to, including this one.
I’d like to push back on the assumption that AIs will have explicit utility functions. Even if you think that sufficiently advanced AIs will behave in a utility-maximising way, their utility functions may be encoded in a way that’s difficult to formalise (e.g. somewhere within a neural network).
It may also be the case that coordination is much harder for AIs than for humans. For example, humans are constrained by having bodies, which makes it easier to punish defection—hiding from the government is tricky! Our bodies also make anonymity much harder. Whereas if you’re a piece of code which can copy itself anywhere in the world, reneging on agreements may become relatively easy. Another reason why AI cooperation might be harder is simply that AIs will be capable of a much wider range of goals and cognitive processes than humans, and so they may be less predictable to each other and/or have less common ground with each other.
This paper by Critch is relevant: it argues that when agents with different beliefs merge, they will effectively bet their shares of the merged utility function on their predictions, so that control skews over time towards whichever agent’s beliefs are more accurate.
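(If I’m remembering the paper correctly, and this is worth checking against the original, the core result is roughly that any Pareto-optimal policy for two agents with beliefs $P_1, P_2$ and utilities $u_1, u_2$ maximises a weighted sum

$$\pi^* \in \arg\max_{\pi} \left( w_1\, \mathbb{E}^{\pi}_{P_1}[u_1] + w_2\, \mathbb{E}^{\pi}_{P_2}[u_2] \right),$$

and conditional on an observed history $h$, the effective weight on agent $i$’s utility is proportional to $w_i P_i(h)$, so that whichever agent’s beliefs assigned higher probability to what actually happened gains control.)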
Which policies in particular?
This point seems absolutely crucial, and I really appreciate the cited evidence.
Actually, general relativity seems to have been discovered by Hilbert at almost exactly the same time as Einstein discovered it.
Biggest jump forward.
Does anyone know how this paper relates to Paul Christiano’s blog post titled Handling destructive technology, which seems to anticipate some of the key ideas? It’s not directly acknowledged in the paper.
Interesting list. How would you compare reading the best modern summaries and analyses of the older texts, versus reading them in the original?
Quigley’s career demonstrates an excellent piece of sociological methodology… He builds a theory that emphasizes the importance of elites, and subsequently goes and talks to members of the elite to test and apply the theory.
I’m not sure if this is meant to be ironic, but that methodology seems like an excellent way to introduce confirmation bias. I guess it’s excellent compared to not going and talking to anyone at all?
Depends what type of research. If you’re doing experimental cell biology, it’s less likely that your research will be ruined by abstract philosophical assumptions which can’t be overcome by looking at the data.
So when is rationality relevant? Always! It’s literally the science of how to make your life better / achieve your values.
Sometimes science isn’t helpful or useful. The science of how music works may be totally irrelevant to actual musicians.
If you think of instrumental rationality as the science of how to win, then it necessarily entails considering things like how to set up your environment, your unthinking habits, and how to “hack” your psyche/emotions.
It’s an empirical question when and whether these things are very useful; my post gives cases in which they are, and in which they aren’t.
Some effort spent on determining which things are good, and which things lead to more opportunities for good, is going to be rewarded (statistically) with better outcomes.
All else equal, do you think a rationalist mathematician will become more successful in their field than a non-rationalist mathematician? My guess is that if they spent the (fairly significant) time taken to learn and do rationalist things on just learning more maths, they’d do better.
(Here I’m ignoring the possibility that learning rationality makes them decide to leave the field).
I’ll also wave at your wave at the recursion problem: “when is rationality useful” is a fundamentally rationalist question both in the sense of being philosophical, and in the sense that answering it is probably not very useful for actually improving your work in most fields.
When I talk about doing useful work, I mean something much more substantial than what you outline above. Obviously 15 minutes every day thinking about your problems is helpful, but the people at the leading edges of most fields spend all day thinking about their problems.
Perhaps doing this ritual makes you think about the problem in a more meta way. If so, there’s an empirical question about how much being meta can spark clever solutions. Here I have an intuition that it can, but when I look at any particular subfield that intuition becomes much weaker. How much could a leading mathematician gain by being more meta, for example?