Realism about rationality

Richard_Ngo 16 Sep 2018 10:46 UTC

LW: 192 AF: 36

Epistemic status: trying to vaguely gesture at vague intuitions. A similar idea was explored here under the heading “the intelligibility of intelligence”, although I hadn’t seen it before writing this post. As of 2020, I consider this follow-up comment to be a better summary of the thing I was trying to convey with this post than the post itself. The core disagreement is about how much we expect the limiting case of arbitrarily high intelligence to tell us about the AGIs whose behaviour we’re worried about.

There’s a mindset which is common in the rationalist community, which I call “realism about rationality” (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism about this mindset, and so in this essay I try to articulate what it is.

Humans ascribe properties to entities in the world in order to describe and predict them. Here are three such properties: “momentum”, “evolutionary fitness”, and “intelligence”. These are all pretty useful properties for high-level reasoning in the fields of physics, biology and AI, respectively. There’s a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn’t just because biologists haven’t figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated “function” which basically requires you to describe that organism’s entire phenotype, genotype and environment.

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It’s a mindset which makes the following ideas seem natural:

The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don’t count brute force approaches like AIXI for the same reason I don’t consider physics a simple yet powerful description of biology).
The idea that there is an “ideal” decision theory.
The idea that AGI will very likely be an “agent”.
The idea that Turing machines and Kolmogorov complexity are foundational for epistemology.
The idea that, given certain evidence for a proposition, there’s an “objective” level of subjective credence which you should assign to it, even under computational constraints.
The idea that Aumann’s agreement theorem is relevant to humans.
The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors.
The idea that having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding dutch books and money pumps and so on).
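
For readers unfamiliar with the "money pump" mentioned in the last item, here is a minimal sketch in Python. It is not from the original discussion: the item names, the fee, and the trading loop are all illustrative assumptions. It only shows the standard worry that an agent with cyclic preferences (A over B, B over C, C over A) can be led through a series of trades, each of which it is happy to pay for, and end up where it started but poorer.

```python
# Hypothetical illustration: an agent with cyclic preferences A > B > C > A
# that will pay a small fee to swap its current item for one it prefers.

# PREFERS_OVER[x] is the item the agent would rather hold than x (assumed cycle):
# A beats B, B beats C, C beats A.
PREFERS_OVER = {"B": "A", "C": "B", "A": "C"}


def run_money_pump(start_item: str, money: float, fee: float, rounds: int):
    """Offer the agent its preferred swap up to `rounds` times.

    Returns the (item, money) the agent ends up with.
    """
    item = start_item
    for _ in range(rounds):
        if money < fee:
            break  # the agent can no longer afford to trade
        item = PREFERS_OVER[item]  # the agent accepts: it prefers the new item
        money -= fee
    return item, money


if __name__ == "__main__":
    # After one full cycle of three trades the agent holds "A" again,
    # having paid the fee three times.
    print(run_money_pump("A", money=10.0, fee=1.0, rounds=3))
```

The list item above is about the case where an agent has such contradictions but is good at refusing exactly this kind of trade sequence, so the sketch illustrates the failure mode being set aside, not an argument for or against the item.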

To be clear, I am neither claiming that realism about rationality makes people dogmatic about such ideas, nor claiming that they’re all false. In fact, from a historical point of view I’m quite optimistic about using maths to describe things in general. But starting from that historical baseline, I’m inclined to adjust downwards on questions related to formalising intelligent thought, whereas rationality realism would endorse adjusting upwards. This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks). It’s true that “messy” human intelligence is able to generalise to a wide variety of domains it hadn’t evolved to deal with, which supports rationality realism, but analogously an animal can be evolutionarily fit in novel environments without implying that fitness is easily formalisable.

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

Another gesture towards the thing: a popular metaphor for Kahneman and Tversky’s dual process theory is a rider trying to control an elephant. Implicit in this metaphor is the localisation of personal identity primarily in the system 2 rider. Imagine reversing that, so that the experience and behaviour you identify with are primarily driven by your system 1, with a system 2 that is mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths). Does this shift your intuitions about the ideas above, e.g. by making your CEV feel less well-defined? I claim that the latter perspective is just as sensible as the former, and perhaps even more so—see, for example, Paul Christiano’s model of the mind, which leads him to conclude that “imagining conscious deliberation as fundamental, rather than a product and input to reflexes that actually drive behavior, seems likely to cause confusion.”

These ideas have been stewing in my mind for a while, but the immediate trigger for this post was a conversation about morality which went along these lines:

R (me): Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.

O (a friend): You can’t just accept a contradiction! It’s like saying “I have an intuition that 51 is prime, so I’ll just accept that as an axiom.”

R: Morality isn’t like maths. It’s more like having tastes in food, and then having preferences that the tastes have certain consistency properties—but if your tastes are strong enough, you might just ignore some of those preferences.

O: For me, my meta-level preferences about the ways to reason about ethics (e.g. that you shouldn’t allow contradictions) are so much stronger than my object-level preferences that this wouldn’t happen. Maybe you can ignore the fact that your preferences contain a contradiction, but if we scaled you up to be much more intelligent, running on a brain orders of magnitude larger, having such a contradiction would break your thought processes.

R: Actually, I think a much smarter agent could still be weirdly modular like humans are, and work in such a way that describing it as having “beliefs” is still a very lossy approximation. And it’s plausible that there’s no canonical way to “scale me up”.

I had a lot of difficulty in figuring out what I actually meant during that conversation, but I think a quick way to summarise the disagreement is that O is a rationality realist, and I’m not. This is not a problem, per se: I’m happy that some people are already working on AI safety from this mindset, and I can imagine becoming convinced that rationality realism is a more correct mindset than my own. But I think it’s a distinction worth keeping in mind, because assumptions baked into underlying worldviews are often difficult to notice, and also because the rationality community has selection effects favouring this particular worldview even though it doesn’t necessarily follow from the community’s founding thesis (that humans can and should be more rational).


AI Rationality Law-Thinking

Rohin Shah 21 Nov 2019 4:05 UTC
12 points
0
It’s a common crux between me and MIRI / rationalist types in AI safety, and it’s way easier to say “Realism about rationality” than to engage in an endless debate about whether everything is approximating AIXI or whatever that never seems to update me.
romeostevensit 22 Nov 2019 5:02 UTC
9 points
0
This is one of the unfortunately few times there was *substantive* philosophical discussion on the forum. This is a central example of what I think is good about LW.
DanielFilan 22 Nov 2019 5:41 UTC
7 points
0
This post gave a short name for a way of thinking that I naturally fall into, and implicitly pointed to the possibility of that way of thinking being mistaken. This makes a variety of discussions in the AI alignment space more tractable. I do wish that the post were more precise at characterising the position of ‘realism about rationality’ and its converse, or (even better) that it gave arguments for or against ‘realism about rationality’ (even a priors-based one as in this closely related Robin Hanson post), but pointing to a type of proposition and giving it a name seems very valuable.
- DanielFilan 22 Nov 2019 5:49 UTC
  4 points
  0
  Parent
  Note that the linked technical report by Salamon, Rayhawk, and Kramar does a good job at looking at evidence for and against ‘rationality realism’, or as they call it, ‘the intelligibility of intelligence’.
- DanielFilan 22 Nov 2019 5:43 UTC
  4 points
  0
  Parent
  I do think that it was an interesting choice for the post to be about ‘realism about rationality’ rather than its converse, which the author seems to subscribe to. This probably can be chalked up to it being easier to clearly see a thinking pattern that you don’t frequently use, I guess?
  - Richard_Ngo 22 Nov 2019 12:02 UTC
    5 points
    0
    Parent
    I think in general, if there’s a belief system B that some people have, then it’s much easier and more useful to describe B than ~B. It’s pretty clear if, say, B = Christianity, or B = Newtonian physics. I think of rationality anti-realism less as a specific hypothesis about intelligence, and more as a default skepticism: why should intelligence be formalisable? Most things aren’t!
    (I agree that if you think most things are formalisable, so that realism about rationality should be our default hypothesis, then phrasing it this way around might seem a little weird. But the version of realism about rationality that people buy into around here also depends on some of the formalisms that we’ve actually come up with being useful, which is a much more specific hypothesis, making skepticism again the default position.)
    - DanielFilan 22 Nov 2019 18:26 UTC
      2 points
      0
      Parent
      I think that rationality realism is to Bayesianism is to rationality anti-realism as theism is to Christianity is to atheism. Just like it’s feasible and natural to write a post advocating and mainly talking about atheism, despite that position being based on default skepticism and in some sense defined by theism, I think it would be feasible and natural to write a post titled ‘rationality anti-realism’ that focussed on that proposition and described why it was true.

abramdemski 10 Jan 2020 5:06 UTC
LW: 71 AF: 24
0
AF
I didn’t like this post. At the time, I didn’t engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise) but I didn’t actually engage with the idea very much. So it seems like a good idea to say something now.
The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don’t think it’s my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it’s a straw-man of the view it’s trying to point at.
The main problem is the word “realism”. It isn’t clear exactly what it means, but I suspect that being really anti-realist about rationality would not shift my views about the importance of MIRI-style research that much.
I agree that there’s something kind of like rationality realism. I just don’t think this post successfully points at it.
Ricraz starts out with the list: momentum, evolutionary fitness, intelligence. He says that the question (of rationality realism) is whether intelligence is more like momentum or more like fitness. Momentum is highly formalizable. Fitness is a useful abstraction, but no one can write down the fitness function for a given organism. If pressed, we have to admit that it does not exist: every individual organism has what amounts to its own different environment, since it has different starting conditions (nearer to different food sources, etc), and so is selected on different criteria.
So as I understand it, the claim is that the MIRI cluster believes rationality is more like momentum, but many outside the MIRI cluster believe it’s more like fitness.
It seems to me like my position, and the MIRI-cluster position, is (1) closer to “rationality is like fitness” than “rationality is like momentum”, and (2) doesn’t depend that much on the difference. Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality. (This also seems supported by the analogy—evolutionary biologists still see fitness as a very important subject, and don’t seem to care that much about exactly how real the abstraction is.)
To the extent that this post has made a lot of people think that rationality realism is an important crux, it’s quite plausible to me that it’s made the discussion worse.
To expand more on (1), since it seems a lot of people found its negation plausible: it seems like if there’s an analogue for the theory of evolution, which uses relatively unreal concepts like “fitness” to help us understand rational agency, we’d like to know about it. In this view, MIRI-cluster is essentially saying “biologists should want to invent evolution. Look at all the similarities across different animals. Don’t you want to explain that?” Whereas the non-MIRI cluster is saying “biologists don’t need to know about evolution.”
- Rohin Shah 12 Jan 2020 19:14 UTC
  LW: 13 AF: 5
  2
  AF Parent
  ETA: The original version of this comment conflated “evolution” and “reproductive fitness”, I’ve updated it now (see also my reply to Ben Pace’s comment).
  
  Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality.
  
  MIRI in general and you in particular seem unusually (to me) confident that:
  
  1. We can learn more than we already know about rationality of “ideal” agents (or perhaps arbitrary agents?).
  
  2. This understanding will allow us to build AI systems that we understand better than the ones we build today.
  
  3. We will be able to do this in time for it to affect real AI systems. (This could be either because it is unusually tractable and can be solved very quickly, or because timelines are very long.)
  
  This is primarily based on what research you and MIRI do, some of MIRI’s strategy writing, writing like the Rocket Alignment problem and law thinking, and an assumption that you are choosing to do this research because you think it is an effective way to reduce AI risk (given your skills).
  
  (Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)
  
  My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy). I’d be interested in an argument for the three points listed above without realism about rationality (I agree with 1, somewhat agree with 2, and don’t agree with 3).
  
  If you don’t have realism about rationality, then I basically agree with this sentence, though I’d rephrase it:
  
  MIRI-cluster is essentially saying “biologists should want to invent evolution. Look at all the similarities across different animals. Don’t you want to explain that?” Whereas the non-MIRI cluster is saying “biologists don’t need to know about evolution.”
  
  (ETA: In my head I was replacing “evolution” with “reproductive fitness”; I don’t agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don’t know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)
  
  To my knowledge, ~~the theory of evolution~~ (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that ~~evolution~~ (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a “real” thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover “real” things that would then be important, but I don’t think that’s the claim.) My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things.
  
  So I’d rephrase the sentence as: (ETA: changed the sentence a bit to talk about fitness instead of evolution)
  
  MIRI-cluster is essentially saying “biologists should want to understand reproductive fitness. Look at all the similarities across different animals. Don’t you want to explain that?” Whereas the non-MIRI cluster is saying “Yeah, it’s a fascinating question to understand what makes animals fit, but given that we want to understand how antidepressants work, it is a better strategy to directly study what happens when an animal takes an antidepressant.”
  
  Which you could round off to “biologists don’t need to know about reproductive fitness”, in the sense that it is not the best use of their time.
  
  ETA: I also have a model of you being less convinced by realism about rationality than others in the “MIRI crowd”; in particular, selection vs. control seems decidedly less “realist” than mesa-optimizers (which didn’t have to be “realist”, but was quite “realist” the way it was written, especially in its focus on search).
  - Ben Pace 13 Jan 2020 0:45 UTC
    LW: 16 AF: 5
    0
    AF Parent
    Huh? A lot of these points about evolution register to me as straightforwardly false. Understanding the theory of evolution moved us from “Why are there all these weird living things? Why do they exist? What is going on?” to “Each part of these organisms has been designed by a local hill-climbing process to maximise reproduction.” If I looked into it, I expect I’d find out that early medicine found it very helpful to understand how the system was built. This is like the difference between me handing you a massive amount of code that has a bunch of weird outputs and telling you to make it work better and more efficiently, and the same thing but where I tell you what company made the code, why they made it, and how they made it, and give you loads of examples of other pieces of code they made in this fashion.
    If I knew how to operationalise it I would take a pretty strong bet that the theory of natural selection has been revolutionary in the history of medicine.
    - Rohin Shah 13 Jan 2020 2:29 UTC
      LW: 3 AF: 2
      0
      AF Parent
      A lot of these points about evolution register to me as straightforwardly false.
      I don’t know which particular points you mean. The only one that it sounds like you’re arguing against is
      the theory of evolution has not had nearly the same impact on our ability to make big things [...] I struggle to name a way that evolution affects an everyday person
      Were there others?
      I would take a pretty strong bet that the theory of natural selection has been revolutionary in the history of medicine.
      I think the mathematical theory of natural selection + the theory of DNA / genes were probably very influential in both medicine and biology, because they make very precise predictions and the real world is a very good fit for the models they propose. (That is, they are “real”, in the sense that “real” is meant in the OP.) I don’t think that an improved mathematical understanding of what makes particular animals more fit has had that much of an impact on anything.
      Separately, I also think the general insight of “each part of these organisms has been designed by a local hill-climbing process to maximise reproduction” would not have been very influential in either medicine or biology, had it not been accompanied by the math (and assuming no one ever developed the math).
      On reflection, my original comment was quite unclear about this, I’ll add a note to it to clarify.
      I do still stand by the thing that I meant in my original comment, which is that to the extent that you think rationality is like reproductive fitness (the claim made in the OP that Abram seems to agree with), where it is a very complicated mess of a function that we don’t hope to capture in a simple equation, I don’t think that improved understanding of that sort of thing has made much of an impact on our ability to do “big things” (as a proxy, things that affect normal people).
      Within evolution, the claim would be that there has not been much impact from gaining an improved mathematical understanding of the reproductive fitness of some organism, or the “reproductive fitness” of some meme for memetic evolution.
      - DanielFilan 13 Jan 2020 2:49 UTC
        LW: 2 AF: 1
        0
        AF Parent
        
        I think the mathematical theory of natural selection + the theory of DNA / genes were probably very influential in both medicine and biology, because they make very precise predictions and the real world is a very good fit for the models they propose. (That is, they are “real”, in the sense that “real” is meant in the OP.)
        
        In contrast, I think the general insight of “each part of these organisms has been designed by a local hill-climbing process to maximise reproduction” would not have been very influential in either medicine or biology, had it not been accompanied by the math.
        
        But surely you wouldn’t get the mathematics of natural selection without the general insight, and so I think the general insight deserves to get a bunch of the credit. And both the mathematics of natural selection and the general insight seem pretty tied up to the notion of ‘reproductive fitness’.
        
        Rohin Shah 13 Jan 2020 3:07 UTC
        LW: 1 AF: 2
        0
        AF Parent
        But surely you wouldn’t get the mathematics of natural selection without the general insight, and so I think the general insight deserves to get a bunch of the credit. And both the mathematics of natural selection and the general insight seem pretty tied up to the notion of ‘reproductive fitness’.
        Here is my understanding of what Abram thinks:
        Rationality is like “reproductive fitness”, in that it is hard to formalize and turn into hard math. Regardless of how much theoretical progress we make on understanding rationality, it is never going to turn into something that can make very precise, accurate predictions about real systems. Nonetheless, qualitative understanding of rationality, of the sort that can make rough predictions about real systems, is useful for AI safety.
        Hopefully that makes it clear why I’m trying to imagine a counterfactual where the math was never developed.
        It’s possible that I’m misunderstanding Abram and he actually thinks that we will be able to make precise, accurate predictions about real systems; but if that’s the case I think he in fact is “realist about rationality” and this post is in fact pointing at a crux between him and Richard (or him and me), though not as well as he would like.
  - abramdemski 17 Jan 2020 21:43 UTC
    LW: 11 AF: 4
    0
    AF Parent
    (Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)
    This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.
    (I agree with 1, somewhat agree with 2, and don’t agree with 3).
    It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?
    My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).
    I guess my position is something like this. I think it may be quite possible to make capabilities “blindly”—basically the processing-power heavy type of AI progress (applying enough tricks so you’re not literally recapitulating evolution, but you’re sorta in that direction on a spectrum). Or possibly that approach will hit a wall at some point. But in either case, better understanding would be essentially necessary for aligning systems with high confidence. But that same knowledge could potentially accelerate capabilities progress.
    So I believe in some kind of knowledge to be had (ie, point #1).
    Yeah, so, taking stock of the discussion again, it seems like:
    There’s a thing-I-believe-which-is-kind-of-like-rationality-realism.
    Points 1 and 2 together seem more in line with that thing than “rationality realism” as I understood it from the OP.
    You already believe #1, and somewhat believe #2.
    We are both pessimistic about #3, but I’m so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
    We probably do have some disagreement about something like “how real is rationality?”—but I continue to strongly suspect it isn’t that cruxy.
    (ETA: In my head I was replacing “evolution” with “reproductive fitness”; I don’t agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don’t know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)
    I checked whether I thought the analogy was right with “reproductive fitness” and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I’m claiming that there’s a theory of evolution out there.
    Sorry it resulted in a confusing mixed metaphor overall.
    But, separately, I don’t get how you’re seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they’re separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution—without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.
    To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a “real” thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover “real” things that would then be important, but I don’t think that’s the claim.)
    I think this is due more to stuff like the relevant timescale than the degree of real-ness. I agree real-ness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (ie, it’s all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other side, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.
    Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn’t understand how organisms seeded on those planets would likely evolve.)
    So—it seems to me—the question should not be whether an abstract theory of rationality is the sort of thing which on-outside-view has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!
    My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things.
    Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
    As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.
    The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.g., in computer security you don’t usually need exact models of attackers, and a system which relies on those is less likely to be secure.
    - Rohin Shah 18 Jan 2020 1:05 UTC
      LW: 5 AF: 5
      0
      AF Parent
      I think we disagree primarily on 2 (and also how doomy the default case is, but let’s set that aside).
      In claiming that rationality is as real as reproductive fitness, I’m claiming that there’s a theory of evolution out there.
      I think that’s a crux between you and me. I’m no longer sure if it’s a crux between you and Richard. (ETA: I shouldn’t call this a crux, I wouldn’t change my mind on whether MIRI work is on-the-margin more valuable if I changed my mind on this, but it would be a pretty significant update.)
      Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
      Yeah, I was ignoring that sort of stuff. I do think this post would be better without the evolutionary fitness example because of this confusion. I was imagining the “unreal rationality” world to be similar to what Daniel mentions below:
      I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.
      
      But, separately, I don’t get how you’re seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they’re separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution—without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.
      Yeah, I’m going to try to give a different explanation that doesn’t involve “realness”.
      When groups of humans try to build complicated stuff, they tend to do so using abstraction. The most complicated stuff is built on a tower of many abstractions, each sitting on top of lower-level abstractions. This is most evident (to me) in software development, where the abstraction hierarchy is staggeringly large, but it applies elsewhere, too: the low-level abstractions of mechanical engineering are “levers”, “gears”, “nails”, etc.
      A pretty key requirement for abstractions to work is that they need to be as non-leaky as possible, so that you do not have to think about them as much. When I code in Python and I write “x + y”, I can assume that the result will be the sum of the two values, and this is basically always right. Notably, I don’t have to think about the machine code that deals with the fact that overflow might happen. When I write in C, I do have to think about overflow, but I don’t have to think about how to implement addition at the bitwise level. This becomes even more important at the group level, because communication is expensive, slow, and low-bandwidth relative to thought, and so you need non-leaky abstractions so that you don’t need to communicate all the caveats and intuitions that would accompany a leaky abstraction.
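      A minimal sketch of the contrast above, in Python. The function names and the 32-bit width are assumptions chosen for illustration. The second function only emulates two's-complement wraparound; in C itself, signed overflow is undefined behaviour, so wraparound is one typical outcome rather than a guarantee.

```python
INT32_MAX = 2**31 - 1


def add_python(x: int, y: int) -> int:
    # Python ints are arbitrary precision, so this is always the
    # mathematical sum: the abstraction does not leak.
    return x + y


def add_wrapped_int32(x: int, y: int) -> int:
    # Hypothetical helper: emulates 32-bit two's-complement wraparound,
    # the kind of detail a C programmer has to keep in mind.
    return (x + y + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    print(add_python(INT32_MAX, 1))
    print(add_wrapped_int32(INT32_MAX, 1))
```

      For small inputs the two functions agree; they diverge only once the sum leaves the 32-bit range, which is the caveat the lower-level abstraction makes you carry around.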
      One way to operationalize this is that to be built on, an abstraction must give extremely precise (and accurate) predictions.
      It’s fine if there’s some complicated input to the abstraction, as long as that input can be estimated well in practice. This is what I imagine is going on with evolution and reproductive fitness—if you can estimate reproductive fitness, then you can get very precise and accurate predictions, as with e.g. the Price equation that Daniel mentioned. (And you can estimate fitness, either by using things like the Price equation + real data, or by controlling the environment where you set up the conditions that make something reproductively fit.)
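      For reference, the Price equation mentioned above is usually written as follows. This is the standard textbook form, not something stated in this thread, and the symbols are the conventional ones: $w_i$ is the fitness of type $i$, $z_i$ its trait value, $\bar{w}$ the mean fitness, and $\Delta \bar{z}$ the change in the mean trait value over one generation.

```latex
\bar{w}\,\Delta \bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}\!\left(w_i\,\Delta z_i\right)
```

      The covariance term captures selection and the expectation term captures transmission; fitness enters only as the input $w_i$, which fits the picture of a precise theory with a hard-to-estimate input.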
      If a thing cannot provide extremely precise and accurate predictions, then I claim that humans mostly can’t build on top of it. We can use it to make intuitive arguments about things very directly related to it, but can’t generalize it to something more far-off. Some examples from these comment threads of what “inferences about directly related things” looks like:
      current theories about why England had an industrial revolution when it did
      [biology] has far more practical consequences (thinking of medicine)
      understanding why overuse of antibiotics might weaken the effect of antibiotics [based on knowledge of evolution]
      Note that in all of these examples, you can more or less explain the conclusion in terms of the thing it depends on. E.g. you can say “overuse of antibiotics might weaken the effect of antibiotics because the bacteria will evolve / be selected to be resistant to the antibiotic”.
      In contrast, for abstractions like “logic gates”, “assembly language”, “levers”, etc, we have built things like rockets and search engines that certainly could not have been built without those abstractions, but nonetheless you’d be hard pressed to explain e.g. how a search engine works if you were only allowed to talk with abstractions at the level of logic gates. This is because the precision afforded by those abstractions allows us to build huge hierarchies of better abstractions.
      So now I’d go back and state our crux as:
      Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?
      I would guess not. It sounds like you would guess yes.
      I think this is upstream of 2. When I say I somewhat agree with 2, I mean that you can probably get a theory of rationality that makes imprecise predictions, which allows you to say things about “directly relevant things”, which will probably let you say some interesting things about AI systems, just not very much. I’d expect that, to really affect ML systems, given how far away from regular ML research MIRI research is, you would need a theory that’s precise enough to build hierarchies with.
      (I think I’d also expect that you need to directly use the results of the research to build an AI system, rather than using it to inform existing efforts to build AI.)
      (You might wonder why I’m optimistic about conceptual ML safety work, which is also not precise enough to build hierarchies of abstraction. The basic reason is that ML safety is “directly relevant” to existing ML systems, and so you don’t need to build hierarchies of abstraction—just the first imprecise layer is plausibly enough. You can see this in the fact that there are already imprecise concepts that are directly talking about safety.)
      The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.g., in computer security you don’t usually need exact models of attackers, and a system which relies on those is less likely to be secure.
      Your few assumptions need to talk about the system you actually build. On the model I’m outlining, it’s hard to state the assumptions for the system you actually build, and near-impossible to be very confident in those assumptions, because they are (at least) one level of hierarchy higher than the (assumed imprecise) theory of rationality.
      - abramdemski 18 Jan 2020 19:00 UTC
        LW: 4 AF: 3
        0
        AF Parent
        I generally like the re-framing here, and agree with the proposed crux.
        I may try to reply more at the object level later.
        edoarad 6 Jun 2020 12:55 UTC
        2 points
        0
        Parent
        Abram, did you reply to that crux somewhere?
  - abramdemski 13 Jan 2020 12:57 UTC
    LW: 9 AF: 6
    0
    AF Parent
    ETA: I also have a model of you being less convinced by realism about rationality than others in the “MIRI crowd”; in particular, selection vs. control seems decidedly less “realist” than mesa-optimizers (which didn’t have to be “realist”, but was quite “realist” the way it was written, especially in its focus on search).
    Just a quick reply to this part for now (but thanks for the extensive comment, I’ll try to get to it at some point).
    It makes sense. My recent series on myopia also fits this theme. But I don’t get much* push-back on these things. Some others seem even less realist than I am. I see myself as trying to carefully deconstruct my notions of “agency” into component parts that are less fake. I guess I do feel confused why other people seem less interested in directly deconstructing agency the way I am. I feel somewhat like others kind of nod along to distinctions like selection vs control but then go back to using a unitary notion of “optimization”. (This applies to people at MIRI and also people outside MIRI.)
    *The one person who has given me push-back is Scott.
  - DanielFilan 18 Jan 2020 4:11 UTC
    LW: 7 AF: 3
    0
    AF Parent
    
    My underlying model is that when you talk about something so “real” that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can’t do this with “non-real” things.
    
    For what it’s worth, I think I disagree with this even when “non-real” means “as real as the theory of liberalism”. One example is companies—my understanding is that people have fake theories about how companies should be arranged, that these theories can be better or worse (and evaluated as so without looking at how their implementations turn out), that one can maybe learn these theories in business school, and that implementing them creates more valuable companies (at least in expectation). At the very least, my understanding is that providing management advice to companies in developing countries significantly raises their productivity, and found this study to support this half-baked memory.
    
    (next paragraph is super political, but it’s important to my point)
    
    I live in what I honestly, straightforwardly believe is the greatest country in the world (where greatness doesn’t exactly mean ‘moral goodness’ but does imply the ability to support moral goodness—think some combination of wealth and geo-strategic dominance), whose government was founded after a long series of discussions about how best to use the state to secure individual liberty. If I think about other wealthy countries, it seems to me that ones whose governments built upon this tradition of the interaction between liberty and governance are over-represented (e.g. Switzerland, Singapore, Hong Kong). The theory of liberalism wasn’t complete or real enough to build a perfect government, or even a government reliable enough to keep to its founding principles (see complaints American constitutionalists have about how things are done today), but it was something that can be built upon.
    
    At any rate, I think it’s the case that the things that can be built off of these fake theories aren’t reliable enough to satisfy a strict Yudkowsky-style security mindset. But I do think it’s possible to productively build off of them.
    - Rohin Shah 18 Jan 2020 6:47 UTC
      LW: 2 AF: 2
      0
      AF Parent
      On the model proposed in this comment, I think of these as examples of using things / abstractions / theories with imprecise predictions to reason about things that are “directly relevant”.
      If I agreed with the political example (and while I wouldn’t say that myself, it’s within the realm of plausibility), I’d consider that a particularly impressive version of this.
      - DanielFilan 18 Jan 2020 7:09 UTC
        LW: 2 AF: 1
        0
        AF Parent
        I’m confused how my examples don’t count as ‘building on’ the relevant theories—it sure seems like people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning, and if that’s true (and if the things in the real world actually successfully fulfilled their purpose), then I’d think that spending time and effort developing the relevant theories was worth it. This argument has some weak points (the US government is not highly reliable at preserving liberty, very few individual businesses are highly reliable at delivering their products, the theories of management and liberalism were informed by a lot of experimentation), but you seem to be pointing at something else.
        
        Rohin Shah 18 Jan 2020 20:54 UTC
        LW: 6 AF: 4
        0
        AF Parent
        people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning
        Agreed. I’d say they built things in the real world that were “one level above” their theories.
        if that’s true, [...] then I’d think that spending time and effort developing the relevant theories was worth it
        Agreed.
        you seem to be pointing at something else
        Agreed.
        Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
        Separately, I claim that:
        “real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
        MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”.
        (All of this with weak confidence.)
        I think you disagree with the underlying model, but assuming you granted that, you would disagree with the second claim; I don’t know what you’d think of the first.
        DanielFilan 18 Jan 2020 22:48 UTC
        LW: 4 AF: 3
        0
        AF Parent
        OK, I think I understand you now.
        
        Overall I think these relatively-imprecise theories let you build things “one level above”, which I think your examples fit into. My claim is that it’s very hard to use them to build things “2+ levels above”.
        
        I think that I sort of agree if ‘levels above’ means levels of abstraction, where one system uses an abstraction of another and requires the mesa-system to satisfy some properties. In this case, the more layers of abstraction you have, the more requirements you’re demanding which can independently break, which exponentially reduces the chance that you’ll have no failure.
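        A rough worked version of that claim, assuming (purely for illustration) that each layer's requirement holds independently with the same probability. The function name and the 0.9 figure are hypothetical, not taken from the discussion.

```python
def chance_of_no_failure(p_each: float, layers: int) -> float:
    # With independent requirements, every layer must hold at once,
    # so the probabilities multiply: p_each ** layers.
    return p_each ** layers


if __name__ == "__main__":
    for layers in (1, 2, 5, 10):
        print(layers, chance_of_no_failure(0.9, layers))
```

        Under these assumptions the chance of a fully working stack shrinks geometrically with the number of layers, which is the sense in which more layers are exponentially worse.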
        
        But also, to the extent that your theory is mathematisable and comes with ‘error bars’, you have a shot at coming up with a theory of abstractions that is robust to failure of your base-level theory. So some transistors on my computer can fail, evidencing the imprecision of the simple theory of logic gates, but my computer can still work fine because the abstractions on top of logic gates accounted for some amount of failure of logic gates. Similarly, even if you have some uncorrelated failures of individual economic rationality, you can still potentially have a pretty good model of a market. I’d say that the lesson is that the more levels of abstraction you have to go up, the more difficult it is to make each level robust to failures of the previous level, and as such the more you’d prefer the initial levels be ‘exact’.
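        One toy way to see how a higher level can be robust to failures below it is majority voting over redundant components. This is a sketch under the assumption of independent failures with a known rate, not a description of how real hardware achieves fault tolerance; the function name is hypothetical.

```python
from math import comb


def majority_reliability(p_component: float, copies: int) -> float:
    """Probability that a strict majority of independent copies work."""
    if copies % 2 == 0:
        raise ValueError("use an odd number of copies so a majority always exists")
    needed = copies // 2 + 1
    # Binomial tail: sum over every outcome where at least `needed` copies work.
    return sum(
        comb(copies, k) * p_component**k * (1 - p_component) ** (copies - k)
        for k in range(needed, copies + 1)
    )


if __name__ == "__main__":
    print(majority_reliability(0.9, 1))
    print(majority_reliability(0.9, 3))
```

        The calculation only goes through because the base-level failure rate is quantified, which matches the point that 'error bars' on the lower theory are what make the upper level robust.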
        
        “real AGI systems” are “2+ levels above” the sorts of theories that MIRI works on.
        
        I’d say that they’re some number of levels above (of abstraction) and also levels below (of implementation). So for an unrealistic example, if you develop logical induction decision theory, you have your theory of logical induction, then you depend on that theory to have your decision theory (first level of abstraction), and then you depend on your decision theory to have multiple LIDT agents behave well together (second level of abstraction). Separately, you need to actually implement your logical inductor by some machine learning algorithm (first level of implementation), which is going to depend on numpy and floating point arithmetic and such (second and third (?) levels of implementation), which depends on computing hardware and firmware (I don’t know how many levels of implementation that is).
        
        When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised). They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness. There are some exceptions (e.g. the mesa-optimisers paper), but they seem like they’re on the path to greater mathematisability.
        
        MIRI’s theories will always be the relatively-imprecise theories that can’t scale to “2+ levels above”
        
        I’m not sure about this, but I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’, basically for the reasons that Vanessa discusses.
        
        Rohin Shah 18 Jan 2020 23:53 UTC
        LW: 2 AF: 2
        0
        AF Parent
        I disagree with the version that replaces ‘MIRI’s theories’ with ‘mathematical theories of embedded rationality’
        Yeah, I think this is the sense in which realism about rationality is an important disagreement.
        But also, to the extent that your theory is mathematisable and comes with ‘error bars’
        Yeah, I agree that this would make it easier to build multiple levels of abstractions “on top”. I also would be surprised if mathematical theories of embedded rationality came with tight error bounds (where “tight” means “not so wide as to be useless”). For example, current theories of generalization in deep learning do not provide tight error bounds to my knowledge, except in special cases that don’t apply to the main successes of deep learning.
        When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. [...] They are also mathematised enough that I’m optimistic about upwards abstraction having the possibility of robustness.
        Agreed.
        The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised).
        I am basically only concerned about machine learning, when I say that you can’t build on the theories. My understanding of MIRI’s mainline story of impact is that they develop some theory that AI researchers use to change the way they do machine learning that leads to safe AI. This sounds to me like there are multiple levels of inference: “MIRI’s theory” → “machine learning” → “AGI”. This isn’t exactly layers of abstraction, but I think the same principle applies, and this seems like too many layers.
        You could imagine other stories of impact, and I’d have other questions about those, e.g. if the story was “MIRI’s theory will tell us how to build aligned AGI without machine learning”, I’d be asking when the theory was going to include computational complexity.
  - DanielFilan 13 Jan 2020 0:25 UTC
    LW: 4 AF: 2
    0
    AF Parent
    In contrast, I struggle to name a way that evolution affects an everyday person
    
    I’m not sure what exactly you mean, but examples that come to mind:
    
    Crops and domestic animals that have been artificially selected for various qualities.
    The medical community encouraging people to not use antibiotics unnecessarily.
    [Inheritance but not selection] The fact that your kids will probably turn out like you without specific intervention on your part to make that happen.
    - Rohin Shah 13 Jan 2020 2:09 UTC
      LW: 4 AF: 3
      0
      AF Parent
      Crops and domestic animals that have been artificially selected for various qualities.
      I feel fairly confident this was done before we understood evolution.
      The fact that your kids will probably turn out like you without specific intervention on your part to make that happen.
      Also seems like a thing we knew before we understood evolution.
      The medical community encouraging people to not use antibiotics unnecessarily.
      That one seems plausible; though I’d want to know more about the history of how this came up. It also seems like the sort of thing that we’d have figured out even if we didn’t understand evolution, though it would have taken longer, and would have involved more deaths.
      Going back to the AI case, my takeaway from this example is that understanding non-real things can still help if you need to get everything right the first time. And in fact, I do think that if you posit a discontinuity, such that we have to get everything right before that discontinuity, then the non-MIRI strategy looks worse because you can’t gather as much empirical evidence (though I still wouldn’t be convinced that the MIRI strategy is the right one).
      - DanielFilan 13 Jan 2020 2:30 UTC
        LW: 2 AF: 1
        0
        AF Parent
        Ah, I didn’t quite realise you meant to talk about “human understanding of the theory of evolution” rather than evolution itself. I still suspect that the theory of evolution is so fundamental to our understanding of biology, and our understanding of biology so useful to humanity, that if human understanding of evolution doesn’t contribute much to human welfare it’s just because most applications deal with pretty long time-scales.
        
        (Also I don’t get why this discussion is treating evolution as ‘non-real’: stuff like the Price equation seems pretty formal to me. To me it seems like a pretty mathematisable theory with some hard-to-specify inputs like fitness.)
        
        Rohin Shah 13 Jan 2020 3:02 UTC
        LW: 4 AF: 3
        0
        AF Parent
        (Also I don’t get why this discussion is treating evolution as ‘non-real’: stuff like the Price equation seems pretty formal to me. To me it seems like a pretty mathematisable theory with some hard-to-specify inputs like fitness.)
        Yeah, I agree, see my edits to the original comment and also my reply to Ben. Abram’s comment was talking about reproductive fitness the entire time and then suddenly switched to evolution at the end; I didn’t notice this and kept thinking of evolution as reproductive fitness in my head, and then wrote a comment based on that where I used the word evolution despite thinking about reproductive fitness and the general idea of “there is a local hill-climbing search on reproductive fitness” while ignoring the hard math.
  - Raemon 12 Jan 2020 21:06 UTC
    LW: 4 AF: 2
    0
    AF Parent
    Which you could round off to “biologists don’t need to know about evolution”, in the sense that it is not the best use of their time.
    The most obvious thing is understanding why overuse of antibiotics might weaken the effect of antibiotics.
    - Rohin Shah 13 Jan 2020 2:10 UTC
      LW: 2 AF: 2
      0
      AF Parent
      See response to Daniel below; I find this one a little compelling (but not that much).
  - Zack_M_Davis 12 Jan 2020 19:31 UTC
    4 points
    0
    Parent
    
    I struggle to name a way that evolution affects an everyday person (ignoring irrelevant things like atheism-religion debates).
    
    Evolutionary psychology?
    - Matthew Barnett 12 Jan 2020 21:05 UTC
      2 points
      0
      Parent
      How does evolutionary psychology help us during our everyday life? We already know that people like having sex and that they execute all these sorts of weird social behaviors. Why does providing the ultimate explanation for our behavior provide more than a satisfaction of our curiosity?
      - Rohin Shah 13 Jan 2020 1:58 UTC
        2 points
        0
        Parent
        +1, it seems like some people with direct knowledge of evolutionary psychology get something out of it, but not everyone else.
        DanielFilan 13 Jan 2020 2:10 UTC
        2 points
        0
        Parent
        Sorry, how is this not saying “people who don’t know evo-psych don’t get anything out of knowing evo-psych”?
- Richard_Ngo 14 Jan 2020 10:40 UTC
  LW: 12 AF: 6
  0
  AF Parent
  I like this review and think it was very helpful in understanding your (Abram’s) perspective, as well as highlighting some flaws in the original post, and ways that I’d been unclear in communicating my intuitions. In the rest of my comment I’ll try to write a synthesis of my intentions for the original post with your comments; I’d be interested in the extent to which you agree or disagree.
  We can distinguish between two ways to understand a concept X. For lack of better terminology, I’ll call them “understanding how X functions” and “understanding the nature of X”. I conflated these in the original post in a confusing way.
  For example, I’d say that studying how fitness functions would involve looking into the ways in which different components are important for the fitness of existing organisms (e.g. internal organs; circulatory systems; etc). Sometimes you can generalise that knowledge to organisms that don’t yet exist, or even prove things about those components (e.g. there’s probably useful maths connecting graph theory with optimal nerve wiring), but it’s still very grounded in concrete examples. If we thought that we should study how intelligence functions in a similar way as we study how fitness functions, that might look like a combination of cognitive science and machine learning.
  By comparison, understanding the nature of X involves performing a conceptual reduction on X by coming up with a theory which is capable of describing X in a more precise or complete way. The pre-theoretic concept of fitness (if it even existed) might have been something like “the number and quality of an organism’s offspring”. Whereas the evolutionary notion of fitness is much more specific, and uses maths to link fitness with other concepts like allele frequency.
  Momentum isn’t really a good example to illustrate this distinction, so perhaps we could use another concept from physics, like electricity. We can understand how electricity functions in a lawlike way by understanding the relationship between voltage, resistance and current in a circuit, and so on, even when we don’t know what electricity is. If we thought that we should study how intelligence functions in a similar way as the discoverers of electricity studied how it functions, that might involve doing theoretical RL research. But we also want to understand the nature of electricity (which turns out to be the flow of electrons). Using that knowledge, we can extend our theory of how electricity functions to cases which seem puzzling when we think in terms of voltage, current and resistance in circuits (even if we spend almost all our time still thinking in those terms in practice). This illustrates a more general point: you can understand a lot about how something functions without having a reductionist account of its nature—but not everything. And so in the long term, to understand really well how something functions, you need to understand its nature. (Perhaps understanding how CS algorithms work in practice, versus understanding the conceptual reduction of algorithms to Turing Machines, is another useful example).
  I had previously thought that MIRI was trying to understand how intelligence functions. What I take from your review is that MIRI is first trying to understand the nature of intelligence. From this perspective, your earlier objection makes much more sense.
  However, I still think that there are different ways you might go about understanding the nature of intelligence, and that “something kind of like rationality realism” might be a crux here (as you mention). One way that you might try to understand the nature of intelligence is by doing mathematical analysis of what happens in the limit of increasing intelligence. I interpret work on AIXI, logical inductors, and decision theory as falling into this category. This type of work feels analogous to some of Einstein’s thought experiments about the limit of increasing speed. Would it have worked for discovering evolution? That is, would starting with a pre-theoretic concept of fitness and doing mathematical analysis of its limiting cases (e.g. by thinking about organisms that lived for arbitrarily long, or had arbitrarily large numbers of children) have helped people come up with evolution? I’m not sure. There’s an argument that Malthus did something like this, by looking at long-term population dynamics. But you could also argue that the key insights leading up to the discovery of evolution were primarily inspired by specific observations about the organisms around us. And in fact, even knowing evolutionary theory, I don’t think that the extreme cases of fitness even make sense. So I would say that I am not a realist about “perfect fitness”, even though the concept of fitness itself seems fine.
  So an attempted rephrasing of the point I was originally trying to make, given this new terminology, is something like “if we succeed in finding a theory that tells us the nature of intelligence, it still won’t make much sense in the limit, which is the place where MIRI seems to be primarily studying it (with some exceptions, e.g. your Partial Agency sequence). Instead, the best way to get that theory is to study how intelligence functions.”
  The reason I called it “rationality realism” not “intelligence realism” is that rationality has connotations of this limit or ideal existing, whereas intelligence doesn’t. You might say that X is very intelligent, and Y is more intelligent than X, without agreeing that perfect intelligence exists. Whereas when we talk about rationality, there’s usually an assumption that “perfect rationality” exists. I’m not trying to argue that concepts which we can’t formalise “aren’t real”, but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can’t formalise, and that it’s those incoherent extrapolations like “perfect fitness” which “aren’t real” (I agree that this was quite unclear in the original post).
  My proposed redefinition:
  - The “intelligence is intelligible” hypothesis is about how lawlike the best description of how intelligence functions will turn out to be.
  - The “realism about rationality” hypothesis is about how well-defined intelligence is in the limit (where I think of the limit of intelligence as “perfect rationality”, and “well-defined” with respect not to our current understanding, but rather with respect to the best understanding of the nature of intelligence we’ll ever discover).
  - abramdemski 19 Jan 2020 20:23 UTC
    LW: 8 AF: 5
    0
    AF Parent
    So, yeah, one thing that’s going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)
    But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.
    The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
    Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather directly about how you improve. Logical induction is about asymptotically approximating coherence in a particular way as opposed to other ways.
    So to a large extent I think my recent direction can be seen as continuing a theme already present—perhaps you might say I’m trying to properly learn the lesson of logical induction.
    But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is about this kind of boundedness to a large degree.
    So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.
    Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and try to drag it down to Earth rather than the other way around. (That model feels a bit fake but somewhat useful.)
    (Generally I am disappointed by my reply here. I feel I have not adequately engaged with you, particularly on the function-vs-nature distinction. I may try again later.)
    - Richard_Ngo 20 Jan 2020 2:25 UTC
      LW: 2 AF: 1
      0
      AF Parent
      I’ll try to respond properly later this week, but I like the point that embedded agency is about boundedness. Nevertheless, I think we probably disagree about how promising it is “to start with idealized rationality and try to drag it down to Earth rather than the other way around”. If the starting point is incoherent, then this approach doesn’t seem like it’ll go far: if AIXI isn’t useful to study, then probably AIXItl isn’t either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).
      I appreciate that this isn’t an argument that I’ve made in a thorough or compelling way yet—I’m working on a post which does so.
      - abramdemski 30 Jan 2020 22:35 UTC
        LW: 6 AF: 4
        0
        AF Parent
        If the starting point is incoherent, then this approach doesn’t seem like it’ll go far—if AIXI isn’t useful to study, then probably AIXItl isn’t either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).
        Hm. I already think the starting point of Bayesian decision theory (which is even “further up” than AIXI in how I am thinking about it) is fairly useful.
        In a naive sort of way, people can handle uncertain gambles by choosing a quantity to treat as ‘utility’ (such as money), quantifying probabilities of outcomes, and taking expected values. This doesn’t always serve very well (e.g. one might prefer Kelly betting), but it was kind of the starting point (probability theory getting its starting point from gambling games) and the idea seems like a useful decision-making mechanism in a lot of situations.
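        As a rough illustration of the contrast drawn above, here is a minimal sketch comparing naive expected-value maximisation with Kelly betting on a single repeated binary bet. The function names and the example numbers (a 60% win chance at even odds) are assumptions chosen for illustration; they do not come from the discussion.

```python
import math


def expected_value(p_win: float, net_odds: float, fraction: float) -> float:
    """Expected one-round change in wealth, as a fraction of the bankroll,
    when staking `fraction` of it on a bet paying `net_odds` to 1."""
    return p_win * net_odds * fraction - (1.0 - p_win) * fraction


def expected_log_growth(p_win: float, net_odds: float, fraction: float) -> float:
    """Expected log-growth per round, the quantity Kelly betting maximises."""
    if fraction >= 1.0:
        # Staking everything means a single loss wipes out the bankroll.
        return float("-inf")
    return (p_win * math.log(1.0 + net_odds * fraction)
            + (1.0 - p_win) * math.log(1.0 - fraction))


def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Kelly stake for a binary bet: f* = p - (1 - p) / b."""
    return p_win - (1.0 - p_win) / net_odds


if __name__ == "__main__":
    p, b = 0.6, 1.0  # hypothetical bet: 60% chance to win at even odds
    f_star = kelly_fraction(p, b)
    for f in (f_star, 0.5, 1.0):
        print(f, expected_value(p, b, f), expected_log_growth(p, b, f))
```

        With a favourable bet, the one-round expected value grows linearly with the stake, so naive maximisation recommends staking everything, while expected log-growth peaks at the Kelly fraction and collapses at a full stake.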
        Perhaps more convincingly, probability theory seems extremely useful, both as a precise tool for statisticians and as a somewhat looser analogy for thinking about everyday life, cognitive biases, etc.
        AIXI adds to all this the idea of quantifying Occam’s razor with algorithmic information theory, which seems to be a very fruitful idea. But I guess this is the sort of thing we’re going to disagree on.
        As for AIXItl, I think it’s sort of taking the wrong approach to “dragging things down to earth”. Logical induction simultaneously makes things computable and solves a new set of interesting problems having to do with accomplishing that. AIXItl feels more like trying to stuff an uncomputable peg into a computable hole.
- Raemon 10 Jan 2020 5:47 UTC
  LW: 6 AF: 3
  0
  AF Parent
  Hmm, I am interested in some debate between you and Daniel Filan (just naming someone who seemed to describe himself as endorsing rationality realism as a crux, although I’m not sure he qualifies as a “miri person”)
  - DanielFilan 13 Jan 2020 0:41 UTC
    LW: 8 AF: 5
    0
    AF Parent
    I believe in some form of rationality realism: that is, that there’s a neat mathematical theory of ideal rationality that’s in practice relevant for how to build rational agents and be rational. I expect there to be a theory of bounded rationality about as mathematically specifiable and neat as electromagnetism (which after all in the real world requires a bunch of materials science to tell you about the permittivity of things).
    If I didn’t believe the above, I’d be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my ‘worldview’ related to AI.
    Searching for beliefs I hold for which ‘rationality realism’ is crucial by imagining what I’d conclude if I learned that ‘rationality irrealism’ was more right:
    I’d be more interested in empirical understanding of deep learning and less interested in an understanding of learning theory.
    I’d be less interested in probabilistic forecasting of things.
    I’d want to find some higher-level thing that was more ‘real’/mathematically characterisable, and study that instead.
    I’d be less optimistic about the prospects for an ‘ideal’ decision and reasoning theory.
    
    My research depends on the belief that rational agents in the real world are likely to have some kind of ordered internal structure that is comprehensible to people. This belief is informed by rationality realism but distinct from it.
    - abramdemski 13 Jan 2020 12:48 UTC
      2 points
      0
      Parent
      How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don’t see why the distinction should be so cruxy.
      My suspicion is that Rationality Realism would have captured a crux much more closely if the line weren’t “momentum vs reproductive fitness”, but rather, “momentum vs the bystander effect” (ie, physics vs social psychology). Reproductive fitness implies something that’s quite mathematizable, but with relatively “fake” models—e.g., evolutionary models tend to assume perfectly separated generations, perfect mixing for breeding, etc. It would be absurd to model the full details of reality in an evolutionary model, although it’s possible to get closer and closer.
      I think that’s more the sort of thing I expect for theories of agency! I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI describes a heavy dependence on programming language. Any theory of bounded rationality which doesn’t ignore poly-time differences (ie, anything “closer to the ground” than logical induction) has to be hardware-dependent as well.
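      One standard way to state the language dependence mentioned above is the invariance theorem for Kolmogorov complexity, sketched here under the usual assumptions (the symbols $U$, $V$, $K$ and $c_{U,V}$ are notation introduced for this note, not taken from the thread). For any two universal machines $U$ and $V$ there is a constant $c_{U,V}$, independent of the string $x$, such that

$$K_U(x) \le K_V(x) + c_{U,V} \quad \text{for all } x .$$

      The theorem bounds the disagreement between languages but says nothing about the size of $c_{U,V}$, which can be arbitrarily large for a suitably chosen pair of machines. That unbounded constant is one reading of the "heavy dependence" on programming language.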
      If I didn’t believe the above,
      What alternative world are you imagining, though?
      - DanielFilan 14 Jan 2020 0:58 UTC
        2 points
        0
        Parent
        Meta/summary: I think we’re talking past each other, and hope that this comment clarifies things.
        
        How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don’t see why the distinction should be so cruxy...
        
        Reproductive fitness implies something that’s quite mathematizable, but with relatively “fake” models
        
        I was thinking of the difference between the theory of electromagnetism vs the idea that there’s a reproductive fitness function, but that it’s very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with ‘fake’ models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I’m unsure which theory rationality will end up closer to.
        
        Separately, I feel weird having people ask me about why things are ‘cruxy’ when I didn’t initially say that they were and without the context of an underlying disagreement that we’re hashing out. Like, either there’s some misunderstanding going on, or you’re asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do.
        
        I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI describes a heavy dependence on programming language. Any theory of bounded rationality which doesn’t ignore poly-time differences (ie, anything “closer to the ground” than logical induction) has to be hardware-dependent as well.
        
        I confess to being quite troubled by AIXI’s language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than “polynomial in some input”, which should be some input to a good theory of bounded rationality.
        
        If I didn’t believe the above,
        
        What alternative world are you imagining, though?
        
        I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.
        
        abramdemski 17 Jan 2020 20:10 UTC
        11 points
        0
        Parent
        I was thinking of the difference between the theory of electromagnetism vs the idea that there’s a reproductive fitness function, but that it’s very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with ‘fake’ models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I’m unsure which theory rationality will end up closer to.
        [Spoiler-boxing the following response not because it’s a spoiler, but because I was typing a response as I was reading your message and the below became less relevant. The end of your message includes exactly the examples I was asking for (I think), but I didn’t want to totally delete my thinking-out-loud in case it gave helpful evidence about my state.]
        I’m having trouble here because yes, the theory of population genetics factors in heavily to what I said, but to me reproductive fitness functions (largely) inherit their realness from the role they play in population genetics. So the two comparisons you give seem not very different to me. The “hard to determine what it is” from the first seems to lead directly to the “fake inputs” from the second.
        So possibly you’re gesturing at a level of realness which is “how real fitness functions would be if there were not a theory of population genetics”? But I’m not sure exactly what to imagine there, so could you give a different example (maybe a few) of something which is that level of real?
        Separately, I feel weird having people ask me about why things are ‘cruxy’ when I didn’t initially say that they were and without the context of an underlying disagreement that we’re hashing out. Like, either there’s some misunderstanding going on, or you’re asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do.
        Ah, well. I interpreted this earlier statement from you as a statement of cruxiness:
        If I didn’t believe the above, I’d be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my ‘worldview’ related to AI.
        And furthermore the list following this:
        Searching for beliefs I hold for which ‘rationality realism’ is crucial by imagining what I’d conclude if I learned that ‘rationality irrealism’ was more right:
        So, yeah, I’m asking you about something which you haven’t claimed is a crux of a disagreement which you and I are having, but, I am asking about it because I seem to have a disagreement with you about (a) whether rationality realism is true (pending clarification of what the term means to each of us), and (b) whether rationality realism should make a big difference for several positions you listed.
        I confess to being quite troubled by AIXI’s language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than “polynomial in some input”, which should be some input to a good theory of bounded rationality.
        Ah, so this points to a real and large disagreement between us about how subjective a theory of rationality should be (which may be somewhat independent of just how real rationality is, but is related).
        I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.
        Ok. Taking this as the rationality irrealism position, I would disagree with it, and also agree that it would make a big difference for the things you said rationality-irrealism would make a big difference for.
        So I now think we have a big disagreement around point “a” (just how real rationality is), but maybe not so much around “b” (what the consequences are for the various bullet points you listed).
        DanielFilan 18 Jan 2020 3:48 UTC
        4 points
        0
        Parent
        
        So, yeah, I’m asking you about something which you haven’t claimed is a crux of a disagreement which you and I are having, but, I am asking about it because I seem to have a disagreement with you about (a) whether rationality realism is true (pending clarification of what the term means to each of us), and (b) whether rationality realism should make a big difference for several positions you listed.
        
        For what it’s worth, from my perspective, two months ago I said I fell into a certain pattern of thinking, then raemon put me in the position of saying what that was a crux for, then I was asked to elaborate about why a specific facet of the distinction was cruxy, and also the pattern of thinking morphed into something more analogous to a proposition. So I’m happy to elaborate on consequences of ‘rationality realism’ in my mind (such as they are—the term seems vague enough that I’m a ‘rationality realism’ anti-realist and so don’t want to lean too heavily on the concept) in order to further a discussion, but in the context of an exchange that was initially framed as a debate I’d like to be clear about what commitments I am and am not making.
        
        Anyway, glad to clarify that we have a big disagreement about how ‘real’ a theory of rationality should be, which probably resolves to a medium-sized disagreement about how ‘real’ rationality and/or its best theory actually is.
        
        Ben Pace 17 Jan 2020 20:23 UTC
        2 points
        0
        Parent
        This is such an interesting use of spoiler tags. I might try it myself sometime.
  - DanielFilan 10 Jan 2020 6:51 UTC
    LW: 4 AF: 2
    0
    AF Parent
    To answer the easy part of this question/remark, I don’t work at MIRI and don’t research agent foundations, so I think I shouldn’t count as a “MIRI person”, despite having good friends at MIRI and having interned there.
    
    (On a related note, it seems to me that the terminology “MIRI person”/”MIRI cluster” obscures intellectual positions and highlights social connections, which makes me wish that it was less prominent.)
  - Raemon 10 Jan 2020 19:36 UTC
    LW: 2 AF: 1
    0
    AF Parent
    I guess the main thing I want is an actual tally on “how many people definitively found this post to represent their crux”, vs “how many people think that this represented other people’s cruxes”
    - Rohin Shah 12 Jan 2020 20:45 UTC
      LW: 4 AF: 3
      0
      AF Parent
      If I believed realism about rationality, I’d be closer to buying what I see as the MIRI story for impact. It’s hard to say whether I’d actually change my mind without knowing the details of what exactly I’m updating to.
- Vanessa Kosoy 12 Jan 2020 18:52 UTC
  LW: 4 AF: 2
  0
  AF Parent
  I think that ricraz claims that it’s impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the “momentum vs. fitness” comparison doesn’t make sense to me. Specifically, a concept doesn’t have to be crisply well-defined in order to use it in mathematical models. Even momentum, which is truly one of the “crisper” concepts in science, is no longer well-defined when spacetime is not asymptotically flat (which it isn’t). Much less so are concepts such as “atom”, “fitness” or “demand”. Nevertheless, physicists, biologists and economists continue to successfully construct and apply mathematical models grounded in such fuzzy concepts. Although in some sense I also endorse the “strawman” that rationality is more like momentum than like fitness (at least some aspects of rationality).
  - abramdemski 13 Jan 2020 13:59 UTC
    LW: 5 AF: 4
    0
    AF Parent
    Although in some sense I also endorse the “strawman” that rationality is more like momentum than like fitness (at least some aspects of rationality).
    How so?
    I think that ricraz claims that it’s impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the “momentum vs. fitness” comparison doesn’t make sense to me.
    Well, it’s not entirely clear. First there is the “realism” claim, which might even be taken in contrast to mathematical abstraction; EG, “is IQ real, or is it just a mathematical abstraction”? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where “accurate” means, at least in part, helpfulness in making real predictions).
    So the idea seems to be that there’s a spectrum with physics at one extreme end. I’m not quite sure what goes at the other extreme end. Here’s one possibility:
    Physics
    Chemistry
    Biology
    Psychology
    Social Sciences
    Humanities
    A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, “realness” vs “mathematical modelability”. Well, it’s not clear exactly what that second axis should be.
    Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect “reproductive fitness” levels rather than “momentum” levels.
    Hmm, actually, I guess there’s a tricky interpretational issue here, which is what it means to model agency exactly.
    On the one hand, I fully believe in Eliezer’s idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality.
    But the important thing is a theoretical understanding sufficient to understand the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter.
    I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type. I think models of the second type have to be significantly more approximate, however. EG, you cannot be sure of exactly what a learning system will learn in complex problems.
    What links here?
    Plausible cases for HRAD work, and locating the crux in the “realism about rationality” debate by riceissa (22 Jun 2020 1:10 UTC; 85 points)
    - Richard_Ngo 14 Jan 2020 11:01 UTC
      LW: 2 AF: 1
      0
      AF Parent
      Yeah, I should have been much more careful before throwing around words like “real”. See the long comment I just posted for more clarification, and in particular this paragraph:
      I’m not trying to argue that concepts which we can’t formalise “aren’t real”, but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can’t formalise, and that it’s those incoherent extrapolations which “aren’t real” (I agree that this was quite unclear in the original post).
    - Vanessa Kosoy 13 Jan 2020 22:05 UTC
      LW: 0 AF: 2
      0
      AF Parent
      It seems almost tautologically true that you can’t accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.
      
      What I expect the abstract theory of intelligence to do is something like producing a categorization of agents in terms of qualitative properties. Whether that’s closer to “momentum” or “fitness”, I’m not sure the question is even meaningful.
      
      I think the closest analogy is: abstract theory of intelligence is to AI engineering as complexity theory is to algorithmic design. Knowing the complexity class of a problem doesn’t tell you the best practical way to solve it, but it does give you important hints. (For example, if the problem is of exponential time complexity then you can only expect to solve it either for small inputs or in some special cases, and average-case complexity tells you just whether these cases need to be very special or not. If the problem is in $\mathsf{NC}$ then you know that it’s possible to gain a lot from parallelization. If the problem is in $\mathsf{NP}$ then at least you can test solutions, et cetera.)
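      To make the "at least you can test solutions" hint concrete, here is a minimal sketch using subset sum, a standard NP problem picked purely as an example (the thread names no particular problem, and the function names are hypothetical). Checking a proposed certificate takes time linear in its size, while the naive search may try every subset.

```python
from itertools import combinations
from typing import List, Optional


def verify_subset_sum(numbers: List[int], target: int, certificate: List[int]) -> bool:
    """Check a proposed solution: `certificate` lists distinct, valid indices
    into `numbers` whose values must add up to `target`."""
    return (len(set(certificate)) == len(certificate)
            and all(0 <= i < len(numbers) for i in certificate)
            and sum(numbers[i] for i in certificate) == target)


def search_subset_sum(numbers: List[int], target: int) -> Optional[List[int]]:
    """Brute-force search: tries up to 2**len(numbers) index subsets."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if verify_subset_sum(numbers, target, list(indices)):
                return list(indices)
    return None


if __name__ == "__main__":
    numbers = [3, 34, 4, 12, 5, 2]
    certificate = search_subset_sum(numbers, 9)
    print(certificate, verify_subset_sum(numbers, 9, certificate or []))
```

      The asymmetry between the two functions is the point: membership in the class guarantees the cheap check, not a cheap search.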
      
      And also, abstract theory of alignment should be to AI safety as complexity theory is to cryptography. Once again, many practical considerations are not covered by the abstract theory, but the abstract theory does tell you what kind of guarantees you can expect and when. (For example, in cryptography we can (sort of) know that a certain protocol has theoretical guarantees, but there is engineering work finding a practical implementation and ensuring that the assumptions of the theory hold in the real system.)
      - Shmi 14 Jan 2020 6:13 UTC
        6 points
        0
        Parent
        It seems almost tautologically true that you can’t accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.
        That seems manifestly false. You can figure out whether an algorithm halts or not without being accidentally stuck in an infinite loop. You can look at the recursive Fibonacci algorithm and figure out what it would do without ever running it. So there is a clear distinction between analyzing an algorithm and executing it. If anything, one would know more about the agent by using the techniques from analysis of algorithms than the agent would ever know about themselves.
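        A minimal sketch of the Fibonacci point, under the assumption that "the recursive Fibonacci algorithm" means the naive doubly recursive definition. Solving its call-count recurrence $C(n) = 1 + C(n-1) + C(n-2)$ with $C(0) = C(1) = 1$ gives $C(n) = 2F(n+1) - 1$, so the cost can be predicted without running the recursion. The helper names are hypothetical, and this covers one easy special case, not program analysis in general.

```python
from typing import List


def fib_recursive(n: int, counter: List[int]) -> int:
    """Naive recursive Fibonacci; counter[0] counts every call made."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)


def fib_iterative(n: int) -> int:
    """Same value, computed in O(n) additions."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def predicted_calls(n: int) -> int:
    """Number of calls the naive recursion makes, derived on paper
    from C(n) = 1 + C(n-1) + C(n-2), C(0) = C(1) = 1."""
    return 2 * fib_iterative(n + 1) - 1


if __name__ == "__main__":
    counter = [0]
    value = fib_recursive(20, counter)
    print(value, counter[0], predicted_calls(20))
```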
        TAG 14 Jan 2020 13:58 UTC
        2 points
        0
        Parent
        
        You can figure out whether an algorithm halts or not without being accidentally stuck in an infinite loop.
        
        In special cases, not in the general case.
        
        Vanessa Kosoy 14 Jan 2020 8:18 UTC
        2 points
        0
        Parent
        Of course you can predict some properties of what an agent will do. In particular, I hope that we will eventually have AGI algorithms that satisfy provable safety guarantees. But, you can’t make exact predictions. In fact, there probably is a mathematical law that limits how accurate your predictions can be.
        
        An optimization algorithm is, by definition, something that transforms computational resources into utility. So, if your prediction is so close to the real output that it has similar utility, then it means the way you produced this prediction involved the same product of “optimization power per unit of resources” by “amount of resources invested” (roughly speaking, I don’t claim to already know the correct formalism for this). So you would need to either (i) run a similar algorithm with similar resources or (ii) run a dumber algorithm but with more resources or (iii) use less resources but an even smarter algorithm.
        
        So, if you want to accurately predict the output of a powerful optimization algorithm, your prediction algorithm would usually have to be either a powerful optimization algorithm in itself (cases i and iii) or prohibitively costly to run (case ii). The exception is cases when the optimization problem is easy, so a dumb algorithm can solve it without much resources (or a human can figure out the answer by emself).
- John_Maxwell 11 Jan 2020 3:45 UTC
  LW: 4 AF: 2
  0
  AF Parent
  
  It seems to me like my position, and the MIRI-cluster position, is (1) closer to “rationality is like fitness” than “rationality is like momentum”
  
  Eliezer is a fan of law thinking, right? Doesn’t the law thinker position imply that intelligence can be characterized in a “lawful” way like momentum?
  
  Whereas the non-MIRI cluster is saying “biologists don’t need to know about evolution.”
  
  As a non-MIRI cluster person, I think deconfusion is valuable (insofar as we’re confused), but I’m skeptical of MIRI because they seem more confused than average to me.
  - dxu 13 Jan 2020 23:33 UTC
    2 points
    0
    Parent
    
    Doesn’t the law thinker position imply that intelligence can be characterized in a “lawful” way like momentum?
    
    It depends on what you mean by “lawful”. Right now, the word “lawful” in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it’s not clear to me why “law thinking” is relevant in the first place—it seems as though it simply muddies the discussion by introducing additional concepts.
    - John_Maxwell 14 Jan 2020 8:07 UTC
      2 points
      0
      Parent
      In my experience, if there are several concepts that seem similar, understanding how they relate to one another usually helps with clarity rather than hurting.
      - dxu 14 Jan 2020 23:35 UTC
        2 points
        0
        Parent
        That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing.
        
        In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.
- TAG 8 May 2023 13:39 UTC
  2 points
  0
  Parent
  
  The main problem is the word “realism”. It isn’t clear exactly what it means,
  
  “Fundamentalism” would be a better term for the cluster of problems—dogmatism, literalism and epistemic over-confidence.
Vanessa Kosoy 12 Jan 2020 18:33 UTC
LW: 49 AF: 16
0
AF
In this essay, ricraz argues that we shouldn’t expect a clean mathematical theory of rationality and intelligence to exist. I have debated em about this, and I continue to endorse more or less everything I said in that debate. Here I want to restate some of my (critical) position by building it from the ground up, instead of responding to ricraz point by point.

When should we expect a domain to be “clean” or “messy”? Let’s look at everything we know about science. The “cleanest” domains are mathematics and fundamental physics. There, we have crisply defined concepts and elegant, parsimonious theories. We can then “move up the ladder” from fundamental to emergent phenomena, going through high energy physics, molecular physics, condensed matter physics, biology, geophysics / astrophysics, psychology, sociology, economics… On each level more “mess” appears. Why? Occam’s razor tells us that we should prioritize simple theories over complex theories. But, we shouldn’t expect a theory to be simpler than the specification of the domain. The general theory of planets should be simpler than a detailed description of planet Earth, the general theory of atomic matter should be simpler than the theory of planets, the general theory of everything should be simpler than the theory of atomic matter. That’s because when we’re “moving up the ladder”, we are actually zooming in on particular phenomena, and the information we need to specify “where to zoom in” is translated to the description complexity of the theory.

What does this mean in practice for understanding messy domains? The way science solves this problem is by building a tower of knowledge. In this tower, each floor benefits from the interactions both with the floor above it and the floor beneath it. Without understanding macroscopic physics we wouldn’t figure out atomic physics, and without figuring out atomic physics we wouldn’t figure out high energy physics. This is knowledge “flowing down”. But knowledge also “flows up”: knowledge of high energy physics allows understanding particular phenomena in atomic physics, knowledge of atomic physics allows predicting the properties of materials and chemical reactions. (Admittedly, some floors in the tower we have now are rather ramshackle, but I think that ultimately the “tower method” succeeds everywhere, as much as success is possible at all).

How does mathematics come in here? Importantly, mathematics is not used only on the lower floors of the tower, but on all floors. “Messiness” manifests in that the mathematical models for the higher floors are either less quantitatively accurate (but still contain qualitative inputs) or have a lot of parameters that need to be determined either empirically, or using the models of the lower floors (which is one way in which knowledge flows up), or some combination of both. Nevertheless, scientists continue to successfully build and apply mathematical models even in “messy” fields like biology and economics.

So, what does it all mean for rationality and intelligence? On what floor does it sit? In fact, the subject of rationality and intelligence is not a single floor, but its own tower (maybe we should imagine science as a castle with many towers connected by bridges).

The foundation of this tower should be the general abstract theory of rationality. This theory is even more fundamental than fundamental physics, since it describes the principles from which all other knowledge is derived, including fundamental physics. We can regard it as a “theory of everything”: it predicts everything by making those predictions that a rational agent should make. Solomonoff’s theory and AIXI are a part of this foundation, but not all of it. Considerations like computational resource constraints should also enter the picture: complexity theory teaches us that they are also fundamental, they don’t require “zooming in” a lot.

But, computational resource constraints are only entirely natural when they are not tied to a particular model of computation. This only covers constraints such as “polynomial time” but not constraints such as $O (n^{3})$ time and even less so $245 n^{3}$ time. Therefore, once we introduce a particular model of computation (such as a RAM machine), we need to build another floor in the tower, one that will necessarily be “messier”. Considering even more detailed properties of the hardware we have, the input/output channels we have, the goal system, the physical environment and the software tools we employ will correspond to adding more and more floors.
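
As a small worked illustration of why “polynomial time” is model-independent while a specific exponent is not, consider the standard textbook fact that a single-tape Turing machine can simulate a multi-tape one with at most quadratic slowdown. The constants $c$ and $c'$ below are illustrative symbols, not anything from the argument above:

$$
T_{\text{multi}}(n) \le c\, n^{3}
\quad\Longrightarrow\quad
T_{\text{single}}(n) \le c' \left(T_{\text{multi}}(n)\right)^{2} \le c' c^{2}\, n^{6}
$$

The bound stays polynomial after the change of model, but the exponent moves from 3 to 6 and a constant like 245 is lost entirely, which is why those finer constraints belong to a higher, messier floor.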

Once we agree that it should be possible to create a clean mathematical theory of rationality and intelligence, we can still debate whether it’s useful. If we consider the problem of creating aligned AGI from an engineering perspective, it might seem for a moment that we don’t really need the bottom layers. After all, when designing an airplane you don’t need high energy physics. Well, high energy physics might help indirectly: perhaps it allowed predicting some exotic condensed matter phenomenon which we used to make a better power source, or better materials from which to build the aircraft. But often we can make do without those.

Such an approach might be fine, except that we also need to remember the risks. Now, safety is part of most engineering, and is definitely a part of airplane design. What level of the tower does it require? It depends on the kind of risks you face. If you’re afraid the aircraft will not handle the stress and break apart, then you need mechanics and aerodynamics. If you’re afraid the fuel will combust and explode, you better know chemistry. If you’re afraid lightning will strike the aircraft, you need knowledge of meteorology and electromagnetism, possibly plasma physics as well. The relevant domain of knowledge, and the relevant floor in the tower, is a function of the nature of the risk.

What level of the tower do we need to understand AI risk? What is the source of AI risk? It is not in any detailed peculiarities of the world we inhabit. It is not in the details of the hardware used by the AI. It is not even related to a particular model of computation. AI risk is the result of Goodhart’s curse, an extremely general property of optimization systems and intelligent agents. Therefore, addressing AI risk requires understanding the general abstract theory of rationality and intelligence. The upper floors will be needed as well, since the technology itself requires the upper floors (and since we’re aligning with humans, who are messy). But, without the lower floors the aircraft will crash.
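
One ingredient of Goodhart’s curse, the optimizer’s curse, can be sketched in a few lines. This is only a toy model under loudly stated assumptions: true values and proxy noise are independent standard normals, and the function name `selected_gap` and all parameters are hypothetical choices for illustration. It shows that picking the option with the best noisy proxy score systematically overestimates that option’s true value, independent of any hardware or environment detail.

```python
import random

def selected_gap(n_options=50, n_trials=1000, noise=1.0, seed=0):
    """Average (proxy - true value) of the option chosen by maximising the proxy."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(n_trials):
        # Assumed toy model: true values and proxy errors are independent Gaussians.
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        proxies = [v + rng.gauss(0.0, noise) for v in true_values]
        # The "optimizer" only sees the proxies and picks the largest one.
        best = max(range(n_options), key=proxies.__getitem__)
        total_gap += proxies[best] - true_values[best]
    return total_gap / n_trials

gap = selected_gap()
print(f"mean overestimate of the selected option: {gap:.2f}")
```

With a single option there is no selection pressure and the average gap is close to zero; the overestimate appears only once the proxy is optimized over many candidates.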
DanielFilan 18 Jan 2020 4:21 UTC
LW: 9 AF: 5
0
AF
I think it was important to have something like this post exist. However, I now think it’s not fit for purpose. In this discussion thread, rohinmshah, abramdemski and I end up spilling a lot of ink about a disagreement that ended up being at least partially because we took ‘realism about rationality’ to mean different things. rohinmshah thought that irrealism would mean that the theory of rationality was about as real as the theory of liberalism, abramdemski thought that irrealism would mean that the theory of rationality would be about as real as the theory of population genetics, and I leaned towards rohinmshah’s position but also thought that it referred to something more akin to a mood than a proposition. I think that a better post would distinguish these three types of ‘realism’ and their consequences. However, I’m glad that this post sparked enough conversation for the better post to become real.
What links here?
- Prizes for Last Year’s 2018 Review by Raemon (2 Dec 2020 11:21 UTC; 72 points)

Viliam 16 Sep 2018 20:51 UTC
48 points
0
I have an intuition that the “realism about rationality” approach will lead to success, even if it will have to be dramatically revised on the way.
To explain, imagine that centuries ago there were two groups trying to find out how the planets move. Group A says: “Obviously, planets must move according to some simple mathematical rule. The simplest mathematical shape is a circle, therefore planets move in circles. All we have to do is find out the exact diameter of each circle.” Group B says: “No, you guys underestimate the complexity of the real world. The planets, just like everything in nature, can only be approximated by a rule, but there are always exceptions and unpredictability. You will never find a simple mathematical model to describe the movement of the planets.”
The people who finally find out how the planets move will be spiritual descendants of the group A. Even if on the way they will have to add epicycles, and then discard the idea of circles, which seems like total failure of the original group. The problem with the group B is that it has no energy to move forward.
The right moment to discard a simple model is when you have enough data to build a more complex model.
- Richard_Ngo 16 Sep 2018 21:11 UTC
  50 points
  0
  Parent
  The people who finally find out how the planets move will be spiritual descendants of the group A. … The problem with the group B is that it has no energy to move forward.
  In this particular example, it’s true that group A was more correct. This is because planetary physics can be formalised relatively easily, and also because it’s a field where you can only observe and not experiment. But imagine the same conversation between sociologists who are trying to find out what makes people happy, or between venture capitalists trying to find out what makes startups succeed. In those cases, Group B can move forward using the sort of “energy” that biologists and inventors and entrepreneurs have, driven by an experimental and empirical mindset. Whereas Group A might spend a long time writing increasingly elegant equations which rely on unjustified simplifications.
  Instinctively reasoning about intelligence using analogies from physics instead of the other domains I mentioned above is a very good example of rationality realism.
  What links here?
  - sunwillrise's comment on Agent foundations: not really math, not really science by Alex_Altair (17 Aug 2025 13:18 UTC; 51 points)
  - jamii 20 Sep 2018 11:15 UTC
    6 points
    0
    Parent
    Uncontrolled argues along similar lines—that the physics/chemistry model of science, where we get to generalize a compact universal theory from a number of small experiments, is simply not applicable to biology/psychology/sociology/economics and that policy-makers should instead rely more on widespread, continuous experiments in real environments to generate many localized partial theories.
    A prototypical argument is the paradox-of-choice jam experiment, which has since become solidified in pop psychology. But actual supermarkets run many 1000s of in-situ experiments and find that it actually depends on the product, the nature of the choices, the location of the supermarket, the time of year etc.
    What links here?
    - sunwillrise's comment on Agent foundations: not really math, not really science by Alex_Altair (17 Aug 2025 13:18 UTC; 51 points)
    - Rob Bensinger 20 Sep 2018 20:33 UTC
      20 points
      0
      Parent
      Uncontrolled argues along similar lines—that the physics/chemistry model of science, where we get to generalize a compact universal theory from a number of small experiments, is simply not applicable to biology/psychology/sociology/economics and that policy-makers should instead rely more on widespread, continuous experiments in real environments to generate many localized partial theories.
      I’ll note that (non-extreme) versions of this position are consistent with ideas like “it’s possible to build non-opaque AGI systems.” The full answer to “how do birds work?” is incredibly complex, hard to formalize, and dependent on surprisingly detailed local conditions that need to be discovered empirically. But you don’t need to understand much of that complexity at all to build flying machines with superavian speed or carrying capacity, or to come up with useful theory and metrics for evaluating “goodness of flying” for various practical purposes; and the resultant machines can be a lot simpler and more reliable than a bird, rather than being “different from birds but equally opaque in their own alien way”.
      This isn’t meant to be a response to the entire “rationality non-realism” suite of ideas, or a strong argument that AGI developers can steer toward less opaque systems than AlphaZero; it’s just me noting a particular distinction that I particularly care about.
      The relevant realism-v.-antirealism disagreement won’t be about “can machines serve particular functions more transparently than biological organs that happen to serve a similar function (alongside many other functions)?”. In terms of the airplane analogy, I expect disagreements like “how much can marginal effort today increase transparency once we learn how to build airplanes?”, “how much useful understanding are we currently missing about how airplanes work?”, and “how much of that understanding will we develop by default on the path toward building airplanes?”.
  - binary_doge 1 Oct 2018 16:01 UTC
    4 points
    0
    Parent
    “This is because planetary physics can be formalized relatively easily” - they can now, and could when they were, but not before. One can argue that we thought many “complex” and very “human” abilities could not be algorithmically emulated in the past, and recent advances in AI (with neural nets and all that) have proven otherwise. If a program can do/predict something, there is a set of mechanical rules that explain it. The set might not be as elegant as Newton’s laws of motion, but it is still a set of equations nonetheless. The idea behind Viliam’s comment (I think) is that in the future someone might say, the same way you just did, that “We can formalize how happy people generally are in a given society because that’s relatively easy, but what about something truly complex like what an individual might imagine if we read him a specific story?”.
    In other words, I don’t see the essential differentiation between biology and sociology questions and physics questions, that you try to point to. In the post itself you also talk about moral preference, and I tend to agree with you that some people just have very individually strongly valued axioms that might contradict themselves or others, but it doesn’t in itself mean that questions about rationality differ from questions about, say, molecular biology, in the sense that they can be hypothetically answered to a satisfactory level of accuracy.
  - DragonGod 30 Sep 2018 19:56 UTC
    −2 points
    0
    Parent
    Group A was most successful in the field of computation, so I have high confidence that their approach would be successful in intelligence as well (especially in intelligence of artificial agents).
- drossbucket 17 Sep 2018 5:51 UTC
  6 points
  0
  Parent
  This is the most compelling argument I’ve been able to think of too when I’ve tried before. Feynman has a nice analogue of it within physics in The Character of Physical Law:
  … it would have been no use if Newton had simply said, ‘I now understand the planets’, and for later men to try to compare it with the earth’s pull on the moon, and for later men to say ‘Maybe what holds the galaxies together is gravitation’. We must try that. You could say ‘When you get to the size of the galaxies, since you know nothing about it, anything can happen’. I know, but there is no science in accepting this type of limitation.
  I don’t think it goes through well in this case, for the reasons ricraz outlines in their reply. Group B already has plenty of energy to move forward, from taking our current qualitative understanding and trying to build more compelling explanatory models and find new experimental tests. It’s Group A that seems rather mired in equations that don’t easily connect.
  Edit: I see I wrote about something similar before, in a rather rambling way.
- TAG 25 Feb 2022 20:10 UTC
  1 point
  0
  Parent
  That isn’t analogous to rationalism versus the mainstream. The mainstream has already developed more complex models...it’s rationalism that’s saying, “no, just use Bayes for everything” (etc).
abramdemski 24 Sep 2018 18:38 UTC
41 points
0
Rationality realism seems like a good thing to point out which might be a crux for a lot of people, but it doesn’t seem to be a crux for me.
I don’t think there’s a true rationality out there in the world, or a true decision theory out there in the world, or even a true notion of intelligence out there in the world. I work on agent foundations because there’s still something I’m confused about even after that, and furthermore, AI safety work seems fairly hopeless while still so radically confused about the-phenomena-which-we-use-intelligence-and-rationality-and-agency-and-decision-theory-to-describe. And, as you say, “from a historical point of view I’m quite optimistic about using maths to describe things in general”.
romeostevensit 16 Sep 2018 23:04 UTC
34 points
0
I really like the compression “There’s no canonical way to scale me up.”
I think it captures a lot of the important intuitions here.
- Ben Pace 17 Sep 2018 0:51 UTC
  1 point
  0
  Parent
  +1
  - Ben Pace 17 Sep 2018 2:38 UTC
    24 points
    0
    Parent
    I think I want to split up ricraz’s examples in the post into two subclasses, defined by two questions.
    The first asks, given that there are many different AGI architectures one could scale up into, are some better than others? (My intuition is both that some are better than others, and also that there are many that are on the Pareto frontier.) And are there any simple ways to determine why one is better than another? This question covers the following examples from the OP:
    There is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general; there is an “ideal” decision theory; the idea that AGI will very likely be an “agent”; the idea that Turing machines and Kolmogorov complexity are foundational for epistemology; the idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
    The second asks—suppose that some architectures are better than others, and suppose there are some simple explanations about why some are better than others. How practical is it to talk of me in this way today? Here’s some concrete examples of things I might do:
    Given certain evidence for a proposition, there’s an “objective” level of subjective credence which you should assign to it, even under computational constraints; the idea that Aumann’s agreement theorem is relevant to humans; the idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors; the idea that having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding dutch books and money pumps and so on).
    If I am to point to two examples that feel very concrete to me, I might ask:
    Is the reasoning that Harry is doing in Chapter 86: Multiple Hypothesis Testing useful or totally insane?
    When one person says “I guess we’ll have to agree to disagree” and the second person says “Actually according to Aumann’s Agreement Theorem, we can’t” is the second person making a type error?
    Certainly the first person is likely mistaken if they’re saying “In principle no exchange of evidence could cause us to agree”, but perhaps the second person is also mistaken, in implying that it makes any sense to model their disagreement in terms of idealised, scaled-up, rational agents rather than the weird bag of meat and neuroscience that we actually are—for which Aumann’s Agreement Theorem certainly has not been proven.
    To be clear: the two classes of examples come from roughly the same generator, and advances in our understanding of one can lead to advances in the other. I just often draw from fairly different reference classes of evidence for updating on them (examples: For the former, Jaynes, Shannon, Feynman. For the latter, Kahneman & Tversky and Tooby & Cosmides).
    - Benquo 17 Sep 2018 12:17 UTC
      23 points
      0
      Parent
      When one person says “I guess we’ll have to agree to disagree” and the second person says “Actually according to Aumann’s Agreement Theorem, we can’t” is the second person making a type error?
      Making a type error is not easy to distinguish from attempting to shift frame. (If it were, the frame control wouldn’t be very effective.) In the example Eliezer gave from the sequences, he was shifting frame from one that implicitly acknowledges interpretive labor as a cost, to one that demands unlimited amounts of interpretive labor by assuming that we’re all perfect Bayesians (and therefore have unlimited computational ability, memory, etc).
      This is a big part of the dynamic underlying mistake vs conflict theory.
      - Benquo 17 Sep 2018 12:18 UTC
        16 points
        0
        Parent
        Eliezer’s behavior in the story you’re alluding to only seems “rational” insofar as we think the other side ends up with a better opinion—I can easily imagine a structurally identical interaction where the protagonist manipulates someone into giving up on a genuine but hard to articulate objection, or proceeding down a conversational path they’re ill-equipped to navigate, thus “closing the sale.”
        gjm 18 Sep 2018 11:52 UTC
        12 points
        0
        Parent
        It’s not at all clear that improving the other person’s opinion was really one of Eliezer’s goals on this occasion, as opposed to showing up the other person’s intellectual inferiority. He called the post “Bayesian Judo”, and highlighted how his showing-off impressed someone of the opposite sex.
        
        He does also suggest that in the end he and the other person came to some sort of agreement—but it seems pretty clear that the thing they agreed on had little to do with the claim the other guy had originally been making, and that the other guy’s opinion on that didn’t actually change. So I think an accurate, though arguably unkind, summary of “Bayesian Judo” goes like this: “I was at a party, I got into an argument with a religious guy who didn’t believe AI was possible, I overwhelmed him with my superior knowledge and intelligence, he submitted to my manifest superiority, and the whole performance impressed a woman”. On this occasion, helping the other party to have better opinions doesn’t seem to have been a high priority.
    - Said Achmiz 17 Sep 2018 3:47 UTC
      22 points
      0
      Parent
      
      When one person says “I guess we’ll have to agree to disagree” and the second person says “Actually according to Aumann’s Agreement Theorem, we can’t” is the second person making a type error?
      
      Note: I confess to being a bit surprised that you picked this example. I’m not quite sure whether you picked a bad example for your point (possible) or whether I’m misunderstanding your point (also possible), but I do think that this question is interesting all on its own, so I’m going to try and answer it.
      
      Here’s a joke that you’ve surely heard before—or have you?
      
      Three mathematicians walk into a bar. The bartender asks them, “Do you all want a beer?”
      
      The first mathematician says, “I don’t know.”
      
      The second mathematician says, “I don’t know.”
      
      The third mathematician says, “I don’t know.”
      
      The lesson of this joke applies to the “according to Aumann’s Agreement Theorem …” case.
      
      When someone says “I guess we’ll have to agree to disagree” and their interlocutor responds with “Actually according to Aumann’s Agreement Theorem, we can’t”, I don’t know if I’d call this a “type error”, precisely (maybe it is; I’d have to think about it more carefully); but the second person is certainly being ridiculous. And if I were the first person in such a situation, my response might be something along these lines:
      
      “Really? We can’t? We can’t what, exactly? For example, I could turn around and walk away. Right? Surely, the AAT doesn’t say that I will be physically unable to do that? Or does it, perhaps, say that either you or I or both of us will be incapable of interacting amicably henceforth, and conversing about all sorts of topics other than this one? But if not, then what on Earth could you have meant by your comment?
      
      “I mean… just what, exactly, did you think I meant, when I suggested that we agree to disagree? Did you take me to be claiming that (a) the both of us are ideal Bayesian reasoners, and (b) we have common knowledge of our posterior probabilities of the clearly expressible proposition the truth of which we are discussing, but (c) our posterior probabilities, after learning this, should nonetheless differ? Is that what you thought I was saying? Really? But why? Why in the world did you interpret my words in such a bizarrely technical way? What would you say is your estimate of the probability that I actually meant to make that specific, precisely technical statement?”
      
      And so on. The questions are rhetorical, of course. Anyone with half an ounce of common sense (not to mention anyone with an actual understanding of the AAT!) understands perfectly well that the Theorem is totally inapplicable to such cases.
      
      (Of course, in some sense this is all moot. The one who says “actually, according to the AAT…” doesn’t really think that his interlocutor meant all of that. He’s not really making any kind of error… except, possibly, a tactical one—but perhaps not even that.)
      
      What links here?
      - Said Achmiz's comment on Beware of small world puzzles by mukashi (30 Aug 2021 17:39 UTC; 9 points)
      - Ben Pace 17 Sep 2018 5:44 UTC
        17 points
        0
        Parent
        Firstly, I hadn’t heard the joke before, and it made me chuckle to myself.
        Secondly, I loved this comment, for very accurately conveying the perspective I felt like ricraz was trying to defend wrt realism about rationality.
        Let me say two (more) things in response:
        Firstly, I was taking the example directly from Eliezer.
        I said, “So if I make an Artificial Intelligence that, without being deliberately preprogrammed with any sort of script, starts talking about an emotional life that sounds like ours, that means your religion is wrong.”
        He said, “Well, um, I guess we may have to agree to disagree on this.”
        I said: “No, we can’t, actually. There’s a theorem of rationality called Aumann’s Agreement Theorem which shows that no two rationalists can agree to disagree. If two people disagree with each other, at least one of them must be doing something wrong.”
        (Sidenote: I have not yet become sufficiently un-confused about AAT to have a definite opinion about whether EY was using it correctly there. I do expect after further reflection to object to most rationalist uses of the AAT but not this particular one.)
        Secondly, and where I think the crux of this matter lies, is that I believe your (quite understandable!) objection applies to most attempts to use bayesian reasoning in the real world.
        Suppose one person is trying to ignore a small piece of evidence against a cherished position, and a second person says to them “I know you’ve ignored this piece of evidence, but you can’t do that because it is Bayesian evidence—it is the case that you’re more likely to see this occur in worlds where your belief is false than in worlds where it’s true, so the correct epistemic move here is to slightly update against your current belief.”
        If I may clumsily attempt to wrangle your example to my own ends, might they not then say:
        “I mean… just what, exactly, did you think I meant, when I said this wasn’t any evidence at all? Did you take me to be claiming that (a) I am an ideal Bayesian reasoner, and (b) I have observed evidence that occurs in more worlds where my belief is true than if it is false, but (c) my posterior probability, after learning this, should still equal my prior probability? Is that what you thought I was saying? Really? But why? Why in the world did you interpret my words in such a bizarrely technical way? What would you say is your estimate that I actually meant to make that specific, precisely technical statement?”
        and further
        I am not a rational agent. I am a human, and my mind does not satisfy the axioms of probability theory; therefore it is nonsensical to attempt to have me conform my speech patterns and actions to these logical formalisms.
        Bayes’ theorem applies if your beliefs update according to very strict axioms, but it’s not at all obvious to me that the weird fleshy thing in my head currently conforms to those axioms. Should I nonetheless try to? And if so, why shouldn’t I for AAT?
        Aumann’s Agreement Theorem is true if we are rational (bayesian) agents. There are a large number of other theorems that apply to rational agents too, and it seems that sometimes people want to use these abstract formalisms to guide behaviour and sometimes not, and having a principled stance here about when and when not to use them seems useful and important.
        Said Achmiz 17 Sep 2018 6:36 UTC
        15 points
        0
        Parent
        Well, I guess you probably won’t be surprised to hear that I’m very familiar with that particular post of Eliezer’s, and instantly thought of it when I read your example. So, consider my commentary with that in mind!
        
        (Sidenote: I have not yet become sufficiently un-confused about AAT to have a definite opinion about whether EY was using it correctly there. I do expect after further reflection to object to most rationalist uses of the AAT but not this particular one.)
        
        Well, whether Eliezer was using the AAT correctly rather depends on what he meant by “rationalist”. Was he using it as a synonym for “perfect Bayesian reasoner”? (Not an implausible reading, given his insistence elsewhere on the term “aspiring rationalist” for mere mortals like us, and, indeed, like himself.) If so, then certainly what he said about the Theorem was true… but then, of course, it would be wholly inappropriate to apply it in the actual case at hand (especially since his interlocutor was, I surmise, some sort of religious person, and plausibly not even an aspiring rationalist).
        
        If, instead, Eliezer was using “rationalist” to refer to mere actual humans of today, such as himself and the fellow he was conversing with, then his description of the AAT was simply inaccurate.
        
        Secondly, and where I think the crux of this matter lies, is that I believe your (quite understandable!) objection applies to most attempts to use bayesian reasoning in the real world.
        
        Indeed not. The critical point is this: there is a difference between trying to use Bayesian reasoning and interpreting people’s comments to refer to Bayesian reasoning. Whether you do the former is between you and your intellectual conscience, so to speak. Whether you do the latter, on the other hand, is a matter of both pragmatics (is this any kind of a good idea?) and of factual accuracy (are you correctly understanding what someone is saying?).
        
        So the problem with your example, and with your point, is the equivocation between two questions:
        
        “I’m not a perfect Bayesian reasoner, but shouldn’t I try to be?” (And the third-person variant, which is isomorphic to the first-person variant to whatever degree your goals and those of your advisee/victim are aligned.)
        
        “My interlocutor is not speaking with the assumption that we’re perfect Bayesian reasoners, nor is he referring to agreement or belief or anything else in any kind of a strict, technical, Bayesian sense, but shouldn’t I assume that he is, thus ascribing meaning to his words that is totally different than his intended meaning?”
        
        The answer to the first question is somewhere between “Uh, sure, why not, I guess? That’s your business, anyway” and “Yes, totally do that! Tsuyoku naritai, and all that!”.
        
        The answer to the second question is “No, that is obviously a terrible idea. Never do that.”
      - Anon User 30 Aug 2021 21:13 UTC
        0 points
        0
        Parent
        Actually, there is a logical error in your mathematicians joke—at least compared to how this joke normally goes. When it’s their turn, the 3rd mathematician knows that the first two wanted a beer (otherwise they would have said "no"), and so can say Yes/No. https://www.beingamathematician.org/Jokes/445-three-logicians-walk-into-a-bar.png
        Said Achmiz 30 Aug 2021 23:07 UTC
        3 points
        0
        Parent
        You have entirely missed the point I was making in that comment.
        
        Of course I am aware of the standard form of the joke. I presented my modified form of the joke in the linked comment, as a deliberate contrast with the standard form, to illustrate the point I was making.
    - c0rw1n 17 Sep 2018 4:10 UTC
      2 points
      0
      Parent
      Aumann’s agreement theorem says that two people acting rationally (in a certain precise sense) and with common knowledge of each other’s beliefs cannot agree to disagree. More specifically, if two people are genuine Bayesian rationalists with common priors, and if they each have common knowledge of their individual posterior probabilities, then their posteriors must be equal.
      
      With common priors.
      
      This is what does all the work there! If the disagreeers have non-equal priors on one of the points, then of course they’ll have different posteriors.
      
      Of course applying Bayes’ Theorem with the same inputs is going to give the same outputs, that’s not even a theorem, that’s an equals sign.
      
      If the disagreeers find a different set of parameters to be relevant, and/or the parameters they both find relevant do not have the same values, the outputs will differ, and they will continue to disagree.
      - Benquo 17 Sep 2018 12:21 UTC
        22 points
        0
        Parent
        Relevant: Why Common Priors
Wei Dai 19 Aug 2019 4:02 UTC
33 points
0
Just want to note that I’ve been pushing for (what I think is) a proper amount of uncertainty about “realism about rationality” for a long time. Here’s a collection of quotes from just my top-level posts, arguing against various items in your list:

Is this realistic for human rationalist wannabes? It seems wildly implausible to me that two humans can communicate all of the information they have that is relevant to the truth of some statement just by repeatedly exchanging degrees of belief about it, except in very simple situations. You need to know the other agent’s information partition exactly in order to narrow down which element of the information partition he is in from his probability declaration, and he needs to know that you know so that he can deduce what inference you’re making, in order to continue to the next step, and so on. One error in this process and the whole thing falls apart. It seems much easier to just tell each other what information the two of you have directly.

Finally, I now see that until the exchange of information completes and common knowledge/agreement is actually achieved, it’s rational for even honest truth-seekers who share common priors to disagree. Therefore, two such rationalists may persistently disagree just because the amount of information they would have to exchange in order to reach agreement is too great to be practical.

-- Probability Space & Aumann Agreement
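
As a rough illustration of the exchange described in the quote above, here is a minimal sketch of two agents trading posteriors over a tiny state space. Everything in it is an assumption made for illustration: the uniform prior, the nine states, the two partitions, the event, and all function names. The point it tries to show is that the listener can only update because it knows the speaker's partition exactly.

```python
from fractions import Fraction

def posterior(event, cell):
    """P(event | cell), assuming a uniform prior over a finite state space."""
    return Fraction(len(event & cell), len(cell))

def cell_of(partition, state):
    """The cell of the partition that contains the given state."""
    return next(c for c in partition if state in c)

def announcement_sets(partition, event):
    """Group states by the posterior the speaker would announce in each."""
    groups = {}
    for c in partition:
        groups.setdefault(posterior(event, c), set()).update(c)
    return [frozenset(g) for g in groups.values()]

def refine(partition, signal_sets):
    """Split each of the listener's cells by what the announcement reveals."""
    return [c & s for c in partition for s in signal_sets if c & s]

def agreement_dialogue(partitions, event, true_state, max_steps=50):
    """Alternate announcements until neither agent learns anything new."""
    parts = [list(p) for p in partitions]
    history, speaker, quiet_steps = [], 0, 0
    while quiet_steps < 2 and len(history) < max_steps:
        listener = 1 - speaker
        history.append(posterior(event, cell_of(parts[speaker], true_state)))
        # The listener must know the speaker's partition exactly to do this.
        refined = refine(parts[listener], announcement_sets(parts[speaker], event))
        quiet_steps = quiet_steps + 1 if len(refined) == len(parts[listener]) else 0
        parts[listener] = refined
        speaker = listener
    return history

# Hypothetical example: states 1..9, true state 1, event {3, 4}.
alice = [frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset({7, 8, 9})]
bob = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8}), frozenset({9})]
history = agreement_dialogue([alice, bob], frozenset({3, 4}), true_state=1)
print([str(q) for q in history])
```

In this toy case the announcements run 1/3, 1/2, 1/3 and then settle at 1/3. Even with nine states, each step requires the listener to reconstruct which cells would have produced the announced number, which is the bookkeeping the quote argues is unrealistic for humans.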

Considering the agent as a whole suggests that the master’s values are the true terminal values, and the slave’s values are merely instrumental values. From this perspective, the slave seems to be just a subroutine that the master uses to carry out its wishes. Certainly in any given mind there will be numerous subroutines that are tasked with accomplishing various subgoals, and if we were to look at a subroutine in isolation, its assigned subgoal would appear to be its terminal value, but we wouldn’t consider that subgoal to be part of the mind’s true preferences. Why should we treat the slave in this model differently?

-- A Master-Slave Model of Human Preferences

What ethical principles can we use to decide between “Shut Up and Multiply” and “Shut Up and Divide”? Why should we derive our values from our native emotional responses to seeing individual suffering, and not from the equally human paucity of response at seeing large portions of humanity suffer in aggregate? Or should we just keep our scope insensitivity, like our boredom?

And an interesting meta-question arises here as well: how much of what we think our values are, is actually the result of not thinking things through, and not realizing the implications and symmetries that exist? And if many of our values are just the result of cognitive errors or limitations, have we lived with them long enough that they’ve become an essential part of us?

-- Shut Up and Divide?

By the way, I think nihilism often gets short changed around here. Given that we do not actually have at hand a solution to ontological crises in general or to the specific crisis that we face, what’s wrong with saying that the solution set may just be null? Given that evolution doesn’t constitute a particularly benevolent and farsighted designer, perhaps we may not be able to do much better than that poor spare-change collecting robot? If Eliezer is worried that actual AIs facing actual ontological crises could do worse than just crash, should we be very sanguine that for humans everything must “add up to moral normality”?

-- Ontological Crisis in Humans

Without being a system of logic, moral philosophical reasoning likely (or at least plausibly) doesn’t have any of the nice properties that a well-constructed system of logic would have, for example, consistency, validity, soundness, or even the more basic property that considering arguments in a different order, or in a different mood, won’t cause a person to accept an entirely different set of conclusions. For all we know, somebody trying to reason about a moral concept like “fairness” may just be taking a random walk as they move from one conclusion to another based on moral arguments they encounter or think up.

-- Morality Isn’t Logical
1. There aren’t any normative facts at all, including facts about what is rational. For example, it turns out there is no one decision theory that does better than every other decision theory in every situation, and there is no obvious or widely-agreed-upon way to determine which one “wins” overall.
-- Six Plausible Meta-Ethical Alternatives
Vanessa Kosoy 18 Sep 2018 11:17 UTC
LW: 32 AF: 9
0
AF
Although I don’t necessarily subscribe to the precise set of claims characterized as “realism about rationality”, I do think this broad mindset is mostly correct, and the objections outlined in this essay are mostly wrong.

There’s a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn’t just because biologists haven’t figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated “function” which basically requires you to describe that organism’s entire phenotype, genotype and environment.

This seems entirely wrong to me. Evolution definitely should be studied using mathematical models, and although I am not an expert in that, AFAIK this approach is fairly standard. “Fitness” just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a “descendant”. The latter is not as unambiguously defined as “momentum” but under normal conditions it is quite precise. The actual structure and dynamics of biological organisms and their environment is very complicated, but this does not preclude the abstract study of evolution, i.e. understanding which sort of dynamics are possible in principle (for general environments) and in which way they depend on the environment etc. Applying this knowledge to real-life evolution is not trivial (and it does require a lot of complementary empirical research), as is the application of theoretical knowledge in any domain to “messy” real-life examples, but that doesn’t mean such knowledge is useless. On the contrary, such knowledge is often essential to progress.
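
To make "expected number of descendants" concrete, here is a minimal sketch under strong simplifying assumptions: a toy branching process in which every individual reproduces independently with the same offspring distribution. The two variants, their distributions, and the function names are all made up for illustration and are not drawn from any real biological model.

```python
from fractions import Fraction

def mean_offspring(offspring_dist):
    """Expected offspring count for a {count: probability} distribution."""
    assert sum(offspring_dist.values()) == 1, "probabilities must sum to 1"
    return sum(k * p for k, p in offspring_dist.items())

def expected_descendants(offspring_dist, generations):
    """Expected lineage size after some generations in a branching process.

    Assumes every individual reproduces independently with the same
    distribution, so the expectation is the mean raised to that power.
    """
    return mean_offspring(offspring_dist) ** generations

# Hypothetical variants; the numbers are made up for illustration.
variant_a = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
variant_b = {0: Fraction(1, 2), 2: Fraction(1, 2)}

fitness_a = mean_offspring(variant_a)
fitness_b = mean_offspring(variant_b)
print(fitness_a, fitness_b)
print(expected_descendants(variant_a, 3), expected_descendants(variant_b, 3))
```

The definition itself is one line. All of the difficulty lives in obtaining the offspring distribution for a real organism in a real environment, which is the part that requires empirical work.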

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It’s a mindset which makes the following ideas seem natural: The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don’t count brute force approaches like AIXI for the same reason I don’t consider physics a simple yet powerful description of biology)...

I wonder whether the OP also doesn’t count all of computational learning theory? Also, physics is definitely not a sufficient description of biology but on the other hand, physics is still very useful for understanding biology. Indeed, it’s hard to imagine we would achieve the modern level of understanding of chemistry without understanding at least non-relativistic quantum mechanics, and it’s hard to imagine we would make much headway in molecular biology without chemistry, thermodynamics et cetera.

This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks).

Once again, the OP uses the concept of “messiness” in a rather ambiguous way. It is true that human and animal intelligence is “messy” in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.

The disagreement here seems to revolve around the question of when we should expect to have a simple theory for a given phenomenon (i.e. when does Occam’s razor apply)? It seems clear that we should expect to have a simple theory of e.g. fundamental physics, but not a simple equation for the coastline of Africa. The difference is, physics is a unique object that has a fundamental role, whereas Africa is just one arbitrary continent among the set of all continents on all planets in the universe throughout its lifetime and all Everett branches. Therefore, we don’t expect a simple description of Africa, but we do expect a relatively simple description of planetary physics that would tell us which continent shapes are possible and which are more likely.

Now, “rationality” and “intelligence” are in some sense even more fundamental than physics. Indeed, rationality is what tells us how to form correct beliefs, i.e. how to find the correct theory of physics. Looking at anthropic paradoxes, it is even arguable that making decisions is even more fundamental than forming beliefs (since anthropic paradoxes are situations in which assigning subjective probabilities seems meaningless but the correct decision is still well-defined via “functional decision theory” or something similar). Therefore, it seems like there has to be a simple theory of intelligence, even if specific instances of intelligence are complex by virtue of their adaptation to specific computational hardware, specific utility function (or maybe some more general concept of “values”), somewhat specific (although still fairly diverse) class of environments, and also by virtue of arbitrary flaws in their design (that are still mild enough to allow for intelligent behavior).

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

This line of thought would benefit from more clearly delineating descriptive versus prescriptive. The question we are trying to answer is: “if we build a powerful goal-oriented agent, what goal system should we give it?” That is, it is fundamentally a prescriptive rather than descriptive question. It seems rather clear that the best choice of goal system would be in some sense similar to “human goals”. Moreover, it seems that if possibilities A and B are such that it is ill-defined whether humans (or at least, those humans that determine the goal system of the powerful agent) prefer A or B, then there is no moral significance to choosing between A and B in the target goal system. Therefore, we only need to determine “human goals” within the precision to which they are actually well-defined, not within absolute precision.

Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.

The question is not whether it is “fine”. The question is, given a situation in which intuition A demands action X and intuition B demands action Y, what is the morally correct action? The answer might be “X”, it might be “Y”, it might be “both actions are equally good”, or it might be even “Z” for some Z different from both X and Y. But any answer effectively determines a way to remove the contradiction, replacing it by a consistent overarching system. And, if we actually face that situation, we need to actually choose an answer.
- cousin_it 18 Sep 2018 12:26 UTC
  LW: 11 AF: 4
  0
  AF Parent
  
  It is true that human and animal intelligence is “messy” in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence.
  
  I used to think the same way, but the OP made me have a crisis of faith, and now I think the opposite way.
  
  Sure, an animal brain solving an animal problem is messy. But a general purpose computer solving a simple mathematical problem can be just as messy. The algorithm for multiplying matrices in O(n^2.8) is more complex than the algorithm for doing it in O(n^3), and the algorithm with O(n^2.4) is way more complex than that. As I said in the other comment, “algorithms don’t get simpler as they get better”.
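  
  A worked version of the first step in that comparison, as a sketch using the standard divide-and-conquer recurrences (the notation $T(n)$ for the running time on $n \times n$ matrices is an assumption introduced here). Splitting each matrix into four blocks and multiplying them the obvious way needs eight half-size products; Strassen's rearrangement needs seven, at the cost of a longer list of block additions and subtractions:
  
  $$T_{\text{naive}}(n) = 8\,T_{\text{naive}}(n/2) + \Theta(n^{2}) \;\Rightarrow\; T_{\text{naive}}(n) = \Theta(n^{\log_{2} 8}) = \Theta(n^{3})$$
  
  $$T_{\text{Strassen}}(n) = 7\,T_{\text{Strassen}}(n/2) + \Theta(n^{2}) \;\Rightarrow\; T_{\text{Strassen}}(n) = \Theta(n^{\log_{2} 7}) \approx \Theta(n^{2.807})$$
  
  The better exponent is bought with a more intricate description, which is the trade being pointed at.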
  - Vanessa Kosoy 18 Sep 2018 13:06 UTC
    LW: 18 AF: 5
    0
    AF Parent
    I don’t know a lot about the study of matrix multiplication complexity, but I think that one of the following two possibilities is likely to be true:
    
    1. There is some $ω \in \mathbb{R}$ and an algorithm for matrix multiplication of complexity $O(n^{ω + ϵ})$ for any $ϵ > 0$ s.t. no algorithm of complexity $O(n^{ω - ϵ})$ exists (AFAIK, the prevailing conjecture is $ω = 2$). This algorithm is simple enough for human mathematicians to find it, understand it and analyze its computational complexity. Moreover, there is a mathematical proof of its optimality that is simple enough for human mathematicians to find and understand.
    2. There is a progression of algorithms for lower and lower exponents that increases in description complexity without bound as the exponent approaches $ω$ from above, and the problem of computing a program with given exponent is computationally intractable or even uncomputable. This fact has a mathematical proof that is simple enough for human mathematicians to find and understand.
    
    Moreover, if we only care about having a polynomial time algorithm with some exponent then the solution is simple (and doesn’t require any astronomical coefficients like Levin search; incidentally, the $O (n^{3})$ algorithm is also good enough for most real world applications). In either case, the computational complexity of matrix multiplication is understandable in the sense I expect intelligence to be understandable.
    
    So, it is possible that there is a relatively simple and effective algorithm for intelligence (although I still expect a lot of “messy” tweaking to get a good algorithm for any specific hardware architecture; indeed, computational complexity is only defined up to a polynomial if you don’t specify a model of computation), or it is possible that there is a progression of increasingly complex and powerful algorithms that are very expensive to find. In the latter case, long AGI timelines become much more probable since biological evolution invested an enormous amount of resources in the search which we cannot easily match. In either case, there should be a theory that (i) defines what intelligence is (ii) predicts how intelligence depends on parameters such as description complexity and computational complexity.
    - cousin_it 18 Sep 2018 14:06 UTC
      LW: -1 AF: 2
      0
      AF Parent
      A good algorithm can be easy to find, but not simple in the other senses of the word. Machine learning can output an algorithm that seems to perform well, but has a long description and is hard to prove stuff about. The same is true for human intelligence. So we might not be able to find an algorithm that’s as strong as human intelligence but easier to prove stuff about.
      - Vanessa Kosoy 18 Sep 2018 14:52 UTC
        LW: 16 AF: 5
        0
        AF Parent
        Machine learning uses data samples about an unknown phenomenon to extrapolate and predict the phenomenon in new instances. Such algorithms can have provable guarantees regarding the quality of the generalization: this is exactly what computational learning theory is about. Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks. And even so, there is already some progress. People have been making buildings and cannons before Newtonian mechanics, engines before thermodynamics and ways of using chemical reactions before quantum mechanics or modern atomic theory. The fact you can do something using trial and error doesn’t mean trial and error is the only way to do it.
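        
        One textbook example of such a guarantee, stated as a sketch (the symbols $H$, $ϵ$, $δ$ and $m$ are introduced here and are not from the discussion above). For a finite hypothesis class $H$ that contains the true labelling function, any learner that outputs a hypothesis consistent with $m$ i.i.d. samples has true error at most $ϵ$ with probability at least $1 - δ$, provided
        
        $$m \;\geq\; \frac{1}{ϵ}\left(\ln |H| + \ln \frac{1}{δ}\right)$$
        
        For instance, with $|H| = 2^{20}$, $ϵ = 0.05$ and $δ = 0.01$ this gives $m \geq 20 \cdot (13.86 + 4.61) \approx 369.4$, so 370 samples suffice. Bounds of this kind are much harder to obtain in a useful form for deep networks, which is the gap being discussed.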
        
        cousin_it 18 Sep 2018 19:36 UTC
        LW: 1 AF: 1
        0
        AF Parent
        
        Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks.
        
        I think “inherent mysteriousness” is also possible. Some complex things are intractable to prove stuff about.
  - DragonGod 30 Sep 2018 20:26 UTC
    2 points
    0
    Parent
    I don’t see why better algorithms being more complex is a problem?
- DragonGod 30 Sep 2018 20:24 UTC
  LW: 3 AF: 2
  0
  AF Parent
  I disagree that intelligence and rationality are more fundamental than physics; the territory itself is physics, and that is all that is really there. Everything else (including the body of our own knowledge) consists of models for navigating that territory.
  
  Turing formalised computation and established the limits of computation given certain assumptions. However, those limits only apply as long as the assumptions are true. Turing did not prove that no mechanical system is superior to a Universal Turing Machine, and weird physics may enable super Turing computation.
  
  The point I was making is that our models are only as good as their correlation with the territory. The abstract models we have aren’t part of the territory itself.
  - Vanessa Kosoy 1 Oct 2018 20:55 UTC
    LW: 11 AF: 4
    0
    AF Parent
    Physics is not the territory, physics is (quite explicitly) the models we have of the territory. Rationality consists of the rules for formulating these models, and in this sense it is prior to physics and more fundamental. (This might be a disagreement over use of words. If by “physics” you, by definition, refer to the territory, then it seems to miss my point about Occam’s razor. Occam’s razor says that the map should be parsimonious, not the territory: the latter would be a type error.) In fact, we can adopt the view that Solomonoff induction (which is a model of rationality) is the ultimate physical law: it is a mathematical rule of making predictions that generates all the other rules we can come up with. Such a point of view, although in some sense justified, at present would be impractical: this is because we know how to compute using actual physical models (including running computer simulations), but not so much using models of rationality. But this is just another way of saying we haven’t constructed AGI yet.
    I don’t think it’s meaningful to say that “weird physics may enable super Turing computation.” Hypercomputation is just a mathematical abstraction. We can imagine, to a point, that we live in a universe which contains hypercomputers, but since our own brain is not a hypercomputer, we can never fully test such a theory. This IMO is the most fundamental significance of the Church-Turing thesis: since we only perceive the world through the lens of our own mind, then from our subjective point of view, the world only contains computable processes.
    - cousin_it 1 Oct 2018 21:29 UTC
      LW: 6 AF: 3
      0
      AF Parent
      If your mind was computable but the external world had lots of seeming hypercomputation (e.g. boxes for solving the halting problem were sold on every corner and were apparently infallible), would you prefer to build an AI that used a prior over hypercomputable worlds, or an AI that used Solomonoff induction because it’s the ultimate physical law?
      - Vanessa Kosoy 1 Oct 2018 21:48 UTC
        LW: 9 AF: 5
        0
        AF Parent
        What does it mean to have a box for solving the halting problem? How do you know it really solves the halting problem? There are some computable tests we can think of, but they would be incomplete, and you would only verify that the box satisfies those computable tests, not that it is “really” a hypercomputer. There would be a lot of possible boxes that don’t solve the halting problem that pass the same computable tests.
        If there is some powerful computational hardware available, I would want the AI to use that hardware. If you imagine the hardware as being hypercomputers, then you can think of such an AI as having a “prior over hypercomputable worlds”. But you can alternatively think of it as reasoning using computable hypotheses about the correspondence between the output of this hardware and the output of its sensors. The latter point of view is better, I think, because you can never know the hardware is really a hypercomputer.
        cousin_it 2 Oct 2018 15:37 UTC
        LW: 5 AF: 3
        0
        AF Parent
        Hmm, that approach might be ruling out not only hypercomputers, but also sufficiently powerful conventional computers (anything stronger than PSPACE maybe?) because your mind isn’t large enough to verify their strength. Is that right?
        
        Vanessa Kosoy 3 Oct 2018 9:47 UTC
        LW: 5 AF: 3
        0
        AF Parent
        In some sense, yes, although for conventional computers you might settle for very slow verification. Unless you mean that your mind has only finite memory/lifespan and therefore you cannot verify an arbitrary conventional computer within any given credence, which is also true. Under favorable conditions, you can quickly verify something in PSPACE (using interactive proof protocols), and given extra assumptions you might be able to do better (if you have two provers that cannot communicate you can do NEXP, or if you have a computer whose memory you can reliably delete you can do an EXP-complete language). However, it is not clear whether you can be justifiably highly certain of such extra assumptions.
        
        See also my reply to lbThingrb.
        
        lbThingrb 3 Oct 2018 1:43 UTC
        2 points
        0
        AF Parent
        This can’t be right … Turing machines are assumed to be able to operate for unbounded time, using unbounded memory, without breaking down or making errors. Even finite automata can have any number of states and operate on inputs of unbounded size. By your logic, human minds shouldn’t be modeling physical systems using such automata, since they exceed the capabilities of our brains.
        It’s not that hard to imagine hypothetical experimental evidence that would make it reasonable to believe that hypercomputers could exist. For example, suppose someone demonstrated a physical system that somehow emulated a universal Turing machine with infinite tape, using only finite matter and energy, and that this system could somehow run the emulation at an accelerating rate, such that it computed n steps in $\sum_{k = 1}^{n} \frac{1}{2^{k}}$ seconds. (Let’s just say that it resets to its initial state in a poof of pixie dust if the TM doesn’t halt after one second.)
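        To spell out the timing in that thought experiment (a short worked step, using only the sum given above): the elapsed time after $n$ steps is a geometric series,
        $$\sum_{k = 1}^{n} \frac{1}{2^{k}} = 1 - \frac{1}{2^{n}} < 1, \qquad \lim_{n \to \infty} \left(1 - \frac{1}{2^{n}}\right) = 1,$$
        so any computation that halts after finitely many steps finishes strictly before the one-second mark, and a machine still running at one second has executed every finite step without halting.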
        You could try to reproduce this experiment and test it on various programs whose long-term behavior is predictable, but you could only test it on a finite (to say nothing of computable) set of such inputs. Still, if no one could come up with a test that stumped it, it would be reasonable to conclude that it worked as advertised. (Of course, some alternative explanation would be more plausible at first, given that the device as described would contradict well established physical principles, but eventually the weight of evidence would compel one to rewrite physics instead.)
        One could hypothesize that the device only behaved as advertised on inputs for which human brains have the resources to verify the correctness of its answers, but did something else on other inputs, but you could just as well say that about a normal computer. There’d be no reason to believe such an alternative model, unless it was somehow more parsimonious. I don’t know any reason to think that theories that don’t posit uncomputable behavior can always be found which are at least as simple as a given theory that does.
        Having said all that, I’m not sure any of it supports either side of the argument over whether there’s an ideal mathematical model of general intelligence, or whether there’s some sense in which intelligence is more fundamental than physics. I will say that I don’t think the Church-Turing thesis is some sort of metaphysical necessity baked into the concept of rationality. I’d characterize it as an empirical claim about (1) human intuition about what constitutes an algorithm, and (2) contingent limitations imposed on machines by the laws of physics.
        Vanessa Kosoy 3 Oct 2018 9:35 UTC
        LW: 8 AF: 4
        0
        AF Parent
        It is true that a human brain is more precisely described as a finite automaton than a Turing machine. And if we take finite lifespan into account, then it’s not even a finite automaton. However, these abstractions are useful models since they become accurate in certain asymptotic limits that are sufficiently useful to describe reality. On the other hand, I doubt that there is a useful approximation in which the brain is a hypercomputer (except maybe some weak forms of hypercomputation like non-uniform computation / circuit complexity).
        
        Moreover, one should distinguish between different senses in which we can be “modeling” something. The first sense is the core, unconscious ability of the brain to generate models, and in particular that which we experience as intuition. This ability can (IMO) be thought of as some kind of machine learning algorithm, and, I doubt that hypercomputation is relevant there in any way. The second sense is the “modeling” we do by manipulating linguistic (symbolic) constructs in our conscious mind. These constructs might be formulas in some mathematical theory, including formulas that represent claims about uncomputable objects. However, these symbolic manipulations are just another computable process, and it is only the results of these manipulations that we use to generate predictions and/or test models, since this is the only access we have to those uncomputable objects.
        
        Regarding your hypothetical device, I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC? (In particular, the latter could tell you that some Turing machine halts when it “really” doesn’t, because in the model it halts after some non-standard number of computing steps.) More generally, given an uncomputable function $h$ and a system under test $f$ , there is no sequence of computable tests that will allow you to form some credence about the hypothesis $f = h$ s.t. this credence will converge to $1$ when the hypothesis is true and $0$ when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable $h$ ) when you can devise such a sequence of tests. (Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify $f$ is in the class, for example the class of all functions $h$ s.t. it is consistent with ZFC that $h$ is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself.)
        
        My point is, the Church-Turing thesis implies (IMO) that the mathematical model of rationality/intelligence should be based on Turing machines at most, and this observation does not strongly depend on assumptions about physics. (Well, if hypercomputation is physically possible, and realized in the brain, and there is some intuitive part of our mind that uses hypercomputation in a crucial way, then this assertion would be wrong. That would contradict my own intuition about what reasoning is (including intuitive reasoning), besides everything we know about physics, but obviously this hypothesis has some positive probability.)
        
        lbThingrb 4 Oct 2018 21:12 UTC
        LW: 3 AF: 2
        0
        AF Parent
        I didn’t mean to suggest that the possibility of hypercomputers should be taken seriously as a physical hypothesis, or at least, any more seriously than time machines, perpetual motion machines, faster-than-light, etc. And I think it’s similarly irrelevant to the study of intelligence, machine or human. But in my thought experiment, the way I imagined it working was that, whenever the device’s universal-Turing-machine emulator halted, you could then examine its internal state as thoroughly as you liked, to make sure everything was consistent with the hypothesis that it worked as specified (and the non-halting case could be ascertained by the presence of pixie dust 🙂). But since its memory contents upon halting could be arbitrarily large, in practice you wouldn’t be able to examine it fully even for individual computations of sufficient complexity. Still, if you did enough consistency checks on enough different kinds of computations, and the cleverest scientists couldn’t come up with a test that the machine didn’t pass, I think believing that the machine was a true halting-problem oracle would be empirically justified.
        It’s true that a black box oracle could output a nonstandard “counterfeit” halting function which claimed that some actually non-halting TMs do halt, only for TMs that can’t be proved to halt within ZFC or any other plausible axiomatic foundation humans ever come up with, in which case we would never know that it was lying to us. It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases. For example, if it claimed that some actually non-halting TM M halted, we could feed it a program that emulated M and output the number of steps M took to halt. That program would also have to halt, and output some specific number n. In principle, we could then try emulating M for n steps on a regular computer, observe that M hadn’t reached a halting state, and conclude that the device was lying to us. If n were large enough, that wouldn’t be feasible, but it’s a decisive test that a normal computer could execute in principle. I suppose my magical device could instead do something like leave an infinite output string in memory, that a normal computer would never know was infinite, because it could only ever examine finitely much of it. But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We’ll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?
        Actually, hypothesizing that my device “computed” a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable. A better skeptical hypothesis would be that the device passed off some actually halting TMs as non-halting, but only in cases where the shortest proof that any of those TMs would have halted eventually was too long for humans to have discovered yet. I don’t know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis). Intuitively, though, it seems to me that, if you went long enough without finding proof that the device wasn’t a true hypercomputer, continuing to insist that such proof would be found at some future time would start to sound like a God-of-the-gaps argument. I think this reasoning is valid even in a hypothetical universe in which human brains couldn’t do anything Turing machines can’t do, but other physical systems could. I admit that’s a nontrivial, contestable conclusion. I’m just going on intuition here.
        Vanessa Kosoy 5 Oct 2018 9:31 UTC
        LW: 3 AF: 3
        0
        AF Parent
        Nearly everything you said here was already addressed in my previous comment. Perhaps I didn’t explain myself clearly?
        
        It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases.
        
        I wrote before that “I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC?”
        
        So, the realization of a particular hypercomputer in a non-standard model of ZFC would pass all of your tests. You could examine its internal state or its output any way you like (i.e. ask any question that can be formulated in the language of ZFC) and everything you see would be consistent with ZFC. The number of steps for a machine that shouldn’t halt would be a non-standard number, so it would not fit on any finite storage. You could examine some finite subset of its digits (either from the end or from the beginning), for example, but that would not tell you the number is non-standard. For any question of the form “is $n$ larger than some known number $n_{0}$ ?” the answer would always be “yes”.
        
        But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We’ll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?
        
        Once again, there is a difference of principle. I wrote before that: ”...given an uncomputable function $h$ and a system under test $f$ , there is no sequence of computable tests that will allow you to form some credence about the hypothesis $f = h$ s.t. this credence will converge to $1$ when the hypothesis is true and $0$ when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable $h$ ) when you can devise such a sequence of tests.”
        
        So, with normal computers you can become increasingly certain your hypothesis regarding the computer is true (even if you never become literally 100% certain, except in the limit), whereas with a hypercomputer you cannot.
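A minimal sketch of the kind of test sequence that is available when $h$ is computable. The function names and the example functions are assumptions made for illustration. The sketch covers only the refutation half of the argument: it does not model credences or priors.

```python
from typing import Callable, Optional


def first_disagreement(
    f: Callable[[int], int], h: Callable[[int], int], budget: int
) -> Optional[int]:
    """Compare f against the reference h on inputs 0..budget-1.

    Returns the first input where they differ, or None if none is found.
    This is only possible because h is computable: we can evaluate h(x).
    """
    for x in range(budget):
        if f(x) != h(x):
            return x
    return None


def h(x: int) -> int:
    """Hypothetical computable reference function."""
    return x * x


def f_bad(x: int) -> int:
    """A system under test that agrees with h only below 50."""
    return x * x if x < 50 else 0
```

With a budget of 50 tests, `f_bad` is indistinguishable from `h`. With a budget of 100, the disagreement at input 50 is found. If $f = h$, no budget ever finds a disagreement, and if $f \neq h$, some finite budget does. When $h$ is uncomputable, the comparison `f(x) != h(x)` cannot be carried out at all.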
        
        Actually, hypothesizing that my device “computed” a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable.
        
        Yes, I already wrote that: “Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify $f$ is in the class, for example the class of all functions $h$ s.t. it is consistent with ZFC that $h$ is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself.”
        
        So, yes, you could theoretically become certain the device is a hypercomputer (although reaching high certainty would take a very long time), without knowing precisely which hypercomputer it is, but that doesn’t mean you need to add non-computable hypotheses to your “prior”, since that knowledge would still be expressible as a computable property of the world.
        
        I don’t know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis).
        
        Literal Solomonoff induction (or even bounded versions of Solomonoff induction) is probably not the ultimate “true” model of induction, I was just using it as a simple example before. The true model will allow expressing hypotheses such as “all the even-numbered bits in the sequence are $1$ ”, which involve computable properties of the environment that do not specify it completely. Making this idea precise is somewhat technical.
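To make the idea of a partial hypothesis concrete, here is a small sketch. It assumes 0-based indexing for "even-numbered" and checks only a finite prefix of the sequence. The function name is hypothetical.

```python
from typing import Sequence


def even_bits_all_one(prefix: Sequence[int]) -> bool:
    """Check the hypothesis on a finite prefix (0-based indexing assumed).

    Only bits at even positions are constrained; odd positions are free.
    """
    return all(bit == 1 for i, bit in enumerate(prefix) if i % 2 == 0)


# Two different sequences that both satisfy the same partial hypothesis.
seq_a = [1, 0, 1, 0, 1, 0]
seq_b = [1, 1, 1, 0, 1, 1]
# A sequence that violates it at position 2.
seq_c = [1, 0, 0, 1]
```

Both `seq_a` and `seq_b` pass the check even though they differ, which is the point: the hypothesis is a computable property of the environment that does not pin the environment down completely.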
        
        What links here?
        Vanessa Kosoy's comment on On the falsifiability of hypercomputation by jessicata (7 Feb 2020 9:32 UTC; 34 points)
    - TAG 11 Nov 2019 11:54 UTC
      1 point
      0
      Parent
      
      Physics is not the territory, physics is (quite explicitly) the models we have of the territory.
      
      People tend to use the word physics in both the map and the territory sense.
      
      We can imagine, to a point, that we live in a universe which contains hypercomputers, but since our own brain is not a hypercomputer, we can never fully test such a theory.
      
      That would follow if testing a theory consisted solely of running a simulation in your head, but that is not how physics, the science, works. If the universe was hypercomputational, that would manifest as failures of computable physics. Note that you only need to run computable physics to generate predictions that are then falsified.
      
      This IMO is the most fundamental significance of the Church-Turing thesis: since we only perceive the world through the lens of our own mind, then from our subjective point of view, the world only contains computable processes.
      
      If true, that is a form of neo-Kantian idealism. Is that what you really wanted to say?
      - Vanessa Kosoy 11 Nov 2019 12:15 UTC
        2 points
        0
        Parent
        
        If the universe was hypercomputational, that would manifest as failures of computable physics.
        
        Well, it would manifest as a failure to create a complete and deterministic theory of computable physics. If your physics doesn’t describe absolutely everything, hypercomputation can hide in places it doesn’t describe. If your physics is stochastic (like quantum mechanics for example) then the random bits can secretly follow a hypercomputable pattern. Sort of “hypercomputer of the gaps”. Like I wrote before, there actually can be situations in which we gradually become confident that something is a hypercomputer (although certainty would grow very slowly), but we will never know precisely what kind of hypercomputer it is.
        
        If true, that is a form of neo-Kantian idealism. Is that what you really wanted to say?
        
        Unfortunately I am not sufficiently versed in philosophy to say. I do not make any strong claims to novelty or originality.
- Chris Hibbert 22 Sep 2018 16:50 UTC
  1 point
  0
  Parent
  The question is, given a situation in which intuition A demands action X and intuition B demands action Y, what is the morally correct action? The answer might be “X”, it might be “Y”, it might be “both actions are equally good”, or it might be even “Z” for some Z different from both X and Y. But any answer effectively determines a way to remove the contradiction, replacing it by a consistent overarching system. And, if we actually face that situation, we need to actually choose an answer.
  This reminds me of my rephrasing of the description of epistemology. The standard description started out as “the science of knowledge” or colloquially, “how do we know what we know”. I’ve maintained, since reading Bartley (“The Retreat to Commitment”), that the right description is “How do we decide what to believe?” So your final sentence seems right to me, but that’s different from the rest of your argument, which presumes that there’s a “right” answer and our job is finding it. Our job is finding a decision procedure, and studying what differentiates “right” answers from “wrong” answers is useful fodder for that, but it’s not the actual goal.
- Richard_Ngo 18 Sep 2018 23:42 UTC
  LW: 1 AF: 2
  0
  AF Parent
  “Fitness” just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a “descendant”. The latter is not as unambiguously defined as “momentum” but under normal conditions it is quite precise.
  Similarly, you can define intelligence as expected performance on a broad suite of tasks. However, what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.
  I wonder whether the OP also doesn’t count all of computational learning theory? Also, physics is definitely not a sufficient description of biology but on the other hand, physics is still very useful for understanding biology.
  I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance. It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.
  However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.
  ...
  Now, “rationality” and “intelligence” are in some sense even more fundamental than physics… Therefore, it seems like there has to be a simple theory of intelligence.
  This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it’s not the case. Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns—something we’re nowhere near able to formalise. I plan to write a follow-up post which describes my reasons for being skeptical about rationality realism in more detail.
  We only need to determine “human goals” within the precision to which they are actually well-defined, not within absolute precision.
  I agree, but it’s plausible that they are much less well-defined than they seem. The more we learn about neuroscience, the more the illusion of a unified self with coherent desires breaks down. There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).
  - Vanessa Kosoy 19 Sep 2018 12:27 UTC
    LW: 17 AF: 6
    0
    AF Parent
    
    ...what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.
    
    Once again, it seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks is not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!
    
    I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance.
    
    It plays a minor role in deep learning, in the sense that some “deep” algorithms are adaptations of algorithms that have theoretical guarantees. For example, deep Q-learning is an adaptation of ordinary Q-learning. Obviously I cannot prove that it is possible to create an abstract theory of intelligence without actually creating the theory. However, the same could be said about any endeavor in history.
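For reference, the ordinary (tabular) Q-learning update mentioned above can be sketched as follows. The state and action names, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions. Deep Q-learning replaces the table with a neural network that approximates the same values.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> float:
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)
actions = ["left", "right"]
v1 = q_update(q, "s0", "right", 1.0, "s1", actions)
v2 = q_update(q, "s1", "right", 0.0, "s0", actions)
```

Note that the table `q` holds only numbers indexed by state and action. Nothing in it describes the update rule or the convergence theory behind it.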
    
    It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.
    
    Mathematical models of evolution might help you to build better evolutions. In order to build better birds, you would need mathematical models of birds, which are going to be much more messy.
    
    This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it’s not the case.
    
    I don’t think it’s a mere restatement? I am trying to show that “rationality realism” is what you should expect based on Occam’s razor, which is a fundamental principle of reason. Possibly I just don’t understand your position. In particular, I don’t know what epistemology is like in the world you imagine. Maybe it’s a subject for your next essay.
    
    Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns
    
    This seems to confuse objects with representations of objects. The assumption that there is some mathematical theory at the core of human reasoning does not mean that a description of this mathematical theory should automatically exist in the conscious, symbol-manipulating part of the mind. You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.
    
    There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).
    
    The response might depend on the framing if you’re asked a question and given 10 seconds to answer it. If you’re allowed to deliberate on the question, and in particular consider alternative framings, the answer becomes more well-defined. However, even if it is ill-defined, it doesn’t really change anything. We can still ask the question “given the ability to optimize any utility function over the world now, what utility function should we choose?” Perhaps it means that we need to consider our answers to ethical questions provided a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.
    - Richard_Ngo 19 Sep 2018 20:08 UTC
      LW: -1 AF: 2
      0
      AF Parent
      It seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks is not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!
      ...
      It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.
      It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.
      I don’t think it’s a mere restatement? I am trying to show that “rationality realism” is what you should expect based on Occam’s razor, which is a fundamental principle of reason.
      Sorry, my response was a little lazy, but at the same time I’m finding it very difficult to figure out how to phrase a counterargument beyond simply saying that although intelligence does allow us to understand physics, it doesn’t seem to me that this implies it’s simple or fundamental. Maybe one relevant analogy: maths allows us to analyse tic-tac-toe, but maths is much more complex than tic-tac-toe. I understand that this is probably an unsatisfactory intuition from your perspective, but unfortunately don’t have time to think too much more about this now; will cover it in a follow-up.
      You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.
      Agreed. But the fact that the main component of human reasoning is something which we have no idea how to formalise is some evidence against the possibility of formalisation—evidence which might be underweighted if people think of maths proofs as a representative example of reasoning.
      We can still ask the question “given the ability to optimize any utility function over the world now, what utility function should we choose?” Perhaps it means that we need to consider our answers to ethical questions provided a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.
      I’m going to cop out of answering this as well, on the grounds that I have yet another post in the works which deals with it more directly. One relevant claim, though: that extreme optimisation is fundamentally alien to the human psyche, and I’m not sure there’s any possible utility function which we’d actually be satisfied with maximising.
      - Vanessa Kosoy 20 Sep 2018 11:13 UTC
        LW: 21 AF: 7
        0
        AF Parent
        
        It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.
        
        As a matter of fact, I emphatically do not agree. “Birds” are a confusing example, because it speaks of modifying an existing (messy, complicated, poorly designed) system rather than making something from scratch. If we wanted to make something vaguely bird-like from scratch, we might have needed something like a “theory of self-sustaining, self-replicating machines”.
        
        Let’s consider a clearer example: cars. In order to build a car, it is very useful to have a theory of mechanics, chemistry, thermodynamics, etc. Just doing things by trial and error would be much less effective, especially if you don’t want the car to occasionally explode (given that the frequency of explosions might be too low to affordably detect during testing). This is not because a car is “simple”: a spaceship or, let’s say, a gravity wave detector is much more complex than a car, and yet you hardly need less theory to make one.
        
        And another example: cryptography. In fact, cryptography is not so far from AI safety: in the former case, you defend against an external adversary whereas in the latter you defend against perverse incentives and subagents inside the AI. If we had this conversation in the 1960s (say), you might have said that cryptography is obviously a complex, messy domain, and theorizing about it is next to useless, or at least not helpful for designing actual encryption systems (there was Shannon’s work, but since it ignored computational complexity you can maybe compare it to algorithmic information theory and statistical learning theory for AI today; if we had this conversation in the 1930s, then there would be next to no theory at all, even though encryption was practiced since ancient times). And yet, today theory plays an essential role in this field. The domain actually is very hard: most of the theory relies on complexity theoretic conjectures that we are still far from being able to prove (although I expect that most theoretical computer scientists would agree that eventually we will solve them). However, even without being able to formally prove everything, the ability to reduce the safety of many different protocols to a limited number of interconnected conjectures (some of which have an abundance of both theoretical and empirical evidence) allows us to immensely increase our confidence in those protocols.
        
        Similarly, I expect an abstract theory of intelligence to be immensely useful for AI safety. Even just having precise language to define what “AI safety” means would be very helpful, especially to avoid counter-intuitive failure modes like the malign prior. At the very least, we could have provably safe but impractical machine learning protocols that would be an inspiration to more complex algorithms about which we cannot prove things directly (like in deep learning today). More optimistically (but still realistically IMO) we could have practical algorithms satisfying theoretical guarantees modulo a small number of well-studied conjectures, like in cryptography today. This way, theoretical and empirical research could feed into each other, the whole significantly surpassing the sum of its parts.
Said Achmiz 16 Sep 2018 16:44 UTC
27 points
0
Excellent post!

I find myself agreeing with much of what you say, but there are a couple of things which strike me as… not quite fitting (at least, into the way I have thought about these issues), and also I am somewhat skeptical about whether your attempt at conceptually unifying these concerns—i.e., the concept of “rationality realism”—quite works. (My position on this topic is rather tentative, I should note; all that’s clear to me is that there’s much here that’s confusing—which is, however, itself a point of agreement with the OP, and disagreement with “rationality realists”, who seem much more certain of their view than the facts warrant.)

Some specific points:

… suppose that you just were your system 1, and that your system 2 was mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths)

This seems to me to be a fundamentally confused proposition. Regardless of whether Hanson is right about how our minds work (and I suspect he is right to a large degree, if not quite entirely right), the question of who we are seems to be a matter of choosing which aspect(s) of our minds’ functioning to endorse as ego-syntonic. Under this view, it is nonsensical to speak of a scenario where it “turns out” that I “am just my system 1”.

O: […] Maybe you can ignore the fact that your preferences contain a contradiction, but if we scaled you up to be much more intelligent, running on a brain orders of magnitude larger, having such a contradiction would break your thought processes.

Your quoted reply to this is good, but I just want to note that it’s almost not even necessary. The simpler reply of “you have literally no way of knowing that, and what you just said is completely 100% wild speculation, about a scenario that you don’t even know is possible” would also be sufficient.

(Also, what on earth does “break your thought process” even mean? And what good is being “much more intelligent” if something can “break your thought process” that would leave a “less intelligent” mind unharmed? Etc., etc.)

(That’s all for now, though I may have more to say about this later. For now, I’ll only say again that it’s a startlingly good crystallization of surprisingly many disagreements I’ve had with people on and around Less Wrong, and I’m excited to see this approach to the topic explored further.)
- Richard_Ngo 16 Sep 2018 21:01 UTC
  16 points
  0
  Parent
  Thanks for the helpful comment! I’m glad other people have a sense of the thing I’m describing. Some responses:
  I am somewhat skeptical about whether your attempt at conceptually unifying these concerns—i.e., the concept of “rationality realism”—quite works.
  I agree that it’s a bit of a messy concept. I do suspect, though, that people who see each of the ideas listed above as “natural” do so because of intuitions that are similar both across ideas and across people. So even if I can’t conceptually unify those intuitions, I can still identify a clustering.
  Regardless of whether Hanson is right about how our minds work (and I suspect he is right to a large degree, if not quite entirely right), the question of who we are seems to be a matter of choosing which aspect(s) of our minds’ functioning to endorse as ego-syntonic. Under this view, it is nonsensical to speak of a scenario where it “turns out” that I “am just my system 1”.
  I was a bit lazy in expressing it, but I think that the underlying idea makes sense (and have edited to clarify a little). There are certain properties we consider key to our identities, like consistency and introspective access. If we find out that system 2 has much less of those than we thought, then that should make us shift towards identifying more with our system 1s. Also, the idea of choosing which aspects to endorse presupposes some sort of identification with the part of your mind that makes the choice. But I could imagine finding out that this part of my brain is basically just driven by signalling, and then it wouldn’t even endorse itself. That also seems like a reason to default to identifying more with your system 1.
  Also, what on earth does “break your thought process” even mean?
  An analogy: in maths, a single contradiction “breaks the system” because it can propagate into any other proofs and lead to contradictory conclusions everywhere. In humans, it doesn’t, because we’re much more modular and selectively ignore things. So the relevant question is something like “Are much more intelligent systems necessarily also more math-like, in that they can’t function well without being internally consistent?”
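The sense in which a single contradiction "breaks the system" in maths is the principle of explosion: from a proposition and its negation, any proposition follows. A minimal Lean sketch (the theorem name is chosen for illustration):

```lean
-- From a proof of P and a proof of ¬P, any proposition Q can be derived.
theorem explosion (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp
```

Nothing here says whether more intelligent systems must reason this way. It only spells out why a formal proof system cannot quarantine a contradiction the way a modular human mind apparently can.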
  - Said Achmiz 16 Sep 2018 21:25 UTC
    15 points
    0
    Parent
    
    I agree that it’s a bit of a messy concept. I do suspect, though, that people who see each of the ideas listed above as “natural” do so because of intuitions that are similar both across ideas and across people. So even if I can’t conceptually unify those intuitions, I can still identify a clustering.
    
    For the record, and in case I didn’t get this across—I very much agree that identifying this clustering is quite valuable.
    
    As for the challenge of conceptual unification, we ought, I think, to treat it as a separate and additional challenge (and, indeed, we must be open to the possibility that a straightforward unification is not, after all, appropriate).
    
    I was a bit lazy in expressing it, but I think that the underlying idea makes sense (and have edited to clarify a little). There are certain properties we consider key to our identities, like consistency and introspective access. If we find out that system 2 has much less of those than we thought, then that should make us shift towards identifying more with our system 1s. Also, the idea of choosing which aspects to endorse presupposes some sort of identification with the part of your mind that makes the choice. But I could imagine finding out that this part of my brain is basically just driven by signalling, and then it wouldn’t even endorse itself. That also seems like a reason to default to identifying more with your system 1.
    
    I don’t want to go too far down this tangent, as it is not really critical to your main point, but I actually don’t agree with the claim “the idea of choosing which aspects to endorse presupposes some sort of identification with the part of your mind that makes the choice”; that is why I was careful to speak of endorsing aspects of our minds’ functioning, rather than identifying with parts of ourselves. I’ve spoken elsewhere of my skepticism toward the notion of conceptually dividing one’s own mind, and then selecting one of the sections to identify with. But this is a complex topic, and deserves dedicated treatment; best to set it aside for now, I think.
    
    So the relevant question is something like “Are much more intelligent systems necessarily also more math-like, in that they can’t function well without being internally consistent?”
    
    I think that this formulation makes sense.
    
    To me, then, it suggests some obvious follow-up questions, which I touched upon in my earlier reply:
    
    In what sense, exactly, are these purportedly “more intelligent” systems actually “more intelligent”, if they lack the flexibility and robustness of being able to hold contradictions in one’s mind? Or is this merely a flaw in human mental architecture? Might it, rather, be the case that these “more intelligent” systems are simply better than human-like minds at accomplishing their goals, in virtue of their intolerance for inconsistency? But it is not clear how such a claim survives the observation that humans are often inconsistent in what our goals are; it is not quite clear what it means to better accomplish inconsistent goals by being more consistent…
    
    To put it another way, there seems to be some manner of sleight of hand (perhaps an unconscious one) being performed with the concept of “intelligence”. I can’t quite put my finger on the nature of the trick, but something, clearly, is up.
cousin_it 20 Sep 2018 9:42 UTC
18 points
0

The idea that there is an “ideal” decision theory.

There are many classes of decision problems that allow optimal solutions, but none of them can cover all of reality, because in reality an AI can be punished for having any given decision theory. That said, the design space of decision theories has sweet spots. For example, future AIs will likely face an environment where copying and simulation are commonplace, and we’ve found simple decision theories that allow for copies and simulations. Looking for more such sweet spots is fun and fruitful.
- Richard_Ngo 20 Sep 2018 18:17 UTC
  1 point
  0
  Parent
  Imo we haven’t found a simple decision theory that allows for copies and simulations. We’ve found a simple rule that works in limiting cases, but is only well-defined for identical copies (modulo stochasticity). My expectation that FDT will be rigorously extended from this setting is low, for much the same reason that I don’t expect a rigorous definition of CDT. You understand FDT much better than I do, though—would you say that’s a fair summary?
  - cousin_it 20 Sep 2018 20:16 UTC
    12 points
    0
    Parent
    If all agents involved in a situation share the same utility function over outcomes, we should be able to make them coordinate despite having different source code. I think that’s where one possible boundary will settle, and I expect the resulting theory to be simple. Whereas in case of different utility functions we enter the land of game theory, where I’m pretty sure there can be no theory of unilateral decision making.
    - Richard_Ngo 20 Sep 2018 21:23 UTC
      1 point
      0
      Parent
      I’m not convinced by the distinction you draw. Suppose you simulate me at slightly less than perfect fidelity. The simulation is an agent with a (slightly) different utility function to me. Yet this seems like a case where FDT should be able to say relevant things.
      In Abram’s words,
      FDT requires a notion of logical causality, which hasn’t appeared yet.
      I expect that logical causality will be just as difficult to formalise as normal causality, and in fact that no “correct” formalisation exists for either.
      - Alexander Gietelink Oldenziel 25 Feb 2022 15:31 UTC
        1 point
        0
        Parent
        What. This seems obviously incorrect?
        The Pearl-Rubin-Spirtes-Glymour (and others) theory of causality is a very powerful framework that satisfies pretty much what one intuitively understands as ‘causality’. It is moreover powerful enough to make definite computations and even the much-craved-for ‘real applications’.
        It is ‘a very correct’ formalisation of ‘normal’ causality.
        I say ‘very correct’ instead of ‘correct’ because there are still areas for improvement, but this is more like GR improving on Newtonian gravity than Newtonian gravity being incorrect.
        Richard_Ngo 25 Feb 2022 15:55 UTC
        2 points
        0
        Parent
        Got a link to the best overview/defense of that claim? I’m open to this argument but have some cached thoughts about Pearl’s framework being unsatisfactory—would be useful to do some more reading and see if I still believe them.
        Alexander Gietelink Oldenziel 25 Feb 2022 16:11 UTC
        2 points
        0
        Parent
        There are some cases where Pearl and others’ causality framework can be improved—supposedly Factored Sets will, although I personally don’t understand it. I was recently informed that certain abductive counterfactual phrases due to David Lewis are not well-captured by Pearl’s system. I believe there are also other ways—all of this is actively being researched.
        What do you find unsatisfactory about Pearl?
        All of this is beside the point, which is that there is a powerful, well-developed, highly elegant theory of causality with an enormous range of applications.
        Rubin’s framework (which I am told is equivalent to Pearl’s) is used throughout econometrics; indeed, econometrics is best understood as the Science of Causality.
        I am not an expert—I am trying to learn much of this theory right now. I am probably not the best person to ask about theory of causality. That said:
        I am not sure to what degree you are already familiar with Pearl’s theory of causality but I recommend
        https://michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/
        for an excellent introduction.
        There is EY’s
        https://www.lesswrong.com/posts/hzuSDMx7pd2uxFc5w/causal-diagrams-and-causal-models
        which you may or may not find convincing.
        For a much more leisurely argument for Pearl’s viewpoint, I recommend his “The Book of Why”. In a pinch you could take a look at the book review on the causality bloglist on LW.
        https://www.lesswrong.com/tag/causality
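To make the "definite computations" claim above concrete, here is a minimal sketch of the backdoor adjustment formula from Pearl's framework, P(Y | do(X=x)) = Σ_z P(Y | X=x, Z=z) P(Z=z). Everything in it is an illustrative assumption rather than anything from the thread: a toy graph Z → X, Z → Y, X → Y with binary variables and made-up probabilities, and hypothetical function names.

```python
# Toy model (all numbers are invented for illustration).
# Assumed causal graph: Z -> X, Z -> Y, X -> Y, so Z confounds X and Y.
p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.2,  # P(Y=1 | X=x, Z=z), keyed by (x, z)
                 (0, 1): 0.6, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x): what you would read off passively collected data."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x


def interventional(x):
    """P(Y=1 | do(X=x)) via backdoor adjustment over the confounder Z."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


naive_effect = observational(1) - observational(0)     # mixes in confounding
causal_effect = interventional(1) - interventional(0)  # adjusts for Z
print(f"naive difference:  {naive_effect:.2f}")
print(f"causal difference: {causal_effect:.2f}")
```

In this toy model the naive difference overstates the causal effect, because Z raises both the chance of X and the chance of Y. The adjustment is only valid if the assumed graph is right, which is exactly the kind of assumption the framework makes explicit.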
  - Noosphere89 12 Jan 2025 21:26 UTC
    2 points
    0
    Parent
    To be fair, I expect a lot of cases of identical copies modulo stochasticity to exist in the future, and indeed you could argue this has already happened for AI, but I expect it to be more and more relevant by default, so FDT working in the identical copies case is still a really valuable niche.
cousin_it 18 Sep 2018 9:23 UTC
14 points
0
Great post, thank you for writing this! Your list of natural-seeming ideas is very thought provoking.

The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general.

I used to think that way, but now I agree with your position more. Something like Bayesian rationality is a small piece that many problems have in common, but any given problem will have lots of other structure to be exploited as well. In many AI problems, like recognizing handwriting or playing board games, that lets you progress faster than if you’d tried to start with the Bayesian angle.

We could still hope that the best algorithm for any given problem will turn out to be simple. But that seems unlikely, judging from both AI tasks like MNIST, where neural nets beat anything hand-coded, and non-AI tasks like matrix multiplication, where asymptotically best algorithms have been getting more and more complex. As a rule, algorithms don’t get simpler as they get better.
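As a small illustration of the matrix multiplication point, here is a sketch comparing the schoolbook 2x2 product (8 multiplications) with Strassen's scheme (7 multiplications). Strassen's is the simplest of the asymptotically faster algorithms, and even it is noticeably less transparent than the schoolbook version. The function names are my own, and this is only the 2x2 base case, not a full recursive implementation.

```python
def naive_2x2(a, b):
    """Schoolbook product of two 2x2 matrices: 8 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    return [
        [a11 * b11 + a12 * b21, a11 * b12 + a12 * b22],
        [a21 * b11 + a22 * b21, a21 * b12 + a22 * b22],
    ]


def strassen_2x2(a, b):
    """Strassen's scheme: 7 scalar multiplications, at the cost of more additions."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(naive_2x2(a, b))
print(strassen_2x2(a, b))
```

Applied recursively to matrix blocks, saving one multiplication per level is what gives the better exponent; later algorithms with still better exponents are considerably harder to state.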
- Vladimir_Nesov 18 Sep 2018 14:38 UTC
  16 points
  0
  Parent
  I’m not sure what you changed your mind about. Some of the examples you give are unconvincing, as they do have simple meta-algorithms that both discover the more complicated better solutions and analyse their behavior. My guess is that the point is that for example looking into nuance of things like decision theory is an endless pursuit, with more and more complicated solutions accounting for more and more unusual aspects of situations (that can no longer be judged as clearly superior), and no simple meta-algorithm that could’ve found these more complicated solutions, because it wouldn’t know what to look for. But that’s content of values, the thing you look for in human behavior, and we need at least a poor solution to the problem of making use of that. Perhaps you mean that even this poor solution is too complicated for humans to discover?
  - Richard_Ngo 18 Sep 2018 23:48 UTC
    1 point
    0
    Parent
    There’s a difference between discovering something and being able to formalise it. We use the simple meta-algorithm of gradient descent to train neural networks, but that doesn’t allow us to understand their behaviour.
    Also, meta-algorithms which seem simple to us may not in fact be simple, if our own minds are complicated to describe.
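To show how short the meta-algorithm itself is, here is a minimal sketch of plain gradient descent on a one-dimensional toy objective. The function name, learning rate and objective are illustrative assumptions; real neural network training adds minibatching, backpropagation and much else, but the core loop has this shape.

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

The loop says nothing about what the resulting parameters mean, which is the gap between discovering a solution and being able to formalise or understand it.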
- TurnTrout 18 Sep 2018 13:01 UTC
  11 points
  0
  Parent
  My impression is that an overarching algorithm would allow the agent to develop solutions for the specialized tasks, not that it would directly constitute a perfect solution. I don’t quite understand your position here – would you mind elaborating?
  - cousin_it 19 Sep 2018 23:43 UTC
    4 points
    0
    Parent
    My position goes something like this.
    
    There are many problems to be solved. Each problem may or may not have regularities to be exploited. Some regularities are shared among many problems, like Bayes structure, but others are unique. Solving a problem in reasonable time might require exploiting multiple regularities in it, so Bayes structure alone isn’t enough. There’s no algorithm for exploiting all regularities in all problems in reasonable time (this is similar to P≠NP). You can combine algorithms for exploiting a bunch of regularities, ending up with a longer algorithm that can’t be compressed very much and doesn’t have any simple core. Human intelligence could be like that: a laundry list of algorithms that exploit specific regularities in our environment.
- romeostevensit 23 Sep 2018 2:51 UTC
  9 points
  0
  Parent
  > algorithms don’t get simpler as they get better.
  or as you minimize cost along one dimension, costs get pushed into other dimensions. Aether variables apply at the level of representation too.
Wei Dai 13 Nov 2018 1:44 UTC
8 points
0

It’s a mindset which makes the following ideas seem natural

I think within “realism about rationality” there are at least 5 plausible positions one could take on other metaethical issues, some of which do not agree with all the items on your list, so it’s not really a single mindset. See this post, where I listed those 5 positions along with the denial of “realism about rationality” as the number 6 position (which I called normative anti-realism), and expressed my uncertainty as to which is the right one.
Kaj_Sotala 20 Sep 2018 8:11 UTC
8 points
0
Curated this post for:
- Having a very clear explanation of what feels like a central disagreement in many discussions, which has been implicit in many previous conversations but not explicitly laid out.
- Having lots of examples of what kinds of ideas this mindset makes seem natural.
- Generally being the kind of a post which I expect to be frequently referred back to as the canonical explanation of the thing.
Raemon 17 Sep 2018 3:56 UTC
7 points
0
Although not exactly the central point, seemed like a good time to link back to “Do you identify as the elephant or the rider?”
linkhyrule5 22 Sep 2018 18:46 UTC
6 points
0
I was kind of iffy about this post until the last point, which immediately stood out to me as something I vehemently disagree with. Whether or not humans naturally have values or are consistent is irrelevant—that which is not required will happen only at random and thus tend not to happen at all, and so if you aren’t very very careful to actually make sure you’re working in a particular coherent direction, you’re probably not working nearly as efficiently as you could be and may in fact be running in circles without noticing.
drossbucket 17 Sep 2018 5:48 UTC
6 points
0
Thanks for writing this, it’s a very concise summary of the parts of LW I’ve never been able to make sense of, and I’d love to have a better understanding of what makes the ideas in your bullet-pointed list appealing to those who tend towards ‘rationality realism’. (It’s sort of a background assumption in most LW stuff, so it’s hard to find places where it’s explicitly justified.)
Also:
What CFAR calls “purple”.
Is there any online reference explaining this?
- Richard_Ngo 19 Sep 2018 1:38 UTC
  6 points
  0
  Parent
  I had a quick look for an online reference to link to before posting this, and couldn’t find anything. It’s not a particularly complicated theory, though: “purple” ideas are vague, intuitive, pre-theoretic; “orange” ones are explicable, describable and model-able. A lot of AI safety ideas are purple, hence why CFAR tells people not just to ignore them like they would in many technical contexts.
  I’ll publish a follow-up post with arguments for and against realism about rationality.
  - TAG 25 Nov 2019 21:54 UTC
    1 point
    0
    Parent
    Or you could say vague and precise.
  - drossbucket 19 Sep 2018 5:08 UTC
    1 point
    0
    Parent
    Thanks for the explanation!
- Vaniver 25 Nov 2019 19:46 UTC
  2 points
  0
  Parent
  
  Is there any online reference explaining this?
  
  This was my attempt to explain the underlying ideas.
TruePath 19 Sep 2018 3:04 UTC
5 points
0
First, let me say I 100% agree with the idea that there is a problem in the rationality community of viewing rationality as something like momentum or gold (I named my blog rejectingrationality after this phenomenon and tried to deal with it in my first post).
However, I’m not totally sure everything you say falls under that concept. In particular, I’d say that rationality realism is something like the belief that there is a fact of the matter about how best to form beliefs or take actions in response to a particular set of experiences, and that there are many facts about this (going far beyond “don’t be Dutch booked”), with the frequent additional belief that what is rational to do in response to various kinds of experiences can be inferred by a priori considerations, e.g., think about all the ways that rule X might lead you wrong in certain possible situations, so X can’t be rational.
When I’ve raised this issue in the past the response I’ve gotten from both Yudkowsky and Hanson is: “But of course we can try to be less wrong,” i.e., have fewer false beliefs. And of course that is true, but that’s a very different notion from the notion of rationality used by rationality realists, and it misses the way that much of the rationality community’s talk about rationality isn’t about literally being less wrong but about classifying rules for reaching beliefs into rational and irrational even when they don’t disagree in the actual world.
In particular, if all I’m doing is analyzing how to be less wrong, I can’t criticize people who dogmatically believe things that happen to be true. After all, if God does exist, then dogmatically believing he does makes the people who do less wrong. Similarly, the various critiques of human psychological dispositions as leading us to make wrong choices in some kinds of cases aren’t sufficient if those cases are rare and cases where the dispositions yield better results are common. However, those who are rationality realists suggest that there is some fact of the matter which makes these belief-forming strategies irrational and thus appropriate to eschew and criticize. But, ultimately, aside from merely avoiding getting Dutch booked, no rule for forming beliefs can be assured to be less wrong than another in all possible worlds.
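For readers who haven't seen the Dutch book constraint spelled out, here is a minimal sketch of the one guarantee conceded above. It assumes an agent that treats `credence * stake` as a fair price for a bet paying `stake` if the event occurs. The credences and function name are invented for illustration.

```python
def net_payoff(credences, outcome, stake=1.0):
    """Net result for an agent that buys, at its own 'fair' price, one bet per event.

    Each bet costs credence * stake and pays `stake` if that event is the outcome.
    """
    cost = sum(c * stake for c in credences.values())
    winnings = sum(stake for event in credences if event == outcome)
    return winnings - cost


# Incoherent: credences in A and not-A sum to 1.2.
incoherent = {"A": 0.6, "not A": 0.6}
# Coherent: credences sum to 1.
coherent = {"A": 0.6, "not A": 0.4}

for outcome in ("A", "not A"):
    print(outcome, net_payoff(incoherent, outcome), net_payoff(coherent, outcome))
```

The incoherent agent loses in every possible world, so avoiding this is a constraint that holds across worlds. Nothing comparable falls out for choosing between two coherent belief-forming rules, which is the asymmetry the comment is pointing at.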
Kaj_Sotala 17 Sep 2018 13:56 UTC
5 points
0
I like this post and the concept in general, but would prefer slightly different terminology. To me, a mindset being called “realism about rationality” implies that this is the realistic, or correct mindset to have; a more neutral name would feel appropriate. Maybe something like “‘rationality is math’ mindset” or “‘intelligence is intelligible’ mindset”?
- Richard_Ngo 17 Sep 2018 18:40 UTC
  5 points
  0
  Parent
  Thanks for the link, I hadn’t seen that paper before and it’s very interesting.
  A mindset being called “realism about rationality” implies that this is the realistic, or correct mindset to have.
  I chose “rationality realism” as a parallel to “moral realism”, which I don’t think carries the connotations you mentioned. I do like “intelligence is intelligible” as an alternative alliteration, and I guess Anna et al. have prior naming rights. I think it would be a bit confusing to retitle my post now, but happy to use either going forward.
  - Kaj_Sotala 17 Sep 2018 20:16 UTC
    4 points
    0
    Parent
    I hadn’t seen that paper before and it’s very interesting.
    Glad you liked it!
    I chose “rationality realism” as a parallel to “moral realism”, which I don’t think carries the connotations you mentioned.
    I guess you could infer that “just as moral realism implies that objective morality is real, rationality realism implies that objective rationality is real”, but that interpretation didn’t even occur to me before reading this comment. And also importantly, “rationality realism” wasn’t the term that you used in the post; you used “realism about rationality”. “Realism about morality” would also have a different connotation than “moral realism” does.
    - Benquo 17 Sep 2018 21:06 UTC
      3 points
      0
      Parent
      I realized a few paragraphs in that this was meant to be parallel to “moral realism,” and I agree that a title of “rationality realism” would have been clearer.
avturchin 20 Sep 2018 23:22 UTC
4 points
0
Some other ideas for the list of the “rationality realism”:
- Probability actually exists, and there is a correct theory of it.
- Humans have values.
- Rationality could be presented as a short set of simple rules.
- Occam’s razor implies that the simplest explanation is the correct one.
- Intelligence could be measured by a single scalar—IQ.
- TAG 25 Nov 2019 22:05 UTC
  2 points
  0
  Parent
  - correspondence theory is the correct theory of truth.
  - correspondence-truth is established by a combination of predictive accuracy and simplicity.
  - every AI has a utility function..
  - .. even if its utility function is in the eye of the beholder
  - modal realism is false..
  - .. but many worlds is true.
  - .. You shouldn’t care about things that make no observable predictions..
  - .. unless it’s many worlds.
  - You are a piece of machinery with no free will...
  - ...but it’s vitally important to exert yourself to steer the world to a future without AI apocalypse.
- Richard_Ngo 21 Sep 2018 9:02 UTC
  1 point
  0
  Parent
  These ideas are definitely pointing in the direction of rationality realism. I think most of them are related to items on my list, although I’ve tried to phrase them in less ambiguous ways.
Jonathan Stray 21 Aug 2020 18:10 UTC
3 points
0
Very interesting post. I think exploring the limits of our standard models of rationality is very worthwhile. IMO the models used in AI tend to be far too abstract, and don’t engage enough with situatedness, unclear ontologies, and the fundamental weirdness of the open world.
One strand of critique of rationality that I really appreciate is David Chapman’s “meta-rationality,” which he defines as “evaluating, choosing, combining, modifying, discovering, and creating [rational] systems”
https://meaningness.com/metablog/meta-rationality-curriculum
DragonGod 30 Sep 2018 8:07 UTC
2 points
0
I consider myself a rational realist, but I don’t believe some of the things you attribute to rational realism (particularly concerning morality and consciousness). I don’t think there’s a true decision theory or true morality, but I do think that you could find systems of reasoning that are provably optimal within certain formal models.

There is no sense in which our formal models are true, but as long as they have high predictive power the models would be useful, and that I think is all that matters.
Sammy Martin 18 Jul 2020 18:04 UTC
1 point
0
“Implicit in this metaphor is the localization of personal identity primarily in the system 2 rider. Imagine reversing that, so that the experience and behaviour you identify with are primarily driven by your system 1, with a system 2 that is mostly a Hansonian rationalization engine on top (one which occasionally also does useful maths). Does this shift your intuitions about the ideas above, e.g. by making your CEV feel less well-defined?”
I find this very interesting because locating personal identity in system 1 feels conceptually impossible or deeply confusing. No matter how much rationalization goes on, it never seems intuitive to identify myself with system 1. How can you identify with the part of yourself that isn’t doing the explicit thinking, including the decision about which part of yourself to identify with? It reminds me of Nagel’s The Last Word: This doesn’t feel like an empirical question to me.
Perhaps this just means that I have a very deep ‘realism about rationality’ assumption. I also think that the existing philosophy literature on realism about practical reasons is relevant here. I think realism about rationality and about ‘practical reasons’ are the same thing.
- Sammy Martin 22 Jul 2020 18:14 UTC
  7 points
  0
  Parent
  If this ‘realism about rationality’ really is rather like “realism about epistemic reasons/‘epistemic facts’”, then you have the ‘normative web argument’ to contend with—if you are a moral antirealist. Convergence and ‘Dutch book’ type arguments often appear in more recent metaethics, and the similarity has been noted, leading to arguments such as these:
  These and other points of analogy between the moral and epistemic domains might well invite the suspicion that the respective prospects of realism and anti-realism in the two domains are not mutually independent, that what is most plausibly true of the one is likewise most plausibly true of the other. This suspicion is developed in Cuneo’s “core argument” which runs as follows (p. 6):
  (1) If moral facts do not exist, then epistemic facts do not exist.
  (2) Epistemic facts exist.
  (3) So moral facts exist.
  (4) If moral facts exist, then moral realism is true.
  (5) So moral realism is true.
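As a quick check that the quoted argument is formally valid (which says nothing about whether its premises are true), here is a sketch in Lean 4. The proposition names `M`, `E` and `R` are my own labels for "moral facts exist", "epistemic facts exist" and "moral realism is true". Step (3) needs classical reasoning, since premise (1) is stated in contrapositive form.

```lean
-- M : moral facts exist, E : epistemic facts exist, R : moral realism is true
theorem core_argument (M E R : Prop)
    (h1 : ¬M → ¬E)  -- (1) if moral facts do not exist, epistemic facts do not exist
    (h2 : E)        -- (2) epistemic facts exist
    (h4 : M → R)    -- (4) if moral facts exist, moral realism is true
    : R :=          -- (5) moral realism is true
  -- (3) M follows by contradiction: assuming ¬M gives ¬E, contradicting h2
  h4 (Classical.byContradiction (fun hM => h1 hM h2))
```

So anyone rejecting the conclusion has to reject premise (1), (2) or (4), and the debate is usually about (1).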
  These considerations seem to clearly indicate ‘realism about epistemic facts’ in the metaethical sense:
  - The idea that there is an “ideal” decision theory.
  - The idea that, given certain evidence for a proposition, there’s an “objective” level of subjective credence which you should assign to it, even under computational constraints.
  - The idea that having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding dutch books and money pumps and so on).
  This seems to directly concede or imply the ‘normative web’ argument, or to imply some form of normative (if not exactly moral) realism:
  - The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
  - The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors.
  If ‘realism about rationality’ is really just normative realism in general, or realism about epistemic facts, then there is already an extensive literature on whether it is right or not. The links above are just the obvious starting points that came to my mind.