Although I don’t necessarily subscribe to the precise set of claims characterized as “realism about rationality”, I do think this broad mindset is mostly correct, and the objections outlined in this essay are mostly wrong.

There’s a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn’t just because biologists haven’t figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated “function” which basically requires you to describe that organism’s entire phenotype, genotype and environment.

This seems entirely wrong to me. Evolution definitely should be studied using mathematical models, and although I am not an expert in that, AFAIK this approach is fairly standard. “Fitness” just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a “descendant”. The latter is not as unambiguously defined as “momentum” but under normal conditions it is quite precise. The actual structure and dynamics of biological organisms and their environment is very complicated, but this does not preclude the abstract study of evolution, i.e. understanding which sort of dynamics are possible in principle (for general environments) and in which way they depend on the environment etc. Applying this knowledge to real-life evolution is not trivial (and it does require a lot of complementary empirical research), as is the application of theoretical knowledge in any domain to “messy” real-life examples, but that doesn’t mean such knowledge is useless. On the contrary, such knowledge is often essential to progress.

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It’s a mindset which makes the following ideas seem natural: The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don’t count brute force approaches like AIXI for the same reason I don’t consider physics a simple yet powerful description of biology)...

I wonder whether the OP also doesn’t count all of computational learning theory? Also, physics is definitely not a sufficient description of biology, but on the other hand, physics is still very useful for understanding biology. Indeed, it’s hard to imagine we would have achieved the modern level of understanding of chemistry without understanding at least non-relativistic quantum mechanics, and it’s hard to imagine we would make much headway in molecular biology without chemistry, thermodynamics et cetera.

This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks).

Once again, the OP uses the concept of “messiness” in a rather ambiguous way. It is true that human and animal intelligence is “messy” in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.

The disagreement here seems to revolve around the question of when we should expect to have a simple theory for a given phenomenon (i.e. when does Occam’s razor apply?). It seems clear that we should expect to have a simple theory of e.g. fundamental physics, but not a simple equation for the coastline of Africa. The difference is that physics is a unique object that has a fundamental role, whereas Africa is just one arbitrary continent among the set of all continents on all planets in the universe throughout its lifetime and all Everett branches. Therefore, we don’t expect a simple description of Africa, but we do expect a relatively simple description of planetary physics that would tell us which continent shapes are possible and which are more likely.

Now, “rationality” and “intelligence” are in some sense even more fundamental than physics. Indeed, rationality is what tells us how to form correct beliefs, i.e. how to find the correct theory of physics. Looking at anthropic paradoxes, it is even arguable that making decisions is more fundamental than forming beliefs (since anthropic paradoxes are situations in which assigning subjective probabilities seems meaningless but the correct decision is still well-defined via “functional decision theory” or something similar). Therefore, it seems like there has to be a simple theory of intelligence, even if specific instances of intelligence are complex by virtue of their adaptation to specific computational hardware, a specific utility function (or maybe some more general concept of “values”), a somewhat specific (although still fairly diverse) class of environments, and also by virtue of arbitrary flaws in their design (that are still mild enough to allow for intelligent behavior).

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

This line of thought would benefit from more clearly delineating descriptive versus prescriptive. The question we are trying to answer is: “if we build a powerful goal-oriented agent, what goal system should we give it?” That is, it is fundamentally a prescriptive rather than descriptive question. It seems rather clear that the best choice of goal system would be in some sense similar to “human goals”. Moreover, it seems that if possibilities A and B are such that it is ill-defined whether humans (or at least, those humans that determine the goal system of the powerful agent) prefer A or B, then there is no moral significance to choosing between A and B in the target goal system. Therefore, we only need to determine “human goals” within the precision to which they are actually well-defined, not within absolute precision.

Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.

The question is not whether it is “fine”. The question is, given a situation in which intuition A demands action X and intuition B demands action Y, what is the morally correct action? The answer might be “X”, it might be “Y”, it might be “both actions are equally good”, or it might be even “Z” for some Z different from both X and Y. But any answer effectively determines a way to remove the contradiction, replacing it by a consistent overarching system. And, if we actually face that situation, we need to actually choose an answer.

It is true that human and animal intelligence is “messy” in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence.

I used to think the same way, but the OP made me have a crisis of faith, and now I think the opposite way.

Sure, an animal brain solving an animal problem is messy. But a general purpose computer solving a simple mathematical problem can be just as messy. The algorithm for multiplying matrices in O(n^2.8) is more complex than the algorithm for doing it in O(n^3), and the algorithm with O(n^2.4) is way more complex than that. As I said in the other comment, “algorithms don’t get simpler as they get better”.
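To make the complexity comparison above concrete, here is a minimal sketch in Python of the schoolbook O(n^3) algorithm next to Strassen's algorithm, whose exponent is log2(7) ≈ 2.81. The function names are illustrative, and the Strassen version assumes square matrices whose size is a power of two. It is only meant to show how much more description the asymptotically faster algorithm needs, not to be a practical implementation.

```python
def naive_matmul(A, B):
    """Schoolbook multiplication of two n x n matrices: O(n^3) scalar multiplications."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def _sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def _split(A):
    """Split a matrix into its four quadrants."""
    h = len(A) // 2
    return ([row[:h] for row in A[:h]], [row[h:] for row in A[:h]],
            [row[:h] for row in A[h:]], [row[h:] for row in A[h:]])


def strassen(A, B):
    """Strassen multiplication (assumes n is a power of two): 7 recursive products instead of 8."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = _split(A)
    b11, b12, b21, b22 = _split(B)
    m1 = strassen(_add(a11, a22), _add(b11, b22))
    m2 = strassen(_add(a21, a22), b11)
    m3 = strassen(a11, _sub(b12, b22))
    m4 = strassen(a22, _sub(b21, b11))
    m5 = strassen(_add(a11, a12), b22)
    m6 = strassen(_sub(a21, a11), _add(b11, b12))
    m7 = strassen(_sub(a12, a22), _add(b21, b22))
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    # Stitch the four quadrants back into one matrix.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

The naive version is a one-liner, while the faster one needs helper routines and seven carefully chosen products. The algorithms with still lower exponents are, as the comment says, far more involved than either.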

I don’t know a lot about the study of matrix multiplication complexity, but I think that one of the following two possibilities is likely to be true:

There is some ω∈R and an algorithm for matrix multiplication of complexity O(n^(ω+ϵ)) for any ϵ>0 s.t. no algorithm of complexity O(n^(ω−ϵ)) exists (AFAIK, the prevailing conjecture is ω=2). This algorithm is simple enough for human mathematicians to find it, understand it and analyze its computational complexity. Moreover, there is a mathematical proof of its optimality that is simple enough for human mathematicians to find and understand.

There is a progression of algorithms for lower and lower exponents that increases in description complexity without bound as the exponent approaches ω from above, and the problem of computing a program with given exponent is computationally intractable or even uncomputable. This fact has a mathematical proof that is simple enough for human mathematicians to find and understand.

Moreover, if we only care about having a polynomial time algorithm with some exponent then the solution is simple (and doesn’t require any astronomical coefficients like Levin search; incidentally, the O(n^3) algorithm is also good enough for most real world applications). In either case, the computational complexity of matrix multiplication is understandable in the sense I expect intelligence to be understandable.

So, it is possible that there is a relatively simple and effective algorithm for intelligence (although I still expect a lot of “messy” tweaking to get a good algorithm for any specific hardware architecture; indeed, computational complexity is only defined up to a polynomial if you don’t specify a model of computation), or it is possible that there is a progression of increasingly complex and powerful algorithms that are very expensive to find. In the latter case, long AGI timelines become much more probable since biological evolution invested an enormous amount of resources in the search which we cannot easily match. In either case, there should be a theory that (i) defines what intelligence is (ii) predicts how intelligence depends on parameters such as description complexity and computational complexity.

A good algorithm can be easy to find, but not simple in the other senses of the word. Machine learning can output an algorithm that seems to perform well, but has a long description and is hard to prove stuff about. The same is true for human intelligence. So we might not be able to find an algorithm that’s as strong as human intelligence but easier to prove stuff about.

Machine learning uses data samples about an unknown phenomenon to extrapolate and predict the phenomenon in new instances. Such algorithms can have provable guarantees regarding the quality of the generalization: this is exactly what computational learning theory is about. Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks. And even so, there is already some progress. People were making buildings and cannons before Newtonian mechanics, engines before thermodynamics, and ways of using chemical reactions before quantum mechanics or modern atomic theory. The fact that you can do something using trial and error doesn’t mean trial and error is the only way to do it.

Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks.

I think “inherent mysteriousness” is also possible. Some complex things are intractable to prove stuff about.

I disagree that intelligence and rationality are more fundamental than physics; the territory itself is physics, and that is all that is really there. Everything else (including the body of our own knowledge) is a model for navigating that territory.

Turing formalised computation and established the limits of computation given certain assumptions. However, those limits only apply as long as the assumptions are true. Turing did not prove that no mechanical system is superior to a Universal Turing Machine, and weird physics may enable super Turing computation.

The point I was making is that our models are only as good as their correlation with the territory. The abstract models we have aren’t part of the territory itself.

Physics is not the territory, physics is (quite explicitly) the models we have of the territory. Rationality consists of the rules for formulating these models, and in this sense it is prior to physics and more fundamental. (This might be a disagreement over use of words. If by “physics” you, by definition, refer to the territory, then it seems to miss my point about Occam’s razor. Occam’s razor says that the map should be parsimonious, not the territory: the latter would be a type error.) In fact, we can adopt the view that Solomonoff induction (which is a model of rationality) is the ultimate physical law: it is a mathematical rule of making predictions that generates all the other rules we can come up with. Such a point of view, although in some sense justified, at present would be impractical: this is because we know how to compute using actual physical models (including running computer simulations), but not so much using models of rationality. But this is just another way of saying we haven’t constructed AGI yet.

I don’t think it’s meaningful to say that “weird physics may enable super Turing computation.” Hypercomputation is just a mathematical abstraction. We can imagine, to a point, that we live in a universe which contains hypercomputers, but since our own brain is not a hypercomputer, we can never fully test such a theory. This IMO is the most fundamental significance of the Church-Turing thesis: since we only perceive the world through the lens of our own mind, then from our subjective point of view, the world only contains computable processes.

If your mind was computable but the external world had lots of seeming hypercomputation (e.g. boxes for solving the halting problem were sold on every corner and were apparently infallible), would you prefer to build an AI that used a prior over hypercomputable worlds, or an AI that used Solomonoff induction because it’s the ultimate physical law?

What does it mean to have a box for solving the halting problem? How do you know it really solves the halting problem? There are some computable tests we can think of, but they would be incomplete, and you would only verify that the box satisfies those computable tests, not that it is “really” a hypercomputer. There would be a lot of possible boxes that don’t solve the halting problem that pass the same computable tests.

If there is some powerful computational hardware available, I would want the AI to use that hardware. If you imagine the hardware as being hypercomputers, then you can think of such an AI as having a “prior over hypercomputable worlds”. But you can alternatively think of it as reasoning using computable hypotheses about the correspondence between the output of this hardware and the output of its sensors. The latter point of view is better, I think, because you can never know the hardware is really a hypercomputer.

Hmm, that approach might be ruling out not only hypercomputers, but also sufficiently powerful conventional computers (anything stronger than PSPACE maybe?) because your mind isn’t large enough to verify their strength. Is that right?

In some sense, yes, although for conventional computers you might settle on very slow verification. Unless you mean that your mind has only finite memory/lifespan and therefore you cannot verify an arbitrary conventional computer within any given credence, which is also true. Under favorable conditions, you can quickly verify something in PSPACE (using interactive proof protocols), and given extra assumptions you might be able to do better (if you have two provers that cannot communicate you can do NEXP, or if you have a computer whose memory you can reliably delete you can do an EXP-complete language). However, it is not clear whether you can be justifiably highly certain of such extra assumptions.

This can’t be right … Turing machines are assumed to be able to operate for unbounded time, using unbounded memory, without breaking down or making errors. Even finite automata can have any number of states and operate on inputs of unbounded size. By your logic, human minds shouldn’t be modeling physical systems using such automata, since they exceed the capabilities of our brains.

It’s not that hard to imagine hypothetical experimental evidence that would make it reasonable to believe that hypercomputers could exist. For example, suppose someone demonstrated a physical system that somehow emulated a universal Turing machine with infinite tape, using only finite matter and energy, and that this system could somehow run the emulation at an accelerating rate, such that it computed n steps in ∑_{k=1}^{n} 1/2^k seconds. (Let’s just say that it resets to its initial state in a poof of pixie dust if the TM doesn’t halt after one second.)
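For reference, the running time in this thought experiment is a finite geometric sum. Assuming the k-th step takes 2^(−k) seconds, a short worked equation shows why every halting computation finishes in under one second, and why one second is the right cutoff for the non-halting case:

```latex
\sum_{k=1}^{n} \frac{1}{2^{k}} \;=\; 1 - \frac{1}{2^{n}} \;<\; 1
\quad\text{for every finite } n,
\qquad
\lim_{n\to\infty} \sum_{k=1}^{n} \frac{1}{2^{k}} \;=\; 1 .
```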

You could try to reproduce this experiment and test it on various programs whose long-term behavior is predictable, but you could only test it on a finite (to say nothing of computable) set of such inputs. Still, if no one could come up with a test that stumped it, it would be reasonable to conclude that it worked as advertised. (Of course, some alternative explanation would be more plausible at first, given that the device as described would contradict well established physical principles, but eventually the weight of evidence would compel one to rewrite physics instead.)

One could hypothesize that the device only behaved as advertised on inputs for which human brains have the resources to verify the correctness of its answers, but did something else on other inputs, but you could just as well say that about a normal computer. There’d be no reason to believe such an alternative model, unless it was somehow more parsimonious. I don’t know any reason to think that theories that don’t posit uncomputable behavior can always be found which are at least as simple as a given theory that does.

Having said all that, I’m not sure any of it supports either side of the argument over whether there’s an ideal mathematical model of general intelligence, or whether there’s some sense in which intelligence is more fundamental than physics. I will say that I don’t think the Church-Turing thesis is some sort of metaphysical necessity baked into the concept of rationality. I’d characterize it as an empirical claim about (1) human intuition about what constitutes an algorithm, and (2) contingent limitations imposed on machines by the laws of physics.

It is true that a human brain is more precisely described as a finite automaton than a Turing machine. And if we take finite lifespan into account, then it’s not even a finite automaton. However, these abstractions are useful models since they become accurate in certain asymptotic limits that are sufficiently useful to describe reality. On the other hand, I doubt that there is a useful approximation in which the brain is a hypercomputer (except maybe some weak forms of hypercomputation like non-uniform computation / circuit complexity).

Moreover, one should distinguish between different senses in which we can be “modeling” something. The first sense is the core, unconscious ability of the brain to generate models, and in particular that which we experience as intuition. This ability can (IMO) be thought of as some kind of machine learning algorithm, and, I doubt that hypercomputation is relevant there in any way. The second sense is the “modeling” we do by manipulating linguistic (symbolic) constructs in our conscious mind. These constructs might be formulas in some mathematical theory, including formulas that represent claims about uncomputable objects. However, these symbolic manipulations are just another computable process, and it is only the results of these manipulations that we use to generate predictions and/or test models, since this is the only access we have to those uncomputable objects.

Regarding your hypothetical device, I wonder how you would tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC? (In particular, the latter could tell you that some Turing machine halts when it “really” doesn’t, because in the model it halts after some non-standard number of computing steps.) More generally, given an uncomputable function h and a system under test f, there is no sequence of computable tests that will allow you to form some credence about the hypothesis f=h s.t. this credence will converge to 1 when the hypothesis is true and 0 when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable h) when you can devise such a sequence of tests. (Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify f is in the class, for example the class of all functions h s.t. it is consistent with ZFC that h is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself.)

My point is, the Church-Turing thesis implies (IMO) that the mathematical model of rationality/intelligence should be based on Turing machines at most, and this observation does not strongly depend on assumptions about physics. (Well, if hypercomputation is physically possible, and realized in the brain, and there is some intuitive part of our mind that uses hypercomputation in a crucial way, then this assertion would be wrong. That would contradict my own intuition about what reasoning is (including intuitive reasoning), besides everything we know about physics, but obviously this hypothesis has some positive probability.)

I didn’t mean to suggest that the possibility of hypercomputers should be taken seriously as a physical hypothesis, or at least, any more seriously than time machines, perpetual motion machines, faster-than-light, etc. And I think it’s similarly irrelevant to the study of intelligence, machine or human. But in my thought experiment, the way I imagined it working was that, whenever the device’s universal-Turing-machine emulator halted, you could then examine its internal state as thoroughly as you liked, to make sure everything was consistent with the hypothesis that it worked as specified (and the non-halting case could be ascertained by the presence of pixie dust 🙂). But since its memory contents upon halting could be arbitrarily large, in practice you wouldn’t be able to examine it fully even for individual computations of sufficient complexity. Still, if you did enough consistency checks on enough different kinds of computations, and the cleverest scientists couldn’t come up with a test that the machine didn’t pass, I think believing that the machine was a true halting-problem oracle would be empirically justified.

It’s true that a black box oracle could output a nonstandard “counterfeit” halting function which claimed that some actually non-halting TMs do halt, only for TMs that can’t be proved to halt within ZFC or any other plausible axiomatic foundation humans ever come up with, in which case we would never know that it was lying to us. It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases. For example, if it claimed that some actually non-halting TM M halted, we could feed it a program that emulated M and output the number of steps M took to halt. That program would also have to halt, and output some specific number n. In principle, we could then try emulating M for n steps on a regular computer, observe that M hadn’t reached a halting state, and conclude that the device was lying to us. If n were large enough, that wouldn’t be feasible, but it’s a decisive test that a normal computer could execute in principle. I suppose my magical device could instead do something like leave an infinite output string in memory, that a normal computer would never know was infinite, because it could only ever examine finitely much of it. But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We’ll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?
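The “decisive test” described above (emulate M for the claimed n steps on a regular computer and see whether it has halted) can be sketched as follows. This is a toy illustration, not a Turing machine emulator: a machine is modeled, by assumption, as a Python step function that returns the next state or `None` once it halts, and the names `halts_within`, `countdown` and `loop_forever` are hypothetical.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")


def halts_within(step: Callable[[S], Optional[S]], start: S, n: int) -> bool:
    """Run the machine for at most n steps; True if it halts by then.

    `step` returns the next state, or None when the machine halts.
    """
    state = start
    for _ in range(n):
        state = step(state)
        if state is None:
            return True
    return False


def countdown(k: int) -> Optional[int]:
    """Toy machine that halts: counts down to zero, then stops."""
    return None if k == 0 else k - 1


def loop_forever(k: int) -> Optional[int]:
    """Toy machine that never halts."""
    return k


# If the device claims "this machine halts after n steps", a normal computer
# can refute a false claim by running those n steps (feasible only for small n).
claim_is_consistent = halts_within(countdown, 3, 4)
false_claim_refuted = not halts_within(loop_forever, 0, 1000)
```

As the comment notes, the check is decisive in principle but infeasible when n is very large, since its cost grows with the claimed step count.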

Actually, hypothesizing that my device “computed” a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable. A better skeptical hypothesis would be that the device passed off some actually halting TMs as non-halting, but only in cases where the shortest proof that any of those TMs would have halted eventually was too long for humans to have discovered yet. I don’t know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis). Intuitively, though, it seems to me that, if you went long enough without finding proof that the device wasn’t a true hypercomputer, continuing to insist that such proof would be found at some future time would start to sound like a God-of-the-gaps argument. I think this reasoning is valid even in a hypothetical universe in which human brains couldn’t do anything Turing machines can’t do, but other physical systems could. I admit that’s a nontrivial, contestable conclusion. I’m just going on intuition here.

Nearly everything you said here was already addressed in my previous comment. Perhaps I didn’t explain myself clearly?

It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases.

I wrote before that “I wonder how you would tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC?”

So, the realization of a particular hypercomputer in a non-standard model of ZFC would pass all of your tests. You could examine its internal state or its output any way you like (i.e. ask any question that can be formulated in the language of ZFC) and everything you see would be consistent with ZFC. The number of steps for a machine that shouldn’t halt would be a non-standard number, so it would not fit on any finite storage. You could examine some finite subset of its digits (either from the end or from the beginning), for example, but that would not tell you the number is non-standard. For any question of the form “is n larger than some known number n0?” the answer would always be “yes”.

But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We’ll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?

Once again, there is a difference of principle. I wrote before that: “...given an uncomputable function h and a system under test f, there is no sequence of computable tests that will allow you to form some credence about the hypothesis f=h s.t. this credence will converge to 1 when the hypothesis is true and 0 when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable h) when you can devise such a sequence of tests.”

So, with normal computers you can become increasingly certain your hypothesis regarding the computer is true (even if you never become literally 100% certain, except in the limit), whereas with a hypercomputer you cannot.
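For the computable case, one simple sequence of tests is to compare the system under test f against the reference h on inputs 0, 1, 2, and so on. The sketch below is only an illustration under that assumption (the helper name `first_disagreement` and the example functions are hypothetical). It shows that any discrepancy is found after finitely many tests, which is what fails when h is uncomputable and cannot be evaluated for comparison.

```python
from typing import Callable, Optional


def first_disagreement(f: Callable[[int], int],
                       h: Callable[[int], int],
                       max_input: int) -> Optional[int]:
    """Compare f and h on inputs 0..max_input; return the first input where they differ."""
    for x in range(max_input + 1):
        if f(x) != h(x):
            return x
    return None


def reference(x: int) -> int:
    """The computable hypothesis h: squaring."""
    return x * x


def faulty_device(x: int) -> int:
    """A system under test that deviates from h on a single input."""
    return 0 if x == 7 else x * x


# With a small testing budget the fault is missed; with a larger one it is exposed.
missed = first_disagreement(faulty_device, reference, 6)
found = first_disagreement(faulty_device, reference, 10)
```

Each round of passed tests rules out more of the competing hypotheses, and a false hypothesis is eventually refuted by some finite test.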

Actually, hypothesizing that my device “computed” a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable.

Yes, I already wrote that: “Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify f is in the class, for example the class of all functions h s.t. it is consistent with ZFC that h is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself.”

So, yes, you could theoretically become certain the device is a hypercomputer (although reaching high certainty would take a very long time), without knowing precisely which hypercomputer it is, but that doesn’t mean you need to add non-computable hypotheses to your “prior”, since that knowledge would still be expressible as a computable property of the world.

I don’t know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis).

Literal Solomonoff induction (or even bounded versions of Solomonoff induction) is probably not the ultimate “true” model of induction; I was just using it as a simple example before. The true model will allow expressing hypotheses such as “all the even-numbered bits in the sequence are 1”, which involve computable properties of the environment that do not specify it completely. Making this idea precise is somewhat technical.
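As a rough illustration of such a partial hypothesis (and not of the technical construction itself), the property can be written as a computable check on any finite prefix of the sequence. The function name is hypothetical, and counting bit positions from 0 is an assumption of the sketch.

```python
from typing import Sequence


def even_bits_are_one(prefix: Sequence[int]) -> bool:
    """Check the property on a finite prefix: every even-indexed bit (0-based) equals 1.

    The odd-indexed bits are left completely unconstrained.
    """
    return all(bit == 1 for i, bit in enumerate(prefix) if i % 2 == 0)


# Two different observation sequences that are both consistent with the hypothesis:
consistent_a = even_bits_are_one([1, 0, 1, 0, 1, 0])
consistent_b = even_bits_are_one([1, 1, 1, 0, 1, 1])
# One that violates it at position 2:
violated = even_bits_are_one([1, 0, 0, 1])
```

Many distinct sequences satisfy the check, so the hypothesis constrains the environment without predicting every bit, unlike a single program that generates the whole sequence.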

Physics is not the territory, physics is (quite explicitly) the models we have of the territory.

People tend to use the word physics in both the map and the territory sense.

We can imagine, to a point, that we live in a universe which contains hypercomputers, but since our own brain is not a hypercomputer, we can never fully test such a theory.

That would follow if testing a theory consisted solely of running a simulation in your head, but that is not how physics, the science, works. If the universe was hypercomputational, that would manifest as failures of computable physics. Note that you only need to run computable physics to generate predictions that are then falsified.

This IMO is the most fundamental significance of the Church-Turing thesis: since we only perceive the world through the lens of our own mind, then from our subjective point of view, the world only contains computable processes.

If true, that is a form of neo-Kantian idealism. Is that what you really wanted to say?

If the universe was hypercomputational, that would manifest as failures of computable physics.

Well, it would manifest as a failure to create a complete and deterministic theory of computable physics. If your physics doesn’t describe absolutely everything, hypercomputation can hide in places it doesn’t describe. If your physics is stochastic (like quantum mechanics for example) then the random bits can secretly follow a hypercomputable pattern. Sort of “hypercomputer of the gaps”. Like I wrote before, there actually can be situations in which we gradually become confident that something is a hypercomputer (although certainty would grow very slowly), but we will never know precisely what kind of hypercomputer it is.

If true, that is a form of neo-Kantian idealism. Is that what you really wanted to say?

Unfortunately I am not sufficiently versed in philosophy to say. I do not make any strong claims to novelty or originality.

The question is, given a situation in which intuition A demands action X and intuition B demands action Y, what is the morally correct action? The answer might be “X”, it might be “Y”, it might be “both actions are equally good”, or it might be even “Z” for some Z different from both X and Y. But any answer effectively determines a way to remove the contradiction, replacing it by a consistent overarching system. And, if we actually face that situation, we need to actually choose an answer.

This reminds me of my rephrasing of the description of epistemology. The standard description started out as “the science of knowledge” or colloquially, “how do we know what we know”. I’ve maintained, since reading Bartley (“The Retreat to Commitment”), that the right description is “How do we decide what to believe?” So your final sentence seems right to me, but that’s different from the rest of your argument, which presumes that there’s a “right” answer and our job is finding it. Our job is finding a decision procedure, and studying what differentiates “right” answers from “wrong” answers is useful fodder for that, but it’s not the actual goal.

“Fitness” just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a “descendant”. The latter is not as unambiguously defined as “momentum” but under normal conditions it is quite precise.

Similarly, you can define intelligence as expected performance on a broad suite of tasks. However, what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.

I wonder whether the OP also doesn’t count all of computational learning theory? Also, physics is definitely not a sufficient description of biology but on the other hand, physics is still very useful for understanding biology.

I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance. It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.

However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.

...

Now, “rationality” and “intelligence” are in some sense even more fundamental than physics… Therefore, it seems like there has to be a simple theory of intelligence.

This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it’s not the case. Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns—something we’re nowhere near able to formalise. I plan to write a follow-up post which describes my reasons for being skeptical about rationality realism in more detail.

We only need to determine “human goals” within the precision to which they are actually well-defined, not within absolute precision.

I agree, but it’s plausible that they are much less well-defined than they seem. The more we learn about neuroscience, the more the illusion of a unified self with coherent desires breaks down. There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).

...what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.

Once again, it seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks is not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!

I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance.

It plays a minor role in deep learning, in the sense that some “deep” algorithms are adaptations of algorithms that have theoretical guarantees. For example, deep Q-learning is an adaptation of ordinary Q-learning. Obviously I cannot prove that it is possible to create an abstract theory of intelligence without actually creating the theory. However, the same could be said about any endeavor in history.

It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.

Mathematical models of evolution might help you to build better evolutions. In order to build better birds, you would need mathematical models of birds, which are going to be much more messy.

This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it’s not the case.

I don’t think it’s a mere restatement? I am trying to show that “rationality realism” is what you should expect based on Occam’s razor, which is a fundamental principle of reason. Possibly I just don’t understand your position. In particular, I don’t know what epistemology is like in the world you imagine. Maybe it’s a subject for your next essay.

Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns

This seems to confuse objects with representations of objects. The assumption that there is some mathematical theory at the core of human reasoning does not mean that a description of this mathematical theory should automatically exist in the conscious, symbol-manipulating part of the mind. You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.
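To make the last point concrete, here is a minimal sketch of tabular Q-learning. The corridor environment, the function name `q_learning` and all parameter values are assumptions chosen for illustration. The only thing the algorithm stores is a table of numbers; the update rule and its convergence theory appear nowhere in that state.

```python
import random
from collections import defaultdict


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left (staying put at state 0),
    action 1 moves right. Reaching the rightmost state yields reward 1 and
    ends the episode.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # the entire learned state: (state, action) -> value
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties at random so early episodes terminate quickly.
                best = max(q[(s, 0)], q[(s, 1)])
                a = rng.choice([act for act in (0, 1) if q[(s, act)] == best])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            best_next = max(q[(s_next, 0)], q[(s_next, 1)])
            # Standard Q-learning update.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return dict(q)


table = q_learning()
```

After training, `table` maps (state, action) pairs to floats and the greedy policy moves right, yet inspecting the table tells you nothing about how or why the update rule works.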

There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).

The response might depend on the framing if you’re asked a question and given 10 seconds to answer it. If you’re allowed to deliberate on the question, and in particular consider alternative framings, the answer becomes more well-defined. However, even if it is ill-defined, it doesn’t really change anything. We can still ask the question “given the ability to optimize any utility function over the world now, what utility function should we choose?” Perhaps it means that we need to consider our answers to ethical questions provided a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.

It seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks is not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!

...

It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.

I don’t think it’s a mere restatement? I am trying to show that “rationality realism” is what you should expect based on Occam’s razor, which is a fundamental principle of reason.

Sorry, my response was a little lazy, but at the same time I’m finding it very difficult to figure out how to phrase a counterargument beyond simply saying that although intelligence does allow us to understand physics, it doesn’t seem to me that this implies it’s simple or fundamental. Maybe one relevant analogy: maths allows us to analyse tic-tac-toe, but maths is much more complex than tic-tac-toe. I understand that this is probably an unsatisfactory intuition from your perspective, but unfortunately don’t have time to think too much more about this now; will cover it in a follow-up.

You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.

Agreed. But the fact that the main component of human reasoning is something which we have no idea how to formalise is some evidence against the possibility of formalisation—evidence which might be underweighted if people think of maths proofs as a representative example of reasoning.

We can still ask the question “given the ability to optimize any utility function over the world now, what utility function should we choose?” Perhaps it means that we need to consider our answers to ethical questions provided a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.

I’m going to cop out of answering this as well, on the grounds that I have yet another post in the works which deals with it more directly. One relevant claim, though: that extreme optimisation is fundamentally alien to the human psyche, and I’m not sure there’s any possible utility function which we’d actually be satisfied with maximising.

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.

As a matter of fact, I emphatically do not agree. “Birds” are a confusing example, because it speaks of modifying an existing (messy, complicated, poorly designed) system rather than making something from scratch. If we wanted to make something vaguely bird-like from scratch, we might have needed something like a “theory of self-sustaining, self-replicating machines”.

Let’s consider a clearer example: cars. In order to build a car, it is very useful to have a theory of mechanics, chemistry, thermodynamics etc. Just doing things by trial and error would be much less effective, especially if you don’t want the car to occasionally explode (given that the frequency of explosions might be too low to affordably detect during testing). This is not because a car is “simple”: a spaceship or, let’s say, a gravity wave detector is much more complex than a car, and yet you hardly need less theory to make one.

And another example: cryptography. In fact, cryptography is not so far from AI safety: in the former case, you defend against an external adversary whereas in the latter you defend against perverse incentives and subagents inside the AI. If we had this conversation in the 1960s (say), you might have said that cryptography is obviously a complex, messy domain, and theorizing about it is next to useless, or at least not helpful for designing actual encryption systems (there was Shannon’s work, but since it ignored computational complexity you can maybe compare it to algorithmic information theory and statistical learning theory for AI today; if we had this conversation in the 1930s, then there would be next to no theory at all, even though encryption was practiced since ancient times). And yet, today theory plays an essential role in this field. The domain actually is very hard: most of the theory relies on complexity-theoretic conjectures that we are still far from being able to prove (although I expect that most theoretical computer scientists would agree that eventually we will solve them). However, even without being able to formally prove everything, the ability to reduce the safety of many different protocols to a limited number of interconnected conjectures (some of which have an abundance of both theoretical and empirical evidence) allows us to immensely increase our confidence in those protocols.

Similarly, I expect an abstract theory of intelligence to be immensely useful for AI safety. Even just having precise language to define what “AI safety” means would be very helpful, especially to avoid counter-intuitive failure modes like the malign prior. At the very least, we could have provably safe but impractical machine learning protocols that would be an inspiration to more complex algorithms about which we cannot prove things directly (like in deep learning today). More optimistically (but still realistically IMO) we could have practical algorithms satisfying theoretical guarantees modulo a small number of well-studied conjectures, like in cryptography today. This way, theoretical and empirical research could feed into each other, the whole significantly surpassing the sum of its parts.

Although I don’t necessarily subscribe to the precise set of claims characterized as “realism about rationality”, I do think this broad mindset is mostly correct, and the objections outlined in this essay are mostly wrong.

This seems entirely wrong to me. Evolution definitely should be studied using mathematical models, and although I am not an expert in that, AFAIK this approach is fairly standard. “Fitness” just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a “descendant”. The latter is not as unambiguously defined as “momentum” but under normal conditions it is quite precise. The actual structure and dynamics of biological organisms and their environment is very complicated, but this does not preclude the abstract study of evolution, i.e. understanding which sort of dynamics are possible in principle (for general environments) and in which way they depend on the environment etc. Applying this knowledge to real-life evolution is not trivial (and it does require a lot of complementary empirical research), as is the application of theoretical knowledge in any domain to “messy” real-life examples, but that doesn’t mean such knowledge is useless. On the contrary, such knowledge is often essential to progress.

I wonder whether the OP also doesn’t count all of computational learning theory? Also, physics is definitely not a sufficient description of biology but on the other hand, physics is still very useful for understanding biology. Indeed, it’s hard to imagine we would achieve the modern level of understanding of chemistry without understanding at least non-relativistic quantum mechanics, and it’s hard to imagine we would make much headway in molecular biology without chemistry, thermodynamics et cetera.

Once again, the OP uses the concept of “messiness” in a rather ambiguous way. It is true that human and animal intelligence is “messy” in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.

The disagreement here seems to revolve around the question of, when should we expect to have a simple theory for a given phenomenon (i.e. when does Occam’s razor apply)? It seems clear that we should expect to have a simple theory of e.g. fundamental physics, but not a simple equation for the coastline of Africa. The difference is, physics is a unique object that has a fundamental role, whereas Africa is just one arbitrary continent among the set of all continents on all planets in the universe throughout its lifetime and all Everett branches. Therefore, we don’t expect a simple description of Africa, but we do expect a relatively simple description of planetary physics that would tell us which continent shapes are possible and which are more likely.

Now, “rationality” and “intelligence” are in some sense even more fundamental than physics. Indeed, rationality is what tells us how to form correct beliefs, i.e. how to find the correct theory of physics. Looking at anthropic paradoxes, it is even arguable that making decisions is even more fundamental than forming beliefs (since anthropic paradoxes are situations in which assigning subjective probabilities seems meaningless but the correct decision is still well-defined via “functional decision theory” or something similar). Therefore, it seems like there has to be a simple theory of intelligence, even if specific instances of intelligence are complex by virtue of their adaptation to specific computational hardware, specific utility function (or maybe some more general concept of “values”), somewhat specific (although still fairly diverse) class of environments, and also by virtue of arbitrary flaws in their design (that are still mild enough to allow for intelligent behavior).

This line of thought would benefit from more clearly delineating descriptive versus prescriptive. The question we are trying to answer is: “if we build a powerful goal-oriented agent, what goal system should we give it?” That is, it is fundamentally a prescriptive rather than descriptive question. It seems rather clear that the best choice of goal system would be in some sense similar to “human goals”. Moreover, it seems that if possibilities A and B are such that it is ill-defined whether humans (or at least, those humans that determine the goal system of the powerful agent) prefer A or B, then there is no moral significance to choosing between A and B in the target goal system. Therefore, we only need to determine “human goals” within the precision to which they are actually well-defined, not within absolute precision.

The question is not whether it is “fine”. The question is, given a situation in which intuition A demands action X and intuition B demands action Y, what is the morally correct action? The answer might be “X”, it might be “Y”, it might be “both actions are equally good”, or it might be even “Z” for some Z different from both X and Y. But any answer effectively determines a way to remove the contradiction, replacing it by a consistent overarching system. And, if we actually face that situation, we need to actually choose an answer.

I used to think the same way, but the OP made me have a crisis of faith, and now I think the opposite way.

Sure, an animal brain solving an animal problem is messy. But a general purpose computer solving a simple mathematical problem can be just as messy. The algorithm for multiplying matrices in O(n^2.8) is more complex than the algorithm for doing it in O(n^3), and the algorithm with O(n^2.4) is way more complex than that. As I said in the other comment, “algorithms don’t get simpler as they get better”.
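To make the complexity comparison tangible, here is a sketch of the schoolbook O(n^3) algorithm next to Strassen's algorithm, which is the standard example with an exponent of about 2.8. The helper names and the restriction to square matrices whose size is a power of two are simplifying assumptions for this illustration.

```python
def naive_matmul(a, b):
    """Schoolbook O(n^3) multiplication of square matrices (lists of lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def _sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def _split(m):
    """Split a matrix into four equal quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def strassen(a, b):
    """Strassen's algorithm, O(n^log2(7)); n is assumed to be a power of two."""
    n = len(a)
    if n == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)
    # Seven recursive products instead of the eight a plain block scheme needs.
    m1 = strassen(_add(a11, a22), _add(b11, b22))
    m2 = strassen(_add(a21, a22), b11)
    m3 = strassen(a11, _sub(b12, b22))
    m4 = strassen(a22, _sub(b21, b11))
    m5 = strassen(_add(a11, a12), b22)
    m6 = strassen(_sub(a21, a11), _add(b11, b12))
    m7 = strassen(_sub(a12, a22), _add(b21, b22))
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    # Stitch the quadrants back together row by row.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 3], [1, 1, 0, 2], [4, 5, 6, 0], [0, 7, 8, 9]]
```

The two functions return the same product, but the faster one already needs several helpers and seven carefully chosen combinations, while the slower one fits in three lines.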

I don’t know a lot about the study of matrix multiplication complexity, but I think that one of the following two possibilities is likely to be true:

There is some ω∈R and an algorithm for matrix multiplication of complexity O(n^(ω+ε)) for any ε>0 s.t. no algorithm of complexity O(n^(ω−ε)) exists (AFAIK, the prevailing conjecture is ω=2). This algorithm is simple enough for human mathematicians to find it, understand it and analyze its computational complexity. Moreover, there is a mathematical proof of its optimality that is simple enough for human mathematicians to find and understand.

There is a progression of algorithms for lower and lower exponents that increases in description complexity without bound as the exponent approaches ω from above, and the problem of computing a program with given exponent is computationally intractable or even uncomputable. This fact has a mathematical proof that is simple enough for human mathematicians to find and understand.

Moreover, if we only care about having a polynomial time algorithm with some exponent then the solution is simple (and doesn’t require any astronomical coefficients like Levin search; incidentally, the O(n^3) algorithm is also good enough for most real world applications). In either case, the computational complexity of matrix multiplication is understandable in the sense I expect intelligence to be understandable.

So, it is possible that there is a relatively simple and effective algorithm for intelligence (although I still expect a lot of “messy” tweaking to get a good algorithm for any specific hardware architecture; indeed, computational complexity is only defined up to a polynomial if you don’t specify a model of computation), or it is possible that there is a progression of increasingly complex and powerful algorithms that are very expensive to find. In the latter case, long AGI timelines become much more probable since biological evolution invested an enormous amount of resources in the search which we cannot easily match. In either case, there should be a theory that (i) defines what intelligence is (ii) predicts how intelligence depends on parameters such as description complexity and computational complexity.

A good algorithm can be easy to find, but not simple in the other senses of the word. Machine learning can output an algorithm that seems to perform well, but has a long description and is hard to prove stuff about. The same is true for human intelligence. So we might not be able to find an algorithm that’s as strong as human intelligence but easier to prove stuff about.

Machine learning uses data samples about an unknown phenomenon to extrapolate and predict the phenomenon in new instances. Such algorithms can have provable guarantees regarding the quality of the generalization: this is exactly what computational learning theory is about.
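As one textbook instance of such a guarantee (chosen here for illustration, not taken from the discussion above): for a finite hypothesis class H in the realizable setting, any learner that outputs a hypothesis consistent with m ≥ (ln|H| + ln(1/δ))/ε i.i.d. samples has true error at most ε with probability at least 1−δ. The function name below is an assumption.

```python
import math


def pac_sample_bound(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Sufficient sample size for a consistent learner over a finite class.

    Realizable-case bound: m >= (ln|H| + ln(1/delta)) / epsilon.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)


# 1000 hypotheses, 10% error tolerance, 95% confidence.
m = pac_sample_bound(1000, epsilon=0.1, delta=0.05)
```

The bound grows only logarithmically in the size of the class, which is the kind of quantitative statement about generalization that the theory provides.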

Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks. And even so, there is already some progress. People have been making buildings and cannons before Newtonian mechanics, engines before thermodynamics and ways of using chemical reactions before quantum mechanics or modern atomic theory. The fact you can do something using trial and error doesn’t mean trial and error is the only way to do it.

I think “inherent mysteriousness” is also possible. Some complex things are intractable to prove stuff about.

I don’t see why better algorithms being more complex is a problem?

I disagree that intelligence and rationality are more fundamental than physics; the territory itself is physics, and that is all that is really there. Everything else (including the body of our phone knowledge) are models for navigating that territory.

Turing formalised computation and established the limits of computation given certain assumptions. However, those limits only apply as long as the assumptions are true. Turing did not prove that no mechanical system is superior to a Universal Turing Machine, and weird physics may enable super Turing computation.

The point I was making is that our models are only as good as their correlation with the territory. The abstract models we have aren’t part of the territory itself.

Physics is not the territory, physics is (quite explicitly) the models we have of the territory. Rationality consists of the rules for formulating these models, and in this sense it is prior to physics and more fundamental. (This might be a disagreement over use of words. If by “physics” you, by definition, refer to the territory, then it seems to miss my point about Occam’s razor. Occam’s razor says that the map should be parsimonious, not the territory: the latter would be a type error.) In fact, we can adopt the view that Solomonoff induction (which is a model of rationality) is the ultimate physical law: it is a mathematical rule of making predictions that generates all the other rules we can come up with. Such a point of view, although in some sense justified, at present would be impractical: this is because we know how to compute using actual physical models (including running computer simulations), but not so much using models of rationality. But this is just another way of saying we haven’t constructed AGI yet.

I don’t think it’s meaningful to say that “weird physics may enable super Turing computation.” Hypercomputation is just a mathematical abstraction. We can imagine, to a point, that we live in a universe which contains hypercomputers, but since our own brain is not a hypercomputer, we can never fully test such a theory. This IMO is the most fundamental significance of the Church-Turing thesis: since we only perceive the world through the lens of our own mind, then from our subjective point of view, the world only contains computable processes.

If your mind was computable but the external world had lots of seeming hypercomputation (e.g. boxes for solving the halting problem were sold on every corner and were apparently infallible), would you prefer to build an AI that used a prior over hypercomputable worlds, or an AI that used Solomonoff induction because it’s the ultimate physical law?

What does it mean to have a box for solving the halting problem? How do you know it really solves the halting problem? There are some computable tests we can think of, but they would be incomplete, and you would only verify that the box satisfies those computable tests, not that it is “really” a hypercomputer. There would be a lot of possible boxes that don’t solve the halting problem that pass the same computable tests.

If there is some powerful computational hardware available, I would want the AI to use that hardware. If you imagine the hardware as being hypercomputers, then you can think of such an AI as having a “prior over hypercomputable worlds”. But you can alternatively think of it as reasoning using computable hypotheses about the correspondence between the output of this hardware and the output of its sensors. The latter point of view is better, I think, because you can never know the hardware is really a hypercomputer.

Hmm, that approach might be ruling out not only hypercomputers, but also sufficiently powerful conventional computers (anything stronger than PSPACE maybe?) because your mind isn’t large enough to verify their strength. Is that right?

In some sense, yes, although for conventional computers you might settle on very slow verification. Unless you mean that, your mind has only finite memory/lifespan and therefore you cannot verify an arbitrary conventional computer within any given credence, which is also true. Under favorable conditions, you can quickly verify something in PSPACE (using interactive proof protocols), and given extra assumptions you might be able to do better (if you have two provers that cannot communicate you can do NEXP, or if you have a computer whose memory you can reliably delete you can do an EXP-complete language), however it is not clear whether you can be justifiably highly certain of such extra assumptions.

See also my reply to lbThingrb.

This can’t be right … Turing machines are assumed to be able to operate for unbounded time, using unbounded memory, without breaking down or making errors. Even finite automata can have any number of states and operate on inputs of unbounded size. By your logic, human minds shouldn’t be modeling physical systems using such automata, since they exceed the capabilities of our brains.

It’s not that hard to imagine hypothetical experimental evidence that would make it reasonable to believe that hypercomputers could exist. For example, suppose someone demonstrated a physical system that somehow emulated a universal Turing machine with infinite tape, using only finite matter and energy, and that this system could somehow run the emulation at an accelerating rate, such that it computed n steps in ∑_{k=1}^{n} 1/2^k seconds. (Let’s just say that it resets to its initial state in a poof of pixie dust if the TM doesn’t halt after one second.)

You could try to reproduce this experiment and test it on various programs whose long-term behavior is predictable, but you could only test it on a finite (to say nothing of computable) set of such inputs. Still, if no one could come up with a test that stumped it, it would be reasonable to conclude that it worked as advertised. (Of course, some alternative explanation would be more plausible at first, given that the device as described would contradict well established physical principles, but eventually the weight of evidence would compel one to rewrite physics instead.)
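For concreteness, the timing of the accelerating device described above can be checked with exact arithmetic. This sketch assumes that the k-th step takes 1/2^k seconds; the function name is hypothetical. The elapsed time after n steps is 1 − 1/2^n, so every finite number of steps finishes strictly before the one-second mark.

```python
from fractions import Fraction


def elapsed_seconds(n: int) -> Fraction:
    """Total time for the first n steps if step k takes 1/2**k seconds."""
    return sum((Fraction(1, 2 ** k) for k in range(1, n + 1)), Fraction(0))


# Closed form of the geometric sum: 1 - 1/2**n.
closed_form_holds = all(
    elapsed_seconds(n) == 1 - Fraction(1, 2 ** n) for n in range(1, 51)
)
```

Because the partial sums approach but never reach one second, a machine that has not halted by then has, on this idealization, run for infinitely many steps.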

One could hypothesize that the device only behaved as advertised on inputs for which human brains have the resources to verify the correctness of its answers, but did something else on other inputs, but you could just as well say that about a normal computer. There’d be no reason to believe such an alternative model, unless it was somehow more parsimonious. I don’t know any reason to think that theories that don’t posit uncomputable behavior can always be found which are at least as simple as a given theory that does.

Having said all that, I’m not sure any of it supports either side of the argument over whether there’s an ideal mathematical model of general intelligence, or whether there’s some sense in which intelligence is more fundamental than physics. I will say that I don’t think the Church-Turing thesis is some sort of metaphysical necessity baked into the concept of rationality. I’d characterize it as an empirical claim about (1) human intuition about what constitutes an algorithm, and (2) contingent limitations imposed on machines by the laws of physics.

It is true that a human brain is more precisely described as a finite automaton than a Turing machine. And if we take finite lifespan into account, then it’s not even a finite automaton. However, these abstractions are useful models since they become accurate in certain asymptotic limits that are sufficiently useful to describe reality. On the other hand, I doubt that there is a useful approximation in which the brain is a hypercomputer (except maybe some weak forms of hypercomputation like non-uniform computation / circuit complexity).

Moreover, one should distinguish between different senses in which we can be “modeling” something. The first sense is the core, unconscious ability of the brain to generate models, and in particular that which we experience as intuition. This ability can (IMO) be thought of as some kind of machine learning algorithm, and, I doubt that hypercomputation is relevant there in any way. The second sense is the “modeling” we do by manipulating linguistic (symbolic) constructs in our conscious mind. These constructs might be formulas in some mathematical theory, including formulas that represent claims about uncomputable objects. However, these symbolic manipulations are just another computable process, and it is only the results of these manipulations that we use to generate predictions and/or test models, since this is the only access we have to those uncomputable objects.

Regarding your hypothetical device, I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC? (In particular, the latter could tell you that some Turing machine halts when it “really” doesn’t, because in the model it halts after some non-standard number of computing steps.) More generally, given an uncomputable function h and a system under test f, there is no sequence of computable tests that will allow you to form some credence about the hypothesis f=h s.t. this credence will converge to 1 when the hypothesis is true and 0 when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable h) when you can devise such a sequence of tests. (Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify f is in the class, for example the class of all functions h s.t. it is consistent with ZFC that h is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself.)

My point is, the Church-Turing thesis implies (IMO) that the mathematical model of rationality/intelligence should be based on Turing machines at most, and this observation does not strongly depend on assumptions about physics. (Well, if hypercomputation is physically possible, and realized in the brain, and there is some intuitive part of our mind that uses hypercomputation in a crucial way, then this assertion would be wrong. That would contradict my own intuition about what reasoning is (including intuitive reasoning), besides everything we know about physics, but obviously this hypothesis has some positive probability.)

I didn’t mean to suggest that the possibility of hypercomputers should be taken seriously as a physical hypothesis, or at least, any more seriously than time machines, perpetual motion machines, faster-than-light, etc. And I think it’s similarly irrelevant to the study of intelligence, machine or human. But in my thought experiment, the way I imagined it working was that, whenever the device’s universal-Turing-machine emulator halted, you could then examine its internal state as thoroughly as you liked, to make sure everything was consistent with the hypothesis that it worked as specified (and the non-halting case could be ascertained by the presence of pixie dust 🙂). But since its memory contents upon halting could be arbitrarily large, in practice you wouldn’t be able to examine it fully even for individual computations of sufficient complexity. Still, if you did enough consistency checks on enough different kinds of computations, and the cleverest scientists couldn’t come up with a test that the machine didn’t pass, I think believing that the machine was a true halting-problem oracle would be empirically justified.

It’s true that a black box oracle could output a nonstandard “counterfeit” halting function which claimed that some actually non-halting TMs do halt, only for TMs that can’t be proved to halt within ZFC or any other plausible axiomatic foundation humans ever come up with, in which case we would never know that it was lying to us. It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases. For example, if it claimed that some actually non-halting TM M halted, we could feed it a program that emulated M and output the number of steps M took to halt. That program would also have to halt, and output some specific number n. In principle, we could then try emulating M for n steps on a regular computer, observe that M hadn’t reached a halting state, and conclude that the device was lying to us. If n were large enough, that wouldn’t be feasible, but it’s a decisive test that a normal computer could execute in principle. I suppose my magical device could instead do something like leave an infinite output string in memory, that a normal computer would never know was infinite, because it could only ever examine finitely much of it. But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We’ll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?

Actually, hypothesizing that my device “computed” a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable. A better skeptical hypothesis would be that the device passed off some actually halting TMs as non-halting, but only in cases where the shortest proof that any of those TMs would have halted eventually was too long for humans to have discovered yet. I don’t know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis).
Intuitively, though, it seems to me that, if you went long enough without finding proof that the device wasn’t a true hypercomputer, continuing to insist that such proof would be found at some future time would start to sound like a God-of-the-gaps argument. I think this reasoning is valid even in a hypothetical universe in which human brains couldn’t do anything Turing machines can’t do, but other physical systems could. I admit that’s a nontrivial, contestable conclusion. I’m just going on intuition here.
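To make the “decisive test” from this comment concrete (emulate M for n steps on a regular computer and see whether it has reached a halting state), here is a minimal sketch. Everything in it is an assumption for illustration: the transition-table encoding, the function name `halts_within`, and the two toy machines `BB2` and `LOOPER` are invented here and are not part of the original thought experiment.

```python
from typing import Dict, Tuple

# Hypothetical encoding: (state, symbol read) -> (next state, symbol written, head move)
Transitions = Dict[Tuple[str, int], Tuple[str, int, int]]


def halts_within(transitions: Transitions, max_steps: int,
                 start: str = "A", halt: str = "H") -> bool:
    """Emulate the machine on a blank tape for at most max_steps transitions.

    Returns True iff the halting state is reached within that budget.
    A missing transition raises KeyError rather than being treated as a halt.
    """
    tape: Dict[int, int] = {}  # sparse tape; unwritten cells read as 0
    state, head = start, 0
    for _ in range(max_steps):
        if state == halt:
            return True
        symbol = tape.get(head, 0)
        state, tape[head], move = transitions[(state, symbol)]
        head += move
    return state == halt


# Toy machine that halts: the classic 2-state busy beaver.
BB2: Transitions = {
    ("A", 0): ("B", 1, +1),
    ("A", 1): ("B", 1, -1),
    ("B", 0): ("A", 1, -1),
    ("B", 1): ("H", 1, +1),
}

# Toy machine that never halts: it walks right over blank tape forever.
LOOPER: Transitions = {("A", 0): ("A", 0, +1)}

if __name__ == "__main__":
    # If the device claimed LOOPER halts after n = 1000 steps, this check refutes it.
    print(halts_within(LOOPER, 1000))
    print(halts_within(BB2, 6))
```

The check is only decisive when the claimed step count n is small enough to emulate; for large n it remains a test that a normal computer could run in principle but not in practice.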

Nearly everything you said here was already addressed in my previous comment. Perhaps I didn’t explain myself clearly?

I wrote before that “I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC?”

So, the realization of a particular hypercomputer in a non-standard model of ZFC would pass all of your tests. You could examine its internal state or its output any way you like (i.e. ask any question that can be formulated in the language of ZFC) and everything you see would be consistent with ZFC. The number of steps for a machine that shouldn’t halt would be a non-standard number, so it would not fit on any finite storage. You could examine some finite subset of its digits (either from the end or from the beginning), for example, but that would not tell you the number is non-standard. For any question of the form “is n larger than some known number n0?” the answer would always be “yes”.

Once again, there is a difference of principle. I wrote before that: “...given an uncomputable function h and a system under test f, there is no sequence of computable tests that will allow you to form some credence about the hypothesis f=h s.t. this credence will converge to 1 when the hypothesis is true and 0 when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable h) when you can devise such a sequence of tests.”

So, with normal computers you can become increasingly certain your hypothesis regarding the computer is true (even if you never become literally 100% certain, except in the limit), whereas with a hypercomputer you cannot.
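One half of this asymmetry can be pictured with a small sketch: when h is computable, we can run it ourselves as a reference and compare it with the system under test f on more and more inputs, so any f that differs from h is eventually caught. This is only an illustration of the computable case under assumed names (`first_disagreement`, `h`, `f_good`, `f_bad` are all invented here); it does not compute a credence and it is not the theorem mentioned above.

```python
from itertools import count, islice
from typing import Callable, Optional


def first_disagreement(f: Callable[[int], int],
                       h: Callable[[int], int],
                       budget: int) -> Optional[int]:
    """Compare f with the computable reference h on inputs 0, 1, ..., budget - 1.

    Returns the first input where they differ, or None if none is found
    within the budget. Raising the budget without bound catches every f != h.
    """
    for x in islice(count(), budget):
        if f(x) != h(x):
            return x
    return None


def h(x: int) -> int:
    """Reference function we can compute ourselves (hypothetical example)."""
    return x * x


def f_good(x: int) -> int:
    """System under test that really does implement h."""
    return x * x


def f_bad(x: int) -> int:
    """System under test that agrees with h only on small inputs."""
    return x * x if x < 50 else 0


if __name__ == "__main__":
    print(first_disagreement(f_good, h, 1000))  # no disagreement found
    print(first_disagreement(f_bad, h, 1000))   # caught once the budget passes 50
```

For an uncomputable h no such reference implementation is available to run, which is where the comparison with normal computers breaks down.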

Yes, I already wrote that: “Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify f is in the class, for example the class of all functions h s.t. it is consistent with ZFC that h is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself.”

So, yes, you could theoretically become certain the device is a hypercomputer (although reaching high certainty would take a very long time), without knowing precisely which hypercomputer it is, but that doesn’t mean you need to add non-computable hypotheses to your “prior”, since that knowledge would still be expressible as a computable property of the world.

Literal Solomonoff induction (or even bounded versions of Solomonoff induction) is probably not the ultimate “true” model of induction; I was just using it as a simple example before. The true model will allow expressing hypotheses such as “all the even-numbered bits in the sequence are 1”, which involve computable properties of the environment that do not specify it completely. Making this idea precise is somewhat technical.

People tend to use the word physics in both the map and the territory sense.

That would follow if testing a theory consisted solely of running a simulation in your head, but that is not how physics, the science, works. If the universe was hypercomputational, that would manifest as failures of computable physics. Note that you only need to run computable physics to generate predictions that are then falsified.

If true, that is a form of neo-Kantian idealism. Is that what you really wanted to say?

Well, it would manifest as a failure to create a complete and deterministic theory of computable physics. If your physics doesn’t describe absolutely everything, hypercomputation can hide in places it doesn’t describe. If your physics is stochastic (like quantum mechanics for example) then the random bits can secretly follow a hypercomputable pattern. Sort of “hypercomputer of the gaps”. Like I wrote before, there actually can be situations in which we gradually become confident that something is a hypercomputer (although certainty would grow very slowly), but we will never know precisely what kind of hypercomputer it is.

Unfortunately I am not sufficiently versed in philosophy to say. I do not make any strong claims to novelty or originality.

This reminds me of my rephrasing of the description of epistemology. The standard description started out as “the science of knowledge” or colloquially, “how do we know what we know”. I’ve maintained, since reading Bartley (“The Retreat to Commitment”), that the right description is “How do we decide what to believe?” So your final sentence seems right to me, but that’s different from the rest of your argument, which presumes that there’s a “right” answer and our job is finding it. Our job is finding a decision procedure, and studying what differentiates “right” answers from “wrong” answers is useful fodder for that, but it’s not the actual goal.

Similarly, you can define intelligence as expected performance on a broad suite of tasks. However, what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.

I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance. It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.

This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it’s not the case. Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns—something we’re nowhere near able to formalise. I plan to write a follow-up post which describes my reasons for being skeptical about rationality realism in more detail.

I agree, but it’s plausible that they are much less well-defined than they seem. The more we learn about neuroscience, the more the illusion of a unified self with coherent desires breaks down. There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).

Once again, it seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks are not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!

It plays a minor role in deep learning, in the sense that some “deep” algorithms are adaptations of algorithms that have theoretical guarantees. For example, deep Q-learning is an adaptation of ordinary Q-learning. Obviously I cannot prove that it is possible to create an abstract theory of intelligence without actually creating the theory. However, the same could be said about any endeavor in history.

Mathematical models of evolution might help you to build better evolutions. In order to build better birds, you would need mathematical models of birds, which are going to be much more messy.

I don’t think it’s a mere restatement? I am trying to show that “rationality realism” is what you should expect based on Occam’s razor, which is a fundamental principle of reason. Possibly I just don’t understand your position. In particular, I don’t know what epistemology is like in the world you imagine. Maybe it’s a subject for your next essay.

This seems to confuse objects with representations of objects. The assumption that there is some mathematical theory at the core of human reasoning does not mean that a description of this mathematical theory should automatically exist in the conscious, symbol-manipulating part of the mind. You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.

The response might depend on the framing if you’re asked a question and given 10 seconds to answer it. If you’re allowed to deliberate on the question, and in particular consider alternative framings, the answer becomes more well-defined. However, even if it is ill-defined, it doesn’t really change anything. We can still ask the question “given the ability to optimize any utility function over the world now, what utility function should we choose?” Perhaps it means that we need to consider our answers to ethical questions provided a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.

Sorry, my response was a little lazy, but at the same time I’m finding it very difficult to figure out how to phrase a counterargument beyond simply saying that although intelligence does allow us to understand physics, it doesn’t seem to me that this implies it’s simple or fundamental. Maybe one relevant analogy: maths allows us to analyse tic-tac-toe, but maths is much more complex than tic-tac-toe. I understand that this is probably an unsatisfactory intuition from your perspective, but unfortunately don’t have time to think too much more about this now; will cover it in a follow-up.

Agreed. But the fact that the main component of human reasoning is something which we have no idea how to formalise is some evidence against the possibility of formalisation—evidence which might be underweighted if people think of maths proofs as a representative example of reasoning.

I’m going to cop out of answering this as well, on the grounds that I have yet another post in the works which deals with it more directly. One relevant claim, though: that extreme optimisation is fundamentally alien to the human psyche, and I’m not sure there’s any possible utility function which we’d actually be satisfied with maximising.

As a matter of fact, I emphatically do not agree. “Birds” are a confusing example, because it speaks of modifying an existing (messy, complicated, poorly designed) system rather than making something from scratch. If we wanted to make something vaguely bird-like from scratch, we might have needed something like a “theory of self-sustaining, self-replicating machines”.

Let’s consider a clearer example: cars. In order to build a car, it is very useful to have a theory of mechanics, chemistry, thermodynamics etc. Just doing things by trial and error would be much less effective, especially if you don’t want the car to occasionally explode (given that the frequency of explosions might be too low to affordably detect during testing). This is not because a car is “simple”: a spaceship or, let’s say, a gravity wave detector is much more complex than a car, and yet you hardly need less theory to make one.

And another example: cryptography. In fact, cryptography is not so far from AI safety: in the former case, you defend against an external adversary whereas in the latter you defend against perverse incentives and subagents inside the AI. If we had this conversation in the 1960s (say), you might have said that cryptography is obviously a complex, messy domain, and theorizing about it is next to useless, or at least not helpful for designing actual encryption systems (there was Shannon’s work, but since it ignored computational complexity you can maybe compare it to algorithmic information theory and statistical learning theory for AI today; if we had this conversation in the 1930s, then there would be next to no theory at all, even though encryption was practiced since ancient times). And yet, today theory plays an essential role in this field. The domain actually is very hard: most of the theory relies on complexity theoretic conjectures that we are still far from being able to prove (although I expect that most theoretical computer scientists would agree that eventually we will solve them). However, even without being able to formally prove everything, the ability to reduce the safety of many different protocols to a limited number of interconnected conjectures (some of which have an abundance of both theoretical and empirical evidence) allows us to immensely increase our confidence in those protocols.

Similarly, I expect an abstract theory of intelligence to be immensely useful for AI safety. Even just having precise language to define what “AI safety” means would be very helpful, especially to avoid counter-intuitive failure modes like the malign prior.

At the very least, we could have provably safe but impractical machine learning protocols that would be an inspiration to more complex algorithms about which we cannot prove things directly (like in deep learning today). More optimistically (but still realistically IMO) we could have practical algorithms satisfying theoretical guarantees modulo a small number of well-studied conjectures, like in cryptography today. This way, theoretical and empirical research could feed into each other, the whole significantly surpassing the sum of its parts.