Although in some sense I also endorse the “strawman” that rationality is more like momentum than like fitness (at least some aspects of rationality).

How so?

I think that ricraz claims that it’s impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the “momentum vs. fitness” comparison doesn’t make sense to me.

Well, it’s not entirely clear. First there is the “realism” claim, which might even be taken in contrast to mathematical abstraction; EG, “is IQ real, or is it just a mathematical abstraction”? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where “accurate” means, at least in part, helpfulness in making real predictions).

So the idea seems to be that there’s a spectrum with physics at one extreme end. I’m not quite sure what goes at the other extreme end. Here’s one possibility:

Physics

Chemistry

Biology

Psychology

Social Sciences

Humanities

A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, “realness” vs “mathematical modelability”. Well, it’s not clear exactly what that second axis should be.

Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect “reproductive fitness” levels rather than “momentum” levels.

Hmm, actually, I guess there’s a tricky interpretational issue here, which is what it means to model agency exactly.

On the one hand, I fully believe in Eliezer’s idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality.

But the important thing is a theoretical understanding sufficient to characterize the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter.

I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type. I think models of the second type have to be significantly more approximate, however. EG, you cannot be sure of exactly what a learning system will learn in complex problems.

Yeah, I should have been much more careful before throwing around words like “real”. See the long comment I just posted for more clarification, and in particular this paragraph:

I’m not trying to argue that concepts which we can’t formalise “aren’t real”, but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can’t formalise, and that it’s those incoherent extrapolations which “aren’t real” (I agree that this was quite unclear in the original post).

It seems almost tautologically true that you can’t accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.

What I expect the abstract theory of intelligence to do is something like producing a categorization of agents in terms of qualitative properties. Whether that’s closer to “momentum” or “fitness”, I’m not sure the question is even meaningful.

I think the closest analogy is: abstract theory of intelligence is to AI engineering as complexity theory is to algorithmic design. Knowing the complexity class of a problem doesn’t tell you the best practical way to solve it, but it does give you important hints. (For example, if the problem is of exponential time complexity then you can only expect to solve it either for small inputs or in some special cases, and average-case complexity tells you just whether these cases need to be very special or not. If the problem is in NC then you know that it’s possible to gain a lot from parallelization. If the problem is in NP then at least you can test solutions, et cetera.)
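The "if the problem is in NP then at least you can test solutions" point can be made concrete with a toy sketch (the function name and formula encoding here are my own, purely illustrative): verifying a candidate assignment for a CNF formula takes time linear in the formula's size, even though finding a satisfying assignment is believed to be hard in general.

```python
def verify_sat(clauses, assignment):
    """Check a CNF formula against a truth assignment.

    clauses: list of clauses, each a list of signed variable indices
             (positive = the variable, negative = its negation).
    assignment: dict mapping variable index -> bool.
    Runs in time linear in the size of the formula.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 OR NOT x2) AND (x2 OR x3)
clauses = [[1, -2], [2, 3]]
print(verify_sat(clauses, {1: True, 2: True, 3: False}))   # satisfying
print(verify_sat(clauses, {1: False, 2: True, 3: False}))  # not satisfying
```

The asymmetry between cheap verification and (conjecturally) expensive search is exactly the kind of qualitative hint complexity theory provides without telling you how to solve the problem in practice.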

And also, abstract theory of alignment should be to AI safety as complexity theory is to cryptography. Once again, many practical considerations are not covered by the abstract theory, but the abstract theory does tell you what kind of guarantees you can expect and when. (For example, in cryptography we can (sort of) know that a certain protocol has theoretical guarantees, but there is engineering work finding a practical implementation and ensuring that the assumptions of the theory hold in the real system.)

It seems almost tautologically true that you can’t accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.

That seems manifestly false. You can figure out whether an algorithm halts or not without being accidentally stuck in an infinite loop. You can look at the recursive Fibonacci algorithm and figure out what it would do without ever running it. So there is a clear distinction between analyzing an algorithm and executing it. If anything, one would know more about the agent by using the techniques from the analysis of algorithms than the agent would ever know about itself.
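The Fibonacci example can be spelled out as a minimal sketch (function names are mine): reasoning about the recurrence F(n) = F(n-1) + F(n-2) lets you predict the output of the exponential-time recursive algorithm using a cheap linear-time loop, i.e. analysis rather than execution.

```python
def fib_recursive(n):
    # the algorithm under analysis: naive recursion, exponential time
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_predicted(n):
    # prediction derived from analyzing the recurrence F(n) = F(n-1) + F(n-2),
    # computed iteratively in linear time without running the recursion
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The analysis-based prediction matches the actual behavior on small inputs:
print(all(fib_predicted(n) == fib_recursive(n) for n in range(15)))  # True
```

Of course, this only shows that *some* algorithms admit cheap analysis; it does not settle whether powerful optimizers do.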

Of course you can predict some properties of what an agent will do. In particular, I hope that we will eventually have AGI algorithms that satisfy provable safety guarantees. But, you can't make exact predictions. In fact, there probably is a mathematical law that limits how accurate your predictions can be.

An optimization algorithm is, by definition, something that transforms computational resources into utility. So, if your prediction is so close to the real output that it has similar utility, then it means the way you produced this prediction involved the same product of "optimization power per unit of resources" and "amount of resources invested" (roughly speaking; I don't claim to already know the correct formalism for this). So you would need to either (i) run a similar algorithm with similar resources, (ii) run a dumber algorithm but with more resources, or (iii) use fewer resources but an even smarter algorithm.

So, if you want to accurately predict the output of a powerful optimization algorithm, your prediction algorithm would usually have to be either a powerful optimization algorithm in itself (cases i and iii) or prohibitively costly to run (case ii). The exception is cases when the optimization problem is easy, so a dumb algorithm can solve it with few resources (or a human can figure out the answer by emself).
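Case (ii) can be illustrated with a toy sketch (all function names, step counts, and the objective are illustrative choices of mine, not part of any formalism from the discussion): on an easy one-dimensional problem, a "dumb" random search matches a "smarter" local hill climber's result if it is granted many more objective evaluations, i.e. more resources compensate for less optimization power.

```python
import random

def objective(x):
    # easy one-dimensional problem: maximize -(x - 3)^2, optimum at x = 3
    return -(x - 3.0) ** 2

def hill_climb(steps=100):
    # "smarter": local search, relatively few objective evaluations
    x = 0.0
    for _ in range(steps):
        candidate = x + random.uniform(-0.5, 0.5)
        if objective(candidate) > objective(x):
            x = candidate
    return x

def random_search(samples=10_000):
    # "dumber": blind sampling, but with far more objective evaluations
    return max((random.uniform(-10, 10) for _ in range(samples)),
               key=objective)

random.seed(0)
print(round(objective(hill_climb()), 3))
print(round(objective(random_search()), 3))
```

Both approaches end up near the optimum here precisely because the problem is easy; on a hard optimization problem the blind search's resource requirements would blow up, which is the tradeoff the paragraph above gestures at.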

In special cases, not in the general case: by the halting theorem, there is no general procedure for deciding whether an arbitrary algorithm halts.