Btw, as a meta-point, my understanding of your key claim is:
Sometimes, getting more of one necessarily means getting less of the other. Hence, the “paradox.”
My impression after reading your comment is that you're actually saying that if you optimize for one, the other might go down. I certainly agree with that, but for a much simpler reason: in general, a change that isn't targeted at some variable Y is roughly as likely to decrease Y as to increase it.
When we describe the behavior of a system, we typically operate at varying levels of abstraction. I'm not making an argument about the fragility of the substrate the system runs on, but about the fragility of the parts we typically use to describe the system at the appropriate level of abstraction.
When we describe the functionality of an artificial neural network, we tend to speak about model weights and computational graphs, which do tolerate slight modifications. On the other hand, when we describe the functionality of A* search, we tend to speak about individual lines of code, each doing a specific job, and those generally don't tolerate slight modifications.
This seems to be a fact about which programming language you choose, as opposed to what algorithm you’re using. I could in theory implement A* in neural net weights by hand, and then it would be robust. Similarly, I could write out a learned neural net in Python (one line of Python for every flop in the model), and it would no longer be robust.
(I think more broadly the robustness you’re identifying involves making “small” changes where “small” is defined in terms of a “distance” defined by the programming language; I want a “distance” defined on algorithms, because that seems more relevant to talking about properties of AI. That distance should not depend on what programming language you use to implement the algorithm.)
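Here's a minimal sketch of the contrast I have in mind, purely as an illustration (the toy network, the grid, and the `goal_test` knob are my own constructions, not anything from the post): nudging every weight of a small random net barely moves its output, while the analogue of a one-character source edit to A* (flipping `==` to `!=` in the goal test) makes it return an answer that has nothing to do with the maze.

```python
import heapq
import operator
import numpy as np

# (1) A small random "network": perturbing every weight slightly barely moves the output.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 2)), rng.normal(size=(1, 8))
x = np.array([0.3, -0.7])

def net(A, B):
    return (B @ np.tanh(A @ x)).item()

print(net(W1, W2), net(W1 + 1e-3 * rng.normal(size=W1.shape), W2))  # nearly identical outputs

# (2) A tiny A* on a 4-connected grid: the analogue of a one-character edit
#     (== -> != in the goal test, simulated here via the `goal_test` argument)
#     breaks it outright.
def astar(grid, start, goal, goal_test=operator.eq):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier, best_g = [(h(start), 0, start)], {start: 0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if goal_test(node, goal):  # ordinary A* checks `node == goal`
            return g
        for dr, dc in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
            nxt = (node[0] + dr, node[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))                         # 6: the correct shortest path length
print(astar(grid, (0, 0), (2, 0), goal_test=operator.ne))  # 0: the "one-character-edited" version
```

The point isn't that A* is a bad algorithm; it's that "small" measured in source-code distance and "small" measured in weight-space distance behave very differently, which is why I want a distance defined on algorithms rather than on their representations.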
Consider the scalar definition of robustness: how well do you perform off some training distribution? By this measure, many humans are brittle, since they are not doing well according to inclusive fitness. Even within their own lives, humans don't pursue the goals they set for themselves 10 years ago. There are a lot of ways in which humans are brittle in this sense.
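To make that scalar reading concrete (my own notation, not something from the original post):

$$R(\pi) \;=\; \mathbb{E}_{x \sim D_{\text{deploy}}}\big[\, U(\pi, x) \,\big], \qquad D_{\text{deploy}} \neq D_{\text{train}},$$

i.e. expected performance $U$ on the distribution the agent actually encounters after training on a different one. On this reading, humans score poorly when $U$ is inclusive fitness and $D_{\text{deploy}}$ is the modern environment.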
I claim the neural net of your example wouldn’t be brittle in this way, since you postulated that it was trained on the actual distribution of environments it would encounter.
I’m not sure I understand the difference between a logical agent encountering a sudden, unpredictable change to its environment and a logical agent entering a regime where its operating assumptions turned out to be false.
A logical agent for solving mazes could be an agent that follows a wall. If you deploy such an agent in a maze with a loop, then it can circle forever. It seems like a type error to call this a sudden, unpredictable change—I wouldn’t really ascribe beliefs to this agent at all.
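To make that concrete, here's a toy version (my own grid and implementation, not anything from the original exchange): a right-hand-rule wall follower dropped next to a free-standing inner wall just orbits that wall and never reaches the goal. Nothing in its environment changed along the way; there's simply no belief inside it to be false.

```python
# A concrete wall-follower that loops forever in a maze containing a loop.
GRID = [
    "#########",
    "#.......#",
    "#.......#",
    "#..###..#",
    "#..###..#",
    "#..###..#",
    "#.......#",
    "#.......#",
    "#########",
]
GOAL = (7, 7)  # a free cell on the outer ring of the corridor

HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # N, E, S, W (clockwise, so right = index + 1)

def free(pos):
    r, c = pos
    return GRID[r][c] == "."

def wall_follower(start, heading_idx, max_steps=200):
    """Right-hand rule: prefer turning right, then straight, then left, then reversing."""
    pos, h = start, heading_idx
    seen = set()
    for step in range(max_steps):
        if pos == GOAL:
            return f"reached the goal in {step} steps"
        if (pos, h) in seen:
            return f"entered a cycle after {step} steps without reaching the goal"
        seen.add((pos, h))
        for turn in (1, 0, -1, 2):  # right, straight, left, reverse
            nh = (h + turn) % 4
            nxt = (pos[0] + HEADINGS[nh][0], pos[1] + HEADINGS[nh][1])
            if free(nxt):
                pos, h = nxt, nh
                break
    return "ran out of steps"

# Start next to the inner island, facing north: the agent orbits the island forever.
print(wall_follower(start=(3, 2), heading_idx=0))
```

The failure is a property of the agent-plus-maze pair; there's no internal assumption that "turned out to be false."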
After thinking about your reply, I’ve come to the conclusion that my thoughts are currently too confused to continue explaining. I’ve edited the main post to add that detail.