The claim “we’re confused about agency” is not a claim that our theory of agency is pre-paradigmatic. It’s a claim that the current paradigm (of expected utility maximization) contains a bunch of anomalies and limitations, suggesting that there’s a much better paradigm waiting to be found.
I think that EU maximization works quite well for ideal agents and I don’t expect a very clean theory for bounded agents. Flaws in the EU maximization paradigm do not necessarily suggest that there is a better paradigm waiting to be found, because unlike in physics, I see no a priori reason to expect an elegant grand unified theory for bounded rationality. There can simply be different paradigms for modeling different aspects of agency. Clearly, some general principles and constraints will apply, and some connections will be discoverable. But it is not clear that there is any end to this process, or that we would need to find it to solve AI safety in practice.
This is a bad model of how scientific progress works. Instead, people should focus on the clearest anomalies/limitations as “clues” to finding a new paradigm that will change many aspects of how we think about agency.
Analogously, imagine pre-Einstein physicists claiming “we’re confused about space and time”. People might have made arguments similar to yours in this post in response, saying things like “there are actually many physics equations which use the concepts of space and time to make accurate predictions” or “insofar as we’re confused, it is best to focus on the confusions that seem most relevant to the specific problems that face us”. But those kinds of arguments would have been mistaken.
This analogy seems a bit tortured and I don’t think the argument works as stated:
I struggle to parse the intended structure of the analogy. Einstein’s research direction, leading to special relativity, seems to have been inspired by the invariance of the speed of light as measured across inertial reference frames. That was a problem with the existing theory. While his answer threw out some pretty fundamental assumptions, it does seem naturally paired to the question, and I doubt anyone would have said that “insofar as we’re confused, it is best to focus on the confusions that seem most relevant to the specific problems that face us” in response to either the research process or the result. If a philosopher specializing in metaphysics had invented special relativity by reasoning from first principles about time, your argument would make more historical sense.
The example of Einstein is particularly available because he overthrew one elegant paradigm with another elegant paradigm, but I do not think that most scientific progress looks like this, at least outside of physics. (I think I’ve seen this line of argument taken a few times on LessWrong already.)
More importantly, AI safety is fundamentally an engineering problem, not a scientific problem. As is typical in computer science, scientific and mathematical progress is likely necessary to reach a solution. However, there is still a specific goal. Usually, solving a specific technical problem (such as landing a person on the moon) looks more like overcoming barriers directly than noticing confusions / curiosity in hopes of inventing a new paradigm. So, in order for this analogy to support your position, you would need to explain why solving AI safety is more like inventing relativity than it is like achieving a moon landing.
Generally, I find this comment fairly unhelpful. I was already aware that you disagreed with the position in this post (along with many others in the agent foundations community, which is why I wrote it), as well as your specific reasons (rejecting Bayesianism / EU maximization). Beyond signaling your disagreement, I don’t think your comment engages with my argument, because both of your points seem to implicitly assume that I am wrong, rather than demonstrating that I am wrong.
both of your points seem to implicitly assume that I am wrong, rather than demonstrating that I am wrong.
You were portraying the “we’re confused about agency” position as being that agency is “pre-paradigmatic”. I think that’s a mischaracterization, and corrected it. This doesn’t implicitly assume that you’re wrong.
However, I accept that my claim “this is a bad model of how scientific progress works” does implicitly rely on the idea that there’s a clean new paradigm to be discovered for bounded agents. I didn’t have that cached in my head as the core crux between us, but I’d phrase it differently now that I do.
Flaws in the EU maximization paradigm do not necessarily suggest that there is a better paradigm waiting to be found, because unlike in physics, I see no a priori reason to expect an elegant grand unified theory for bounded rationality.
Disagreed. In my experience, flaws in [X] paradigm usually suggest there is a better paradigm [Y] waiting to be found. Correspondence principles wherein the domain of validity of [X] is subsumed by [Y] are not limited to physics. Specifically, a comprehensive theory of bounded agency (say, with bound $b$) should recover a theory of non-bounded agency in the limit $b \to \infty$, otherwise it’s probably incorrect!
On the object level, some of Einstein’s gedankenexperiments on clock synchronization are pretty hard to distinguish from “a philosopher specializing in metaphysics had invented special relativity by reasoning from first principles about time”.
More importantly, AI safety is fundamentally an engineering problem, not a scientific problem.
Disagreed. If it’s an engineering problem, then I would expect that we must already deeply understand all the relevant principles—so that it’s just a matter of exchanging cost vs. safety, i.e. how much do we want to overengineer this? That does not characterize my epistemic state on AI safety in the least. In the moon landing analogy, I think AI safety research is arguably about at the level of the Tsiolkovsky rocket equation.
Specifically, a comprehensive theory of bounded agency (say, with bound $b$) should recover a theory of non-bounded agency in the limit $b \to \infty$, otherwise it’s probably incorrect!
This argument does not prove your point. It proves that if there is a clean theory of bounded agency, it should limit to some theory of unbounded agency. It does not prove that there must be a clean theory of bounded agency.
On the object level, some of Einstein’s gedankenexperiments on clock synchronization are pretty hard to distinguish from “a philosopher specializing in metaphysics had invented special relativity by reasoning from first principles about time”.
Interesting, link? But I think I stand by my point that the analogy is tortured.
Disagreed. If it’s an engineering problem, then I would expect that we must already deeply understand all the relevant principles—so that it’s just a matter of exchanging cost vs. safety, i.e. how much do we want to overengineer this? That does not characterize my epistemic state on AI safety in the least. In the moon landing analogy, I think AI safety research is arguably about at the level of the Tsiolkovsky rocket equation.
I think engineering problems can have scientific subproblems, so I do not agree that we need to understand all of the relevant principles for AI safety to be an engineering problem. (I guess I should be clear it’s not a “mere” engineering problem, in the sense of conceptually straightforward implementation, but how many hard engineering problems are like that in general?) Rather, it is an engineering problem because the ultimate objective is to achieve a desired effect, not to understand something better.
Non sequitur? I did not expect that statement to prove “there is a clean theory of bounded agency.” It’s a terse illustration of why correspondence principles aren’t physics-magic. I am trying to explain why I believe we are disagreeing on priors here. It appeared to me that you believe elegant grand unified theories only appear in physics a priori.
I think that we are in the situation where we need to understand something (agency; metaethics) better to achieve the desired effect (safe superintelligence). I think the amount of scientific work to be done on the former far eclipses the engineering work of the latter, to the point that it’s disingenuous to call the scientific work a subproblem. So it is more precise to say that I agree with you on the factual matter, but am dismayed by your rhetorical framing.
Non sequitur? I did not expect that statement to prove “there is a clean theory of bounded agency.” It’s a terse illustration of why correspondence principles aren’t physics-magic. I am trying to explain why I believe we are disagreeing on priors here. It appeared to me that you believe elegant grand unified theories only appear in physics a priori.
If what I said was a non sequitur, then I (still) do not understand what you are trying to say.
https://www.fourmilab.ch/etexts/einstein/specrel/www/ § and § .
Namely
I think that we are in the situation where we need to understand something (agency; metaethics) better to achieve the desired effect (safe superintelligence). I think the amount of scientific work to be done on the former far eclipses the engineering work of the latter, to the point that it’s disingenuous to call the scientific work a subproblem. So it is more precise to say that I agree with you on the factual matter, but am dismayed by your rhetorical framing.
If what I said was a non sequitur, then I (still) do not understand what you are trying to say.
My apologies then, I don’t know how to (compactly) improve your understanding other than to point at my priors more vigorously.
[Bowing out.]