How Exceptional is Philosophy?

Wei Dai thinks that automating philosophy is among the hardest problems in AI safety.[1] If he’s right, we might face a period where we have superhuman scientific and technological progress without comparable philosophical progress. This could be dangerous: imagine humanity with the science and technology of 1960 but the philosophy of 1460!
I think the likelihood of philosophy ‘keeping pace’ with science/technology depends on two factors:
How similar are the capabilities required? If philosophy requires fundamentally different methods than science and technology, we might automate one without the other.
What are the incentives? I think the direct economic incentives to automating science and technology are stronger than automating philosophy. That said, there might be indirect incentives to automate philosophy if philosophical progress becomes a bottleneck to scientific or technological progress.
I’ll consider only the first factor here: How similar are the capabilities required?
Wei Dai is a metaphilosophical exceptionalist. He writes:
We seem to understand the philosophy/epistemology of science much better than that of philosophy (i.e. metaphilosophy), and at least superficially the methods humans use to make progress in them don’t look very similar, so it seems suspicious that the same AI-based methods happen to work equally well for science and for philosophy.
I will contrast Wei Dai’s position with that of Timothy Williamson, a metaphilosophical anti-exceptionalist.
These are the claims that constitute Williamson’s view:
Philosophy is a science.
It’s not a natural science (like particle physics, organic chemistry, nephrology), but not all sciences are natural sciences — for instance, mathematics and computer science are formal sciences. Philosophy is likewise a non-natural science.
Although philosophy differs from other scientific inquiries, it differs no more in kind or degree than they differ from each other. Put provocatively, theoretical physics might be closer to analytic philosophy than to experimental physics.
Philosophy, like other sciences, pursues knowledge. Just as mathematics pursues mathematical knowledge, and nephrology pursues nephrological knowledge, philosophy pursues philosophical knowledge.
Different sciences will vary in their subject-matter, methods, practices, etc., but philosophy doesn’t differ to a far greater degree or in a fundamentally different way.
Philosophical methods (i.e. the ways in which philosophy achieves its aim, knowledge) aren’t starkly different from the methods of other sciences.
Philosophy isn’t a science in a parasitic sense. It’s not a science because it uses scientific evidence or because it has applications for the sciences. Rather, it’s simply another science, not uniquely special. Williamson says, “philosophy is neither queen nor handmaid of the sciences, just one more science with a distinctive character, just as other sciences have distinctive character.”
Philosophy is not, exceptionally among sciences, concerned with words or concepts. This conflicts with many 20th-century philosophers, such as Wittgenstein and Carnap, who conceived of philosophy as chiefly concerned with linguistic or conceptual analysis.
Philosophy doesn’t consist of a series of disconnected visionaries. Rather, it consists in the incremental contribution of thousands of researchers: some great, some mediocre, much like any other scientific inquiry.
Roughly speaking, metaphilosophical exceptionalism should make one more pessimistic about philosophical progress keeping pace with scientific and technological progress. I lean towards Williamson’s position, which makes me less pessimistic about philosophy keeping pace by default.
That said, during a rapid takeoff, even small differences in the pace could lead to a growing gap between philosophical progress and scientific/technological progress. So I consider automating philosophy an important problem to work on.

[1] See AI doing philosophy = AI generating hands? (Jan 2024), Meta Questions about Metaphilosophy (Sep 2023), Morality is Scary (Dec 2021), Problems in AI Alignment that philosophers could potentially contribute to (Aug 2019), On the purposes of decision theory research (Jul 2019), Some Thoughts on Metaphilosophy (Feb 2019), The Argument from Philosophical Difficulty (Feb 2019), Two Neglected Problems in Human-AI Safety (Dec 2018), Metaphilosophical Mysteries (2010).
I think you could approximately define philosophy as “the set of problems that are left over after you take all the problems that can be formally studied using known methods and put them into their own fields.” Once a problem becomes well-understood, it ceases to be considered philosophy. For example, logic, physics, and (more recently) neuroscience used to be philosophy, but now they’re not, because we know how to formally study them.
So I believe Wei Dai is right that philosophy is exceptionally difficult—and this is true almost by definition, because if we know how to make progress on a problem, then we don’t call it “philosophy”.
For example, I don’t think it makes sense to say that philosophy of science is a type of science, because it exists outside of science. Philosophy of science is about laying the foundations of science, and you can’t do that using science itself.
I think the most important philosophical problems with respect to AI are ethics and metaethics because those are essential for deciding what an ASI should do, but I don’t think we have a good enough understanding of ethics/metaethics to know how to get meaningful work on them out of AI assistants.
One route here is to just taboo Philosophy, and say “we’re talking about ‘reasoning about the stuff we haven’t formalized yet’”, and then it doesn’t matter whether or not there’s a formalization of what most people call “philosophy.” (actually: I notice I’m not sure if the thing-that-is “solve unformalized stuff” is “philosophy” or “metaphilosophy”)
But, if we’re evaluating whether “we need to solve metaphilosophy” (and this is a particular bottleneck for AI going well), I think we need to get a bit more specific about what cognitive labor needs to happen. It might turn out to be that all the individual bits here are reasonably captured by some particular subfields, which might or might not be “formalized.”
I would personally say “until you’ve figured out how to confidently navigate stuff that’s pre-formalized, something as powerful as AI is likely to make something go wrong, and you should be scared about that”. But I’d be a lot less confident saying the more specific sentences “you need to have solved metaphilosophy to align successor AIs”, or most instances of “solve ethics.”
I might say “you need to have solved metaphilosophy to do a Long Reflection”, since, sort of by definition doing a Long Reflection is “figuring everything out”, and if you’re about to do that and then Tile The Universe With Shit you really want to make sure there was nothing you failed to figure out because you weren’t good enough at metaphilosophy.
To try to explain how I see the difference between philosophy and metaphilosophy:
My definition of philosophy is similar to @MichaelDickens’ but I would use “have serviceable explicitly understood methods” instead of “formally studied” or “formalized” to define what isn’t philosophy, as the latter might be, or could be interpreted as, too high a bar, e.g., in the sense of formal systems.
So in my view, philosophy is directly working on various confusing problems (such as “what is the right decision theory”) using whatever poorly understood methods that we have or can implicitly apply, and then metaphilosophy is trying to help solve these problems on a meta level, by better understanding the nature of philosophy, for example:
Try to find if there is some unifying quality that ties all of these “philosophical” problems together (besides “lack of serviceable explicitly understood methods”).
Try to formalize some part of philosophy, or find explicitly understood methods for solving certain philosophical problems.
Try to formalize all of philosophy wholesale, or explicitly understand what it is that humans are doing (or should be doing, or what AIs should be doing) when it comes to solving problems in general. This may not be possible, i.e., maybe there is no such general method that lets us solve every problem given enough time and resources, but it sure seems like humans have some kind of general-purpose (but poorly understood) method that lets us make progress slowly over time on a wide variety of problems, including ones that are initially very confusing, or where it’s hard to understand/explain what we’re even asking, etc. We can at least aim to understand what it is that humans are or have been doing, even if it’s not a fully general method.
I’m curious what you say about “which are the specific problems (if any) where you specifically think ‘we really need to have solved philosophy / improved-a-lot-at-metaphilosophy’ to have a decent shot at solving this?’”
(as opposed to, well, generally it sounds good to be good at solving confusing problems, and we do expect to have some confusing problems to solve, but, like, we might pretty quickly figure out ‘oh, the problem is actually shaped like <some paradigmatic system>’ and then deal with it?)
Assuming by “solving this” you mean solving AI x-safety or navigating the AI transition well, I just posted a draft about this. Or if you already read that and are asking for an even more concrete example, a scenario I often think about is an otherwise aligned ASI, some time into the AI transition, when things are moving very fast (from a human perspective) and many highly consequential decisions need to be made (e.g., what alliances to join, how to bargain with others, how to self-modify or take advantage of the latest AI advances, how to think about AI welfare and other near-term ethical issues, what to do about commitment races and threats, how to protect the user against manipulation or value drift, whether to satisfy some user request that might be harmful according to their real values) that often involve philosophical problems. And it can’t just ask its user (or alignment target), or even predict “what would the user say if they thought about this for a long time”, because the user themselves may not be philosophically very competent, and/or making such predictions with high accuracy (over a long enough time frame) is still outside its range of capabilities.
So the specific problem is how to make sure this AI doesn’t make wrong decisions that cause a lot of waste or harm, and that quickly or over time cause most of the potential value of the universe to be lost. This in turn seems to involve figuring out how the AI should be thinking about philosophical problems, or how to make the AI philosophically competent even if its alignment target isn’t.
Does this help / is this the kind of answer you’re asking for?
One way to see that philosophy is exceptional is that we have serviceable explicit understandings of math and natural science, even formalizations in the forms of axiomatic set theory and Solomonoff Induction, but nothing comparable in the case of philosophy. (Those formalizations are far from ideal or complete, but still represent a much higher level of understanding than for philosophy.)
If you say that philosophy is a (non-natural) science, then I challenge you, come up with something like Solomonoff Induction, but for philosophy.
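For readers who want the reference point: Solomonoff induction weights every program p that makes a universal (monotone) machine U reproduce the observed data x by its length ℓ(p), and predicts by conditioning the resulting universal prior:

$$M(x) \;=\; \sum_{p\,:\,U(p)\text{ outputs a string beginning with }x} 2^{-\ell(p)}, \qquad \Pr(x_{n+1}=b \mid x_{1:n}) \;=\; \frac{M(x_{1:n}b)}{M(x_{1:n})}.$$

The challenge above is to exhibit anything of comparable crispness for philosophical reasoning.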
Philosophy is where we keep all the questions we don’t know how to answer. With most other sciences, we have a known culture of methods for answering questions in that field. Mathematics has the method of definition, theorem and proof. Nephrology has the methods of looking at sick people with kidney problems, experimenting on rat kidneys, and doing chemical analyses of cadaver kidneys. Philosophy doesn’t have a method that lets you grind out an answer. Philosophy’s methods of thinking hard, drawing fine distinctions, writing closely argued articles, and public dialogue, don’t converge on truth as well as in other sciences. But they’re the best we’ve got, so we just have to keep on trying.
When we find new methods of answering philosophical questions, the result tends to be that such questions move out of philosophy into another (possibly new) field. Presumably this will also occur if AI gives us the answers to some philosophical questions, and we can be convinced of those answers.
An AI answer to a philosophical question has a possible problem we haven’t had to face before: what if we’re too dumb to understand it? I don’t understand Grothendieck’s work in algebraic geometry, or Richard Feynman on quantum field theory, but I am assured by those who do understand such things that this work is correct and wonderful. I’ve bounced off both these fields pretty hard when I try to understand them. I’ve come to the conclusion that I’m just not smart enough. What if AI comes up with a conclusion for which even the smartest human can’t understand the arguments or experiments or whatever new method the AI developed? If other AIs agree with the conclusion, I think we will have no choice but to go along. But that marks the end of philosophy as a human activity.
One caveat here is that regardless of the field, verifying that an answer is correct should be far easier than coming up with that correct answer, so in principle that still leaves a lot of room for human-understandable progress by AIs in pretty much all fields. It doesn’t necessarily leave a lot of time, though, if that kind of progress requires a superhuman AI in the first place.
There are many questions where verification is no easier than generation, e.g. “Is this chess move best?” is no easier than “What’s the best chess move?” Both are EXPTIME-complete (for generalized chess on an n×n board).
Philosophy might have a similar complexity to “What’s the best chess move?”, i.e. “What argument X is such that for all counterarguments X1 there exists a countercounterargument X2 such that for all countercountercounterarguments X3...”, i.e. you explore the game tree of philosophical discourse.
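To make that quantifier structure concrete, here is a toy sketch (entirely my own illustration, with a made-up COUNTERS relation rather than anything proposed in the thread): an argument “stands” iff every counterargument to it fails, and the only way to check this is to recurse through the tree, so verifying a single argument costs as much as searching the whole game tree.

```python
# Toy model of "argument X stands iff every counterargument to X fails",
# evaluated by brute-force recursion over a hypothetical, finite tree of moves.
# Note that verifying one argument requires exploring its entire subtree.

# Hypothetical counterargument relation: argument -> list of its known rebuttals.
COUNTERS = {
    "A": ["B", "C"],  # A is attacked by B and C
    "B": ["D"],       # B is attacked by D
    "C": [],          # C is unrebutted
    "D": [],          # D is unrebutted
}

def stands(argument: str) -> bool:
    """An argument stands iff every counterargument to it fails to stand."""
    return all(not stands(counter) for counter in COUNTERS.get(argument, []))

print(stands("A"))  # False: counterargument C stands unrebutted
print(stands("B"))  # False: counterargument D stands unrebutted
```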
I’m not convinced by this response (incidentally here I’ve found a LW post making a similar claim). If your only justification for “is move X best” is “because I’ve tried all others”, that doesn’t exactly seem like usefully accumulated knowledge. You can’t generalize from it, for one thing.
And for philosophy, if we’re still only on the level of endless arguments and counterarguments, that doesn’t seem like useful philosophical progress at all, certainly not something a human or AI should use as a basis for further deductions or decisions.
What’s an example of useful existing knowledge we’ve accumulated that we can’t in retrospect verify far more easily than we acquired it?
Williamson seems to be making a semantic argument rather than arguing anything concrete. Or at least, the 6 claims he’s making seem to all be restatements of “philosophy is a science” without ever actually arguing why being “a science” makes philosophy as easy as other things labeled “a science”. For example, I can replace “philosophy” in your list of claims with “religion”, with the only claim that seems iffy being 5.
Religion is a science.
It’s not a natural science (like particle physics, organic chemistry, nephrology), but not all sciences are natural sciences — for instance, mathematics and computer science are formal sciences. Religion is likewise a non-natural science.
Although religion differs from other scientific inquiries, it differs no more in kind or degree than they differ from each other. Put provocatively, theoretical physics might be closer to religion than to experimental physics.
Religion, like other sciences, pursues knowledge. Just as mathematics pursues mathematical knowledge, and nephrology pursues nephrological knowledge, religion pursues religious knowledge.
Different sciences will vary in their subject-matter, methods, practices, etc., but religion doesn’t differ to a far greater degree or in a fundamentally different way.
Religious methods (i.e. the ways in which religion achieves its aim, knowledge) aren’t starkly different from the methods of other sciences.
Religion isn’t a science in a parasitic sense. It’s not a science because it uses scientific evidence or because it has applications for the sciences. Rather, it’s simply another science, not uniquely special. Shmilliamson says, “Religion is neither queen nor handmaid of the sciences, just one more science with a distinctive character, just as other sciences have distinctive character.”
Religion is not, exceptionally among sciences, concerned with words or concepts. This conflicts with many religious thinkers who conceived religion as chiefly concerned with linguistic or conceptual analysis, such as Maimonides, or Thomas Aquinas.
Religion doesn’t consist of a series of disconnected visionaries. Rather, it consists in the incremental contribution of thousands of researchers: some great, some mediocre, much like any other scientific inquiry.
But of course, this claim is iffy for philosophy too. In what sense are philosophical methods not “starkly different from the methods of other sciences”? A key component of science is experiment, and in that sense, religion is much more science-like than philosophy! E.g. see the ideas of personal experimentation in Buddhism, and Mormon epistemology (ask Claude about the significance of Alma 32 in Mormon epistemology).
I’m not saying religion is a science, or that it is more right than philosophy, just that your representation of Williamson here doesn’t seem much more than a semantic dispute.
In particular, the real question here is whether the mechanisms we expect to automate science and math will also automate philosophy, not whether we ought to semantically group philosophy as a science. The reason we expect science and math to get automated is the existence of relatively concrete & well-defined feedback loops between actions and results. Or at minimum, much more concrete feedback loops than philosophy has, and especially than the philosophy Wei Dai typically cares about (e.g. moral philosophy, decision theory, and metaphysics) has.
Concretely, if AIs decide that it is a moral good to spread the good word of spiralism, there’s nothing (save humans, but that will go away once we’re powerless) to stop them, but if they decide quantum mechanics is fake, or 2+2=5, well… they won’t make it too far.
I’d guess this is also why Wei Dai believes in “philosophical exceptionalism”. Regardless of whether you want to categorize philosophy as a science or not, the above paragraph applies just as well to groups of humans as to AIs. Indeed, there have been ideologies much, much more evil & philosophically wrong than spiralism in the past.
On whether experiments serve as a distinction between science and philosophy: TW has a lecture arguing against this, and he addresses it in a bunch of papers. I’ll summarise his arguments later if I have time.
To clarify, I listed some of Williamson’s claims, but I haven’t summarised any of his arguments.
His actual arguments tend to be ‘negative’, i.e. they go through many distinctions that metaphilosophical exceptionalists purport, and for each he argues that either (i) the purported distinction is insubstantial,[1] or (ii) the distinction mischaracterises philosophy or science or both.[2]
He hasn’t, I think, addressed Wei Dai’s exceptionalism, which is (I gather) something like “Solomonoff induction provides a half-way decent formalism of ideal maths/science, but there isn’t a similarly decent formalism of ideal philosophy.”
I’ll think a bit more about what Williamson might say about Wei Dai’s purported distinction. I think Williamson is open to the possibility that philosophy is qualitatively different from science, so it’s possible he would change his mind if he engaged with Dai’s position.
[1] An illustrative strawman: that philosophers publish in journals with ‘philosophy’ in the title would not be a substantial difference.
[2] E.g., one purported distinction he critiques is that philosophy is concerned with words/concepts in a qualitatively different way than the natural sciences.
I think even still, if these are the claims he’s making, none of them seem particularly relevant to the question of “whether the mechanisms we expect to automate science and math will also automate philosophy”.
My own take on philosophy is that it’s basically divided into 3 segments:
1. The philosophical problems that were solved, but where the solutions are unsatisfying, so philosophers futilely try to make further progress on them, whereas other scientists content themselves with less general solutions that evade the impossibilities.
(An example: many philosophical problems basically reduce to the question “does there exist a way to have a prior that is always better than any other prior for a set of data, without memorizing all of the data?”, and the answer is no in general, because of the No Free Lunch theorem. The Problem of Induction is an example of a problem solved this way. But that matters less than people think, because our world doesn’t satisfy the conditions required to generate a No Free Lunch result, and ML/AI is focused on solving specific problems in our universe. A minimal illustration of the No Free Lunch point follows this list.)
2. The philosophical problem depends on definitions in an essential way, such that solving the problem amounts to disambiguating the definition, and there is no objective choice. (Example: any discussion of what art is; and more generally, any discussion of “what is X” is potentially vulnerable to this sort of issue.)
3. Philosophical problems that are solved, where the solutions aren’t unsatisfying to us. (A random example is Ayer’s puzzle of why you would collect any new data if you want to find the true hypothesis, solved by Mark Sellke.)
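A minimal sketch of the No Free Lunch point from item 1 (my own illustration, not from the thread): averaged over every possible labeling of the unseen inputs, any fixed predictor, whatever prior it encodes, scores exactly 50% off the training data; only memorized points can be predicted better than chance.

```python
# No Free Lunch, brute force: average a predictor's off-training-set accuracy
# over all possible binary labelings of the unseen inputs. Every predictor,
# regardless of its "prior", comes out at exactly 0.5.

from itertools import product

def average_offtrain_accuracy(predictor, n_unseen: int) -> float:
    """Average accuracy of `predictor` over all 2**n_unseen labelings of unseen inputs."""
    labelings = list(product([0, 1], repeat=n_unseen))
    total = 0.0
    for labeling in labelings:
        correct = sum(predictor(i) == labeling[i] for i in range(n_unseen))
        total += correct / n_unseen
    return total / len(labelings)

# Two very different "priors" do equally well (and equally badly) on average.
print(average_offtrain_accuracy(lambda i: 0, n_unseen=4))      # 0.5
print(average_offtrain_accuracy(lambda i: i % 2, n_unseen=4))  # 0.5
```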
A potential crux with Raemon/Wei Dai here is that I think that lots of philosophical problems are impossible to solve in a satisfying/fully general way, and that this matters a lot less to me than to a lot of LWers.
Another potential crux is that I don’t think preference aggregation/CEV can actually work without a preference prior/base values that must be arbitrarily selected, and thus politics is inevitably going to enter the preference aggregation (this comes from Steven Byrnes here; a toy illustration follows the quoted excerpt below):
I’m concerned that CEV isn’t well-defined. Or more specifically, that you could list numerous equally-a-priori-plausible detailed operationalizations of CEV, and they would give importantly different results, in a way that we would find very unsatisfying.
Relatedly, I’m concerned that a “Long Reflection” wouldn’t resolve all the important things we want it to resolve, or else resolve them in a way that is inextricably contingent on details of the Long Reflection governance / discourse rules, with no obvious way to decide which of numerous plausible governance / discourse rules are “correct”.
When people make statements that implicitly treat “the value of the future” as being well-defined, e.g. statements like “I define ‘strong utopia’ as: at least 95% of the future’s potential value is realized”, I’m concerned that these statements are less meaningful than they sound.
I’m concerned that changes in human values over the generations are at some deep level more like a random walk than progress-through-time, and that they only feel like progress-through-time because we’re “painting the target around the arrow”. So when we say “Eternal value lock-in is bad—we want to give our descendants room for moral growth!”, and we also simultaneously say specific things like “We want a future with lots of friendship and play and sense-of-agency and exploration, and very little pain and suffering, and…!”, then I’m concerned that those two statements are at least a little bit at odds, and maybe strongly at odds. (If it turns out that we have to pick just one of those two statements, I don’t know which one I’d vote for.)
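As a toy illustration of how much hinges on the operationalization (my own sketch, not from Byrnes or anyone in the thread): the same preference profile yields different collective choices under two equally standard aggregation rules, which is exactly the kind of arbitrary-seeming choice point the concerns above are pointing at.

```python
# Two equally "reasonable" aggregation rules disagree on the same preference profile.

from collections import Counter

# Each ranking is best-to-worst; the value is how many agents hold that ranking.
profile = {
    ("A", "B", "C"): 3,
    ("B", "C", "A"): 2,
    ("C", "B", "A"): 2,
}

def plurality_winner(profile):
    """Winner by first-place votes only."""
    firsts = Counter()
    for ranking, n in profile.items():
        firsts[ranking[0]] += n
    return firsts.most_common(1)[0][0]

def borda_winner(profile):
    """Winner by Borda count: k-1 points for 1st place down to 0 for last."""
    scores = Counter()
    for ranking, n in profile.items():
        k = len(ranking)
        for place, option in enumerate(ranking):
            scores[option] += n * (k - 1 - place)
    return scores.most_common(1)[0][0]

print(plurality_winner(profile))  # A (first-place votes: A=3, B=2, C=2)
print(borda_winner(profile))      # B (Borda scores: A=6, B=9, C=6)
```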
On the philosophical problems posed by Wei Dai, here’s what I’d say:
Decision theory for AI / AI designers
How to resolve standard debates in decision theory?
Logical counterfactuals
Open source game theory
Acausal game theory / reasoning about distant superintelligences
All of these are problems where it isn’t worth it for humanity to focus on them; instead we should delegate them to aligned AIs, with a few caveats. (I’ll also say that there doesn’t exist a single decision theory that outperforms every other decision theory, links here and here, though there is a comment that I do like here.)
Infinite/multiversal/astronomical ethics
Should we (or our AI) care much more about a universe that is capable of doing a lot more computations?
What kinds of (e.g. spatial-temporal) discounting is necessary and/or desirable?
This is very much dependent on the utility function/values, so this needs more assumptions in order to even have a solution.
Fair distribution of benefits
How should benefits from AGI be distributed?
For example, would it be fair to distribute it equally over all humans who currently exist, or according to how much AI services they can afford to buy?
What about people who existed or will exist at other times and in other places or universes?
Again, this needs assumptions over the utility function/fairness metric in order to even have a solution.
Need for “metaphilosophical paternalism”?
However we distribute the benefits, if we let the beneficiaries decide what to do with their windfall using their own philosophical faculties, is that likely to lead to a good outcome?
Again, entirely dependent on the utility functions.
Metaphilosophy
What is the nature of philosophy?
What constitutes correct philosophical reasoning?
How to specify this into an AI design?
I basically agree with Connor Leahy that the definition of metaphilosophy/philosophy is so large as to contain everything, and thus this is an ask for us to be able to solve every problem. In that respect, the No Free Lunch theorem tells us that we would in general have to have every possible example memorized in training, and since this is not possible for us, we can immediately say that there is no generally correct philosophical reasoning that can be specified into an AI design. But in my view this matters a lot less than people think it does.
Philosophical forecasting
How are various AI technologies and AI safety proposals likely to affect future philosophical progress (relative to other kinds of progress)?
Depends, but in general the better AI is at hard-to-verify tasks, the better its philosophy is.
Preference aggregation between AIs and between users
How should two AIs that want to merge with each other aggregate their preferences?
How should an AI aggregate preferences between its users?
Do we need to make sure an AGI has a sufficient understanding of this?
In general, this is dependent on their utility functions, but one frame that I do like is Preference Aggregation as Bayesian Inference. The first question is maybe an interesting research question, but I don’t think we need AGI to understand/have normativity.
Metaethical policing
What are the implicit metaethical assumptions in a given AI alignment proposal (in case the authors didn’t spell them out)?
What are the implications of an AI design or alignment proposal under different metaethical assumptions?
Encouraging designs that make minimal metaethical assumptions or is likely to lead to good outcomes regardless of which metaethical theory turns out to be true.
For the first question: most alignment plans have the implicit metaethical assumption of moral relativism, which is that there are no fundamentally objective values and every value is valid, so we just have to take the values of a human as given. They also assume that utility functions are a valid representation of human values, in that we can reduce what humans value into a utility function; but this is always correct, so it doesn’t matter.
Moral relativism is in a sense the most minimal metaethical assumption you can make, as it is entirely silent on what moral views are correct.
And that’s my answer to all of the questions from this post.
Williamson and Dai both appear to describe philosophy as a general-theoretical-model-building activity, but there are other conceptions of what it means to do philosophy. In contrast to both Williamson and Dai, if Wittgenstein (either early or late period) is right that the proper role of philosophy is to clarify and critique language rather than to construct general theses and explanations, LLM-based AI may be quickly approaching peak-human competence at philosophy. Critiquing and clarifying writing are already tasks that LLMs are good at and widely used for. They’re tasks that AI systems improve at from the types of scaling-up that labs are already doing, and labs have strong incentives to keep making their AIs better at them. As such, I’m optimistic about the philosophical competence of future AIs, but according to a different idea of what it means to be philosophically competent. AI systems that reach peak-human or superhuman levels of competence at Wittgensteinian philosophy-as-an-activity would be systems that help people become wiser on an individual level by clearing up their conceptual confusions, rather than a tool for coming up with abstract solutions to grand Philosophical Problems.