I think you could approximately define philosophy as “the set of problems that are left over after you take all the problems that can be formally studied using known methods and put them into their own fields.” Once a problem becomes well-understood, it ceases to be considered philosophy. For example, logic, physics, and (more recently) neuroscience used to be philosophy, but now they’re not, because we know how to formally study them.
So I believe Wei Dai is right that philosophy is exceptionally difficult—and this is true almost by definition, because if we know how to make progress on a problem, then we don’t call it “philosophy”.
For example, I don’t think it makes sense to say that philosophy of science is a type of science, because it exists outside of science. Philosophy of science is about laying the foundations of science, and you can’t do that using science itself.
I think the most important philosophical problems with respect to AI are ethics and metaethics because those are essential for deciding what an ASI should do, but I don’t think we have a good enough understanding of ethics/metaethics to know how to get meaningful work on them out of AI assistants.
Hmm, this makes me think:
One route here is to just taboo "Philosophy" and say "we're talking about 'reasoning about the stuff we haven't formalized yet'", and then it doesn't matter whether or not there's a formalization of what most people call "philosophy." (Actually: I notice I'm not sure if the thing-that-is "solve unformalized stuff" is "philosophy" or "metaphilosophy".)
But, if we're evaluating whether "we need to solve metaphilosophy" (and whether this is a particular bottleneck for AI going well), I think we need to get a bit more specific about what cognitive labor needs to happen. It might turn out that all the individual bits here are reasonably captured by some particular subfields, which might or might not be "formalized."
I would personally say "until you've figured out how to confidently navigate stuff that's pre-formalized, something as powerful as AI is likely to make something go wrong, and you should be scared about that." But I'd be a lot less confident saying the more specific sentences "you need to have solved metaphilosophy to align successor AIs", or most instances of "solve ethics."
I might say "you need to have solved metaphilosophy to do a Long Reflection", since doing a Long Reflection is, sort of by definition, "figuring everything out", and if you're about to do that and then Tile The Universe With Shit, you really want to make sure there was nothing you failed to figure out because you weren't good enough at metaphilosophy.
To try to explain how I see the difference between philosophy and metaphilosophy:
My definition of philosophy is similar to @MichaelDickens', but I would use "have serviceable, explicitly understood methods" instead of "formally studied" or "formalized" to define what isn't philosophy, since the latter could be interpreted as too high a bar, e.g., in the sense of formal systems.
So in my view, philosophy is directly working on various confusing problems (such as "what is the right decision theory?") using whatever poorly understood methods we have or can implicitly apply, and metaphilosophy is trying to help solve these problems at a meta level, by better understanding the nature of philosophy. For example:
Try to find whether there is some unifying quality that ties all of these "philosophical" problems together (besides "lack of serviceable, explicitly understood methods").
Try to formalize some part of philosophy, or find explicitly understood methods for solving certain philosophical problems.
Try to formalize all of philosophy wholesale, or explicitly understand what it is that humans are doing (or should be doing, or what AIs should be doing) when it comes to solving problems in general. This may not be possible; maybe there is no general method that lets us solve every problem given enough time and resources. But it sure seems like humans have some kind of general-purpose (but poorly understood) method that lets us make progress slowly over time on a wide variety of problems, including ones that are initially very confusing, or where it's hard to understand or explain what we're even asking. We can at least aim to understand what it is that humans are or have been doing, even if it's not a fully general method.
Does this make sense?
Yeah that all makes sense.
I'm curious what you'd say about: which are the specific problems (if any) where you think "we really need to have solved philosophy / improved-a-lot-at-metaphilosophy to have a decent shot at solving this"?
(as opposed to, well, generally it sounds good to be good at solving confusing problems, and we do expect to have some confusing problems to solve, but, like, we might pretty quickly figure out ‘oh, the problem is actually shaped like <some paradigmatic system>’ and then deal with it?)
Assuming by "solving this" you mean solving AI x-safety or navigating the AI transition well, I just posted a draft about this. Or, if you've already read that and are asking for an even more concrete example, a scenario I often think about is an otherwise aligned ASI, some time into the AI transition, when things are moving very fast (from a human perspective) and many highly consequential decisions need to be made that often involve philosophical problems: what alliances to join, how to bargain with others, how to self-modify or take advantage of the latest AI advances, how to think about AI welfare and other near-term ethical issues, what to do about commitment races and threats, how to protect the user against manipulation or value drift, whether to satisfy some user request that might be harmful according to the user's real values. And the AI can't just ask its user (or alignment target), or even predict "what would the user say if they thought about this for a long time", because the user themselves may not be philosophically very competent, and/or making such predictions with high accuracy (over a long enough time frame) may still be outside the AI's range of capabilities.
So the specific problem is how to make sure this AI doesn't make wrong decisions that cause a lot of waste or harm, and that quickly or over time cause most of the potential value of the universe to be lost. That in turn seems to involve figuring out how the AI should be thinking about philosophical problems, or how to make the AI philosophically competent even if its alignment target isn't.
Does this help / is this the kind of answer you’re asking for?