I’ve been talking about the same issue in various posts and comments, most prominently in Two Neglected Problems in Human-AI Safety. It feels like an obvious problem that (confusingly) almost no one talks about, so it’s great to hear another concerned voice.
A potential solution I’ve been mooting is “metaphilosophical paternalism”, or having AI provide support and/or error correction for humans’ philosophical reasoning, based on a true theory of metaphilosophy (i.e., understanding of what philosophy is and what constitutes correct philosophical reasoning), to help them defend against memetic attacks and internal errors. So this is another reason I’ve been advocating for research into metaphilosophy, and for pausing AI (presumably for at least multiple decades) until metaphilosophy (and not just AI alignment, unless broadly defined to imply a solution to this problem) can be solved.
On your comment about “centrally enforced policy” being “kind of fucked up and illiberal”, I think there is some hope that, given enough time and effort, there can be a relatively uncontroversial solution to metaphilosophy[1] that most people can agree on by the end of the AI pause, so central enforcement wouldn’t be needed. Failing that, perhaps we should take a look at what the metaphilosophy landscape looks like after a lot of further development, and then collectively decide how to proceed.
I’m curious if this addresses your concern, or if you see a differently shaped potential solution.
[1] Similar to how there’s not a huge amount of controversy today about what constitutes correct mathematical or scientific reasoning, although I’d want to aim for even greater certainty/clarity than that.
I am worried that it is impossible to come up with an uncontroversial solution to meta-philosophy, because a reasonable fraction of people will evaluate a meta-philosophy by whether it invalidates particular object-level beliefs of theirs, and will be impossible to convince to change their evaluation strategy (except by using persuasion tactics that could have been symmetrically applied to persuade them of lots of other stuff).
I agree with this particular reason to worry that we can’t agree on a meta-philosophy, but separately think that there might not actually be a good meta-philosophy to find, especially if you’re going for greater certainty/clarity than mathematical reasoning!
I mean greater certainty/clarity than our current understanding of mathematical reasoning, which seems to me far from complete (e.g., realism vs. formalism is unsettled, what is the deal with Berry’s paradox, etc.). By the time we have a good meta-philosophy, I expect our philosophy of math will be much improved too.
That there is no good meta-philosophy to find, even in the sense of matching/exceeding our current level of understanding of mathematical reasoning, is something I think is plausible, but it would be a seemingly very strange and confusing state of affairs, as it would mean that in all or most fields of philosophy there is no objective or commonly agreed way to determine how good an argument is, or whether some statement is true or false, even given infinite compute or subjective time, including fields that seemingly should have objective answers, like philosophy of math or meta-ethics. (Lots of people claim that morality is subjective, but almost nobody claims that “morality is subjective” is itself subjective!)
If after lots and lots of research (ideally with enhanced humans) we just really can’t find a good meta-philosophy, I would hope that we can at least find some clues as to why this is the case, or some kind of explanation that makes the situation less confusing, and then use those clues to guide us on what to do next, e.g., how to handle super-persuasion.
Yeah, I think this outcome is quite plausible, which is in part why I only claimed “some hope”. But:
1. It’s also quite plausible that it won’t be like that. For example, maybe a good solution to meta-philosophy will be fairly attractive to everyone despite invalidating deeply held object-level beliefs, or it only clearly invalidates such beliefs after being applied with a lot of time/compute, which won’t be available yet, so people won’t reject the meta-philosophy based on such invalidations.
2. “What should be done if some/many people do reject the meta-philosophy based on it invalidating their beliefs?” is itself a philosophical question, which the meta-philosophy could directly help us answer by accelerating philosophical progress, and/or which we can better answer after having a firmer handle on the nature of philosophy and therefore the ethics of changing people’s philosophical beliefs. Perhaps the conclusion will be that symmetrical persuasion tactics, or centrally imposed policies, are justified in this case. Or maybe we’ll use the understanding to find more effective asymmetrical or otherwise ethical persuasion tactics.
Basically, my hope is that things become a lot clearer once we have a better understanding of metaphilosophy, as the lack of such understanding seems to be a major obstacle to determining what should be done about the kind of problem described in the OP. I’m still curious whether you have any other solutions or approaches in mind.