I think I need more practice talking with people in real time (about intellectual topics). (I’ve gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I’m interested in (see my recent post/comment history to get a sense), please contact me via PM.
What do people think about having more AI features on LW? (Any existing plans for this?) For example:
AI summary of a poster’s profile, that answers “what should I know about this person before I reply to them”, including things like their background, positions on major LW-relevant issues, distinctive ideas, etc., extracted from their post/comment history and/or bio links.
“Explain this passage/comment” based on context and related posts, similar to X’s “explain this tweet” feature, which I’ve often found useful.
“Critique this draft post/comment.” Am I making any obvious mistakes or clearly misunderstanding something? (I’ve been doing a lot of this manually, using AI chatbots.)
“What might X think about this?”
Have a way to quickly copy all of someone’s posts/comments into the clipboard, or download as a file (to paste into an external AI).
I’ve been thinking about doing some of this myself (e.g., updating my old script for loading all of someone’s post/comment history into one page), but of course would like to see official implementations, if that seems like a good idea.
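For the last item on the list (quick copy/download of someone’s posts/comments), here is roughly the kind of script I have in mind. This is a minimal sketch in Python rather than a browser userscript; it assumes LessWrong exposes a GraphQL endpoint at /graphql with a “userComments” view, and the query shape and field names below are guesses that would need to be checked against the live schema:

```python
# Minimal sketch: dump a user's comments into one Markdown file, ready to paste into an external AI.
# Assumptions: the endpoint URL, the "userComments" view, and the field names
# (postedAt, pageUrl, contents.markdown) are guesses and may not match the live schema.
import requests

GRAPHQL_URL = "https://www.lesswrong.com/graphql"  # assumed endpoint


def build_query(user_id: str, limit: int = 500) -> str:
    # Hypothetical query shape; verify against the actual schema before relying on it.
    return f"""
    {{
      comments(input: {{terms: {{view: "userComments", userId: "{user_id}", limit: {limit}}}}}) {{
        results {{
          postedAt
          pageUrl
          contents {{ markdown }}
        }}
      }}
    }}
    """


def fetch_comments(user_id: str, limit: int = 500) -> list[dict]:
    # POST the query and return the list of comment records (per the assumed response shape).
    resp = requests.post(GRAPHQL_URL, json={"query": build_query(user_id, limit)}, timeout=30)
    resp.raise_for_status()
    return resp.json()["data"]["comments"]["results"]


if __name__ == "__main__":
    comments = fetch_comments("USER_ID_HERE")  # placeholder user id
    with open("comments.md", "w", encoding="utf-8") as f:
        for c in comments:
            f.write(f"## {c['postedAt']} | {c['pageUrl']}\n\n{c['contents']['markdown']}\n\n")
```

The same approach would work for posts (with a posts query swapped in), and the output file doubles as a personal archive that’s easy to search.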
This contradicts my position in Some Thoughts on Metaphilosophy. What about that post do you find unconvincing, or what is your own argument for “philosophy being insoluble”?
I’m not saying that my assessment of it is inarguably correct (indeed, given that mainstream philosophy isn’t seriously discredited yet, reasonable people clearly can disagree), but if your conclusions are different, I’d like to know why.
It’s mainly because when I’m (seemingly) making philosophical progress myself, e.g., this and this, or when I see other people making apparent philosophical progress, it looks more like “doing what most philosophers do” than “getting feedback from reality”.
Perhaps more seriously, the philosophers who got a temporary manpower and influence boost from the invention of math and science should have worked much harder to solve metaphilosophy, while they had the advantage.
It seems to me that values have been a main focus of philosophy for a long time, with moral philosophy (or perhaps meta-ethics, if the topic is “what values are”) devoted to it and discussed frequently both in academia and out, whereas metaphilosophy has received much less attention. This suggests that progress on understanding values is probably pretty hard on the current margins, whereas there’s a lot more uncertainty about the difficulty of metaphilosophy. Solving the latter would also be of greater utility, since it would make solving all other philosophical problems easier, not just the problem of values. I’m curious about the rationale behind your suggestion.
An example of a long-standing philosophical problem that could eventually be solved in this way is the problem of consciousness: if we’re eventually able to build artificial brains and “upload” ourselves, by testing different designs we’d be able to figure out which material features give rise to qualia experiences, and by what mechanisms.
I think this will help, but it won’t solve the whole problem by itself, and we’ll still need to decide between competing answers without direct feedback from reality to help us choose. Just as today there are people who deny the existence of qualia altogether and think it’s an illusion or some such, I imagine there will also be people in the future who claim that the material features you say give rise to qualia experiences merely give rise to reports of qualia experiences.
We do receive feedback on this from reality, albeit slowly — through cultural evolution/natural selection. To the extent that this filter isn’t particularly strict, within the range it allows, variation will probably remain arbitrary.
So within this range, I still have to figure out what my values should be, right? Is your position that it’s entirely arbitrary, and any answer is as good as another (within the range)? How do I know this is true? What feedback from reality can I use to decide between “questions without feedback from reality can only be answered arbitrarily” and “there’s another way to (very slowly) answer such questions, by doing what most philosophers do”, or is this meta question also arbitrary (in which case your position seems to be self-undermining, in a way similar to logical positivism)?
I have no idea whether marginal progress on this would be good or bad
Is it because of one of the reasons on this list, or something else?
Math and science as original sins.
From Some Thoughts on Metaphilosophy:
Philosophy as meta problem solving

Given that philosophy is extremely slow, it makes sense to use it to solve meta problems (i.e., finding faster ways to handle some class of problems) instead of object level problems. This is exactly what happened historically. Instead of using philosophy to solve individual scientific problems (natural philosophy) we use it to solve science as a methodological problem (philosophy of science). Instead of using philosophy to solve individual math problems, we use it to solve logic and philosophy of math. [...] Instead of using philosophy to solve individual philosophical problems, we can try to use it to solve metaphilosophy.
It occurred to me that from the perspective of longtermist differential intellectual progress, it was a bad idea to invent things like logic, mathematical proofs, and scientific methodologies, because it permanently accelerated the wrong things (scientific and technological progress) while giving philosophy only a temporary boost (by empowering the groups that invented those things, which had better than average philosophical competence, to spread their culture/influence). Now we face the rise of China and/or AIs, both of which seem likely (or at least plausibly) to be technologically and scientifically (but not philosophically) competent, perhaps in part as a result of technological/scientific (but not philosophical) competence having been made legible/copyable by earlier philosophers.
If only they’d solved metaphilosophy first, or kept their philosophy of math/science advances secret! (This is of course not entirely serious, in case that’s not clear.)
I am essentially a preference utilitarian
Want to try answering my questions/problems about preference utilitarianism?
Maybe I would state my first question above a little differently today: Certain decision theories (such as the UDT/FDT/LDT family) already incorporate some preference-utilitarian-like intuitions, by suggesting that taking certain other agents’ preferences into account when making certain decisions is a good idea, if e.g. this is logically correlated with them taking your preferences into account. Does preference utilitarianism go beyond this, and say that you should take their preferences into account even if there is no decision theoretic reason to do so, as a matter of pure axiology (values / utility function)? Do you then take their preferences into account again as part of decision theory, or do you adopt a decision theory which denies or ignores such correlations/linkages/reciprocities (e.g., by judging them to be illusions or mistakes or some such)? Or does your preference utilitarianism do something else, like deny the division between decision theory and axiology? Also does your utility function contain non-preference-utilitarian elements, i.e., idiosyncratic preferences that aren’t about satisfying other agents’ preferences, and if so how do you choose the weights between your own preferences and other agents’?
(I guess this question/objection also applies to hedonic utilitarianism, to a somewhat lesser degree, because if a hedonic utilitarian comes across a hedonic egoist, he would also “double count” the latter’s hedons, once in his own utility function, and once again if his decision theory recommends taking the latter’s preferences into account. Another alternative that avoids this “double counting” is axiological egoism + some sort of advanced/cooperative decision theory, but then selfish values have their own problems. So my own position on this topic is one of high confusion and uncertainty.)
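To make the “double counting” worry a bit more concrete, here’s a toy sketch (the linear form and the weights w and c are made up purely for illustration):

```latex
% Toy illustration of the "double counting" worry; the linear form and the weights w, c are made up.
\begin{align*}
  \text{Axiology: }        & U_A = u_A + w\, u_B, \quad w > 0,
    \text{ so } B\text{'s preferences are already weighted in } A\text{'s utility function.} \\
  \text{Decision theory: } & \text{act as if maximizing } U_A + c\, u_B, \quad c > 0,
    \text{ from logical correlation with } B. \\
  \text{Net effect: }      & u_A + (w + c)\, u_B, \quad \text{i.e., } B\text{'s preferences are counted twice.}
\end{align*}
```

Under an axiology-only or decision-theory-only accounting, the weight on u_B would be just w or just c; the question is whether the combined w + c is really intended, or a sign that one of the two layers should be dropped or modified.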
Sorry about the delayed reply. I’ve been thinking about how to respond. One of my worries is that human philosophy is path dependent; another way of saying this is that we’re prone to accepting wrong philosophical ideas/arguments, and then it’s hard to talk us out of them. The split of western philosophy into analytical and continental traditions seems to be an instance of this, and even within analytical philosophy, academic philosophers strongly disagree with each other, are each confident in their own positions, and rarely get talked out of them. I think/hope that humans collectively can still make philosophical progress over time (in some mysterious way that I wish I understood) if we’re left to our own devices, but the process seems pretty fragile and probably can’t withstand much external optimization pressure.
On formalizations, I agree they’ve stood the test of time in your sense, but is that enough to build them into AI? We can see that they’re wrong on some questions, but we can’t formally characterize the domain in which they are right. And even if we could, I don’t know why we’d muddle through… What if we built AI based on Debate, but used Newtonian physics to answer physics queries instead of human judgment, or if the humans were pretty bad at answering physics-related questions (including meta questions like how to do science)? That would be pretty disastrous, especially if there are any adversaries in the environment, right?
MacAskill is probably the most prominent, with his “value lock-in” and “long reflection”, but in general the notion of philosophical confusion/inadequacy seems a common component of various AI risk cases. I’ve been particularly impressed by John Wentworth.
That’s true, but neither of them has talked about the more general problem of “maybe humans/AIs won’t be philosophically competent enough, so we need to figure out how to improve human/AI philosophical competence”, or at least hasn’t said this publicly or framed their positions this way.
The point is that it’s impossible to do useful philosophy without close and constant contact with reality.
I see, but what if there are certain problems which by their nature just don’t have clear and quick feedback from reality? One of my ideas about metaphilosophy is that this is a defining feature of philosophical problems or what makes a problem more “philosophical”. Like for example, what should my intrinsic (as opposed to instrumental) values be? How would I get feedback from reality about this? I think we can probably still make progress on these types of questions, just very slowly. If your position is that we can’t make any progress at all, then 1) how do you know we’re not just making progress slowly and 2) what should we do? Just ignore them? Try to live our lives and not think about them?
Interesting. Who are they and what approaches are they taking? Have they said anything publicly about working on this, and if not, why?
My impression is that those few who at least understand that they’re confused do that
Who else is doing this?
Not exactly an unheard of position.
All of your links are to people proposing better ways of doing philosophy, which contradicts that it’s impossible to make progress in philosophy.
policymakers aren’t predisposed to taking arguments from those quarters seriously
There are various historical instances of philosophy having large effects on policy (not always in a good way), e.g., abolition of slavery, rise of liberalism (“the Enlightenment”), Communism (“historical materialism”).
It seems clear enough to me that pretty much everybody is hopelessly confused about these issues, and sees no promising avenues for quick progress.
If that’s the case, why aren’t they at least raising the alarm for this additional AI risk?
“What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions, the places where we do best and get the least drawn astray is exactly those areas where we can have as much feedback from reality in as tight loops as possible, and so if we are trying to tackle ever more lofty problems, it becomes ever more important to get exactly that feedback wherever we can get it!”
It seems to me that we’re able to make progress on questions “without constant grounding and dialogue with reality”, just very slowly. (If this isn’t possible, then what are philosophers doing? Are they all just wasting their time?) I also think it’s worth working on metaphilosophy, even if we don’t expect to solve it in time or make much progress, if only to provide evidence to policymakers that it really is a hard problem (and therefore an additional reason to pause/stop AI development). But I would be happier even if nobody worked on this, but just more people publicly/prominently stated that this is an additional concern for them about AGI.
Given the typical pace and trajectory of human philosophical progress, I think we’re unlikely to make much headway on the relevant problems (i.e., not enough to have high justified confidence that we’ve correctly solved them) before we really need the solutions, but various groups will likely convince themselves that they have, and become overconfident in their own proposed solutions. The subject will likely end up polarized and politicized, or perhaps ignored by most as they take the lack of consensus as license to do whatever is most convenient.
Even if the question of AI moral status is somehow solved, in a definitive way, what about all of the follow-up questions? If current or future AIs are moral patients, what are the implications of that in terms of e.g. what we concretely owe them as far as rights and welfare considerations? How to allocate votes to AI copies? How to calculate and weigh the value/disvalue of some AI experience vs another AI experience vs a human experience? Interpersonal utility comparison has been an unsolved problem since utilitarianism was invented, and now we have to also deal with the massive distributional shift of rapidly advancing artificial minds...
One possible way to avoid this is if we get superintelligent and philosophically supercompetent AIs, then they solve the problems and honestly report the solutions to us. (I’m worried that they’ll instead just be superpersuasive and convince us of their moral patienthood (or lack thereof, if controlled by humans) regardless of what’s actually true.) Or alternatively, humans become much more philosophically competent, such as via metaphilosophical breakthroughs, cognitive enhancements, or social solutions (perhaps mass identification/cultivation of philosophical talent).
It seems very puzzling to me that almost no one is working on increasing AI and/or human philosophical competence in these ways, or even publicly expressing the worry that AIs and/or humans collectively might not be competent enough to solve important philosophical problems that will arise during and after the AI transition. Why is AI’s moral status (and other object level problems like decision theory for AIs) considered worthwhile to talk about, but this seemingly more serious “meta” problem isn’t?
urged that it be retracted
This seems substantially different from “was retracted” in the title. Also, arXiv apparently hasn’t yet acted on MIT’s request to remove the paper, presumably following its own policy and waiting for the author to issue his own request.
How do you decide what to set ε to? You mention “we want assumptions about humans that are sensible a priori, verifiable via experiment” but I don’t see how ε can be verified via experiment, given that for many questions we’d want the human oracle to answer, there isn’t a source of ground truth answers that we can compare the human answers to?
With unbounded Alice and Bob, this results in an equilibrium where Alice can win if and only if there is an argument that is robust to an ε-fraction of errors.
How should I think about, or build up some intuitions about, what types of questions have an argument that is robust to an ε-fraction of errors?
Here’s an analogy that leads to a pessimistic conclusion (but I’m not sure how relevant it is): replace the human oracle with a halting oracle, let the top-level question being debated be whether some Turing machine T halts or not, and let the distribution over which ε is defined be the uniform distribution. Then it seems like Alice has a very tough time (for any T whose halting status she can’t prove herself), because Bob can reject/rewrite all the oracle answers that are relevant to T in some way, and these make up only a tiny fraction of all possible Turing machines. (This assumes that Bob gets to pick the classifier after seeing the top-level question. Is this right?)
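To spell out what I mean a bit more formally (a sketch based on my reading of the setup, which may not match the intended formalization):

```latex
% Sketch of the halting-oracle analogy; the formalization here is my guess at the intended setup.
\begin{align*}
  &\text{Let } D \text{ be the distribution over oracle queries, } O \text{ the true oracle, and} \\
  &\mathcal{B}_\varepsilon = \{\, O' : \Pr_{q \sim D}[\, O'(q) \neq O(q) \,] < \varepsilon \,\}
    \text{ the set of modified oracles Bob may substitute.} \\
  &\text{Alice wins only if her argument's conclusion survives every } O' \in \mathcal{B}_\varepsilon. \\
  &\text{In the analogy, the set } S_T \text{ of machines whose halting behavior bears on } T
    \text{ plausibly has } D(S_T) \ll \varepsilon, \\
  &\text{so an } O' \text{ that corrupts exactly the answers on } S_T \text{ is allowed, and Alice cannot win.}
\end{align*}
```

(Caveat to my own sketch: a literal uniform distribution over all Turing machines isn’t well defined, so D would have to be something like a length-weighted distribution over machine descriptions.)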
I think the most dangerous version of 3 is a sort of Chesterton’s fence, where people get rid of seemingly unjustified social norms without realizing that they were socially beneficial. (Decline in high-g birthrates might be an example.) Though social norms are instrumental values, not beliefs, and when a norm was originally motivated by a mistaken belief, it can still be motivated by recognizing that the norm is useful, which doesn’t require holding on to the mistaken belief.
I think that makes sense, but sometimes you can’t necessarily motivate a useful norm “by recognizing that the norm is useful” to the same degree that you can with a false belief. For example there may be situations where someone has an opportunity to violate a social norm in an unobservable way, and they could be more motivated by the idea of potential punishment from God if they were to violate it, vs just following the norm for the greater (social) good.
Do you have an example for 4? It seems rather abstract and contrived.
Hard not to sound abstract and contrived here, but to say a bit more, maybe there is no such thing as philosophical progress (outside of some narrow domains), so by doing philosophical reflection you’re essentially just taking a random walk through idea space. Or philosophy is a memetic parasite that exploits bug(s) in human minds to spread itself, perhaps similar to (some) religions.
Overall, I think the risks from philosophical progress aren’t overly serious while the opportunities are quite large, so the overall EV looks comfortably positive.
I think the EV is positive if done carefully, which I think I had previously been assuming, but I’m a bit worried now that most people I can attract to the field might not be as careful as I had assumed, so I’ve become less certain about this.
Some potential risks stemming from trying to increase philosophical competence of humans and AIs, or doing metaphilosophy research. (1 and 2 seem almost too obvious to write down, but I think I should probably write them down anyway.)
Philosophical competence is dual use, like much else in AI safety. It may for example allow a misaligned AI to make better decisions (by developing a better decision theory), and thereby take more power in this universe or cause greater harm in the multiverse.
Some researchers/proponents may be overconfident, and cause flawed metaphilosophical solutions to be deployed or spread, which in turn derail our civilization’s overall philosophical progress.
Increased philosophical competence may cause many humans and AIs to realize that various socially useful beliefs have weak philosophical justifications (such as that all humans are created equal, have equal moral worth, or have natural inalienable rights; moral codes based on theism; etc.). In many cases the only justifiable philosophical positions in the short to medium run may be states of high uncertainty and confusion, and it seems unpredictable what effects will come from many people adopting such positions.
Maybe the nature of philosophy is very different from my current guesses, such that greater philosophical competence or orientation is harmful even in aligned humans/AIs and even in the long run. For example maybe philosophical reflection, even if done right, causes a kind of value drift, and by the time you’ve clearly figured that out, it’s too late because you’ve become a different person with different values.
Thanks for letting me know. Is there anything on my list that you don’t think is a good idea or probably won’t implement, in which case I might start working on them myself, e.g. as a userscript? Especially #5, which is also useful for other reasons, like archiving and searching.