A big part of the problem, in a sense, is that the discussion is usually focused on dunking on bad arguments.
One of the takeaways from the history of science/progress is that, in general, you should pretty much ignore bad arguments against an idea, and most importantly not update towards your idea being correct just because you can knock those arguments down.
The linked post was in part a response to a comment of yours on my last post.
And this shows up a lot in the political examples. The big issue I’ve noticed in political discourse is that everyone gravitates towards the weakest arguments on the other side and doesn’t steelman their opponents (this combines with another issue: a lot of people try to smuggle in moral claims based on factual claims, and use those factual claims to normalize hurting/killing people on the other side, because lots of people simply want to hurt/kill other people and are bottlenecked only by logistics plus opposition).
This is one of my leading theories on how political discussions go wrong nowadays.
Another example: the orthogonality thesis and instrumental convergence arguments from around 2006-2008 were aimed at debunking bad arguments from the AI optimists of the time. One of the crucial mistakes that I think doomed MIRI to having unreasonable (in my eyes) confidence about the hardness of the AI safety problem is that they kept engaging with bad critics instead of trying to invent imaginary steelmans of the AI optimist position (I think the AI optimists have done this too, to a lesser extent; though to be fair, we knew a lot less about AI back in 2006-2008).
This is also why empirical evidence is usually far more valuable than arguments: it cuts out the selection effects that can be a massive problem, and it is undoubtedly a better critic than anyone is likely to generate (except in certain fields).
This is also why I think the recent push to get AI safety traction amongst the general public by creating a movement is a mistake.
Zack M. Davis’s Steelmanning is Normal, ITT-Passing is Niche is relevant here (with two caveats: in a case where one person just has way more knowledge, ITTs are disproportionately useful, and in cases where emotions are a rate-limiting factor, ITTs are also necessary).
So one of the key things LWers should be expected to do is to be able to steelman beliefs (other than moral beliefs) that they think are wrong, and to always focus on the best arguments/evidence.
My own take on philosophy is that it’s basically divided into 3 segments:
1. Philosophical problems that were solved, but where the solutions are unsatisfying, so philosophers futilely try to make progress on the problem, whereas scientists content themselves with less general solutions that evade the impossibilities.
(An example: many philosophical problems basically reduce to the question “does there exist a prior that is always better than any other prior for a set of data, without memorizing all of the data?”, and the answer is no in general, because of the No Free Lunch theorem (see the sketch after this list). The Problem of Induction is an example of a problem solved this way, but it matters less than people think, because our world doesn’t satisfy the properties required to generate a No Free Lunch result, and ML/AI is focused on solving specific problems in our universe.)
2. Philosophical problems that depend on definitions in an essential way, such that solving the problem amounts to disambiguating the definition, and there is no objective choice. (Example: any discussion of what art is; more generally, any discussion of “what X is” is potentially vulnerable to this sort of issue.)
3. Philosophical problems that are solved, where the solutions aren’t unsatisfying to us. (A random example is Ayer’s puzzle of why you would collect any new data if you want to find the true hypothesis, solved by Mark Sellke.)
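To gesture at the flavor of impossibility result behind item 1, here is a rough statement of the supervised-learning No Free Lunch theorem (my gloss; the exact formalization varies by source): for any two learning algorithms (equivalently, any two priors) $A_1$ and $A_2$, and any fixed training set $d$,

$$\sum_{f} P(E \mid f, d, A_1) = \sum_{f} P(E \mid f, d, A_2),$$

where the sum ranges uniformly over all possible target functions $f$ and $E$ is the off-training-set error. Averaged over all logically possible worlds, no prior beats any other; you only win by assuming something about which world you are actually in, which is exactly the move our universe lets us get away with.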
A potential crux with Raemon/Wei Dai here is that I think lots of philosophical problems are impossible to solve in a satisfying/fully general way, and that this matters a lot less to me than it does to a lot of LWers.
Another potential crux is that I don’t think preference aggregation/CEV can actually work without a preference prior/base values that must be arbitrarily selected, and thus politics is inevitably going to be part of the preference aggregation (this comes from Steven Byrnes here):
On the philosophical problems posed by Wei Dai, here’s what I’d say:
All of these problems are ones that it isn’t worth it for humanity to focus on; with a few caveats, we should instead delegate them to aligned AIs. (I’ll also say that there doesn’t exist a single decision theory that outperforms every other decision theory, links here and here (though there is a comment that I do like here).)
This is very much dependent on the utility function/values, so this needs more assumptions in order to even have a solution.
Again, this needs assumptions over the utility function/fairness metric in order to even have a solution.
Again, entirely dependent on the utility functions.
I basically agree with Connor Leahy that the definition of metaphilosophy/philosophy is so broad as to contain everything, so this amounts to asking us to be able to solve every problem. In that respect, the No Free Lunch theorem tells us that we would in general have to have every possible example memorized in training; since this is not possible for us, we can immediately say that there is no fully general correct philosophical reasoning that can be specified into an AI design. But in my view this matters a lot less than people think it does.
It depends, but in general, the better an AI is at hard-to-verify tasks, the better its philosophy is.
In general, this is dependent on their utility functions, but one frame that I do like is Preference Aggregation as Bayesian Inference (a toy sketch of that frame follows below).
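To make the “preference aggregation as Bayesian inference” frame, and its dependence on an arbitrarily chosen preference prior, concrete, here is a minimal toy sketch. The Gaussian prior and Gaussian reporting-noise model are my own illustrative assumptions, not necessarily what the linked post uses:

```python
import numpy as np

# Toy model: a latent "shared" utility vector u over a few outcomes,
# a Gaussian prior on u (this plays the role of the arbitrarily chosen
# preference prior / base values), and each agent's reported utilities
# treated as a noisy observation of u. Aggregation is then just
# conjugate Gaussian posterior inference.

n_outcomes = 4
prior_mean = np.zeros(n_outcomes)  # the "base values" that must be chosen
prior_var = 1.0                    # how confident the prior is
report_var = 0.5                   # assumed noise in each agent's report

# Each row is one agent's reported utilities over the outcomes.
reports = np.array([
    [1.0, 0.2, -0.5,  0.0],
    [0.8, 0.5, -0.2, -0.1],
    [0.1, 1.0,  0.3, -0.4],
])
n_agents = reports.shape[0]

# Conjugate Gaussian update (per outcome, independently):
# posterior precision = prior precision + n_agents / report_var
# posterior mean      = precision-weighted average of prior and reports
post_precision = 1.0 / prior_var + n_agents / report_var
post_mean = (prior_mean / prior_var + reports.sum(axis=0) / report_var) / post_precision

print("aggregated utilities:", post_mean)
print("collectively preferred outcome:", int(np.argmax(post_mean)))
```

The point of the toy model is that the output depends on prior_mean, prior_var, and report_var: change the base values, or how much weight each agent’s reports get, and you change the aggregated preferences. That is the sense in which I think the politics can’t be factored out of the aggregation step.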
The first question is maybe an interesting research question, but I don’t think we need AGI to understand/have normativity.
For the first question, most alignment plans make the implicit meta-ethical assumption of moral relativism: there are no fundamentally objective values, every value is valid, and we just have to take a human’s values as given. They also assume that utility functions are a valid representation of human values, in that we can reduce what humans value into a utility function; but that assumption is always correct, so it doesn’t matter.
Moral relativism is in a sense the most minimal metaethical assumption you can make, as it is entirely silent on what moral views are correct.
And that’s my answer to all of the questions from this post.