MIRI was rolling their own metaethics (deploying novel or controversial philosophy), which is not a good idea even if alignment turned out not to be that hard in a technical sense.
What specifically is this referring to? The Mere Goodness sequences?
I read your recent post about not rolling your own metaethics as addressed mostly to current AGI or safety researchers who are trying to build or align AIs today. I had thought what you were saying was that those researchers would be better served by stopping what they are doing with AI research and instead spending their time carefully studying / thinking about / debating / writing about philosophy and metaethics. If someone asked me, I would point to Eliezer’s metaethics sequences (and some of your posts and comments, among others) as a good place to start with that.
I don’t think Eliezer got everything right about philosophy, morality, decision theory, etc. in 2008, but I don’t know of a better / more accessible foundation, and he (and you) definitely got some important and basic ideas right, which are worth accepting and building on (as opposed to endlessly rehashing or recursively going meta on).
Is your view that it was a mistake to even try writing about metaethics while also doing technical alignment research in 2008? Or that the specific way Eliezer wrote those particular sequences is so mistaken or overconfident that it’s a central example of what you want to caution against with “rolling your own metaethics”? Or merely that Eliezer did not “solve” metaethics sufficiently well, and therefore he (and others) were mistaken to move ahead and / or turn their attention elsewhere? (Either way, I still don’t really know what you are concretely recommending people do instead, even after reading this thread.)
My position is a combination of:
1. Eliezer was too confident in his own metaethics, and to a lesser degree in his decision theory (unlike metaethics, he never considered decision theory a solved problem, but he was still willing to draw stronger practical conclusions from it than I think was justified), and probably in other philosophical positions that aren’t as salient in my mind (e.g., altruism and identity).
2. Trying to solve philosophical problems like these on a deadline, with the intent to deploy the solutions in AI, is not a good plan, especially if you’re planning to deploy a solution even while it’s still highly controversial (i.e., a majority of professional philosophers think you are wrong). This includes Eliezer’s effort as well as everyone else’s.
A couple of posts arguing for 1 above:
https://www.lesswrong.com/posts/QvYKSFmsBX3QhgQvF/morality-isn-t-logical
https://www.lesswrong.com/posts/orhEa4wuRJHPmHFsR/six-plausible-meta-ethical-alternatives
Did the above help you figure it out? If not, can you be more specific about what’s confusing you about that thread?
If the majority of professional philosophers do endorse your metaethics, how seriously should you take that?
And conversely, do you think it’s implausible that you could have correctly reasoned your way to the correct metaethics, as validated by a narrower community of philosophers, but not yet have convinced everyone in the field?
The Sequences often emphasize that most people in the world believe in God, so if you’re interested in figuring out the truth, you have to be comfortable confidently rejecting widely held beliefs. What do you say to the person who assesses that academic philosophy is a sufficiently broken field, with incentives warped enough to prevent intellectual progress, and concludes that they should discard the opinion of the field as a whole?
Do you just claim that they’re wrong about that, on the object level, and that this hypothetical person should have more respect for the views of philosophers?
(That said, I’ll observe that there’s an important asymmetry in practice between “almost everyone is wrong in their belief in X, and I’m confident about that” and “I’ve independently reasoned my way to Y, and I’m very confident of it.” Other people are wrong != I am right.)