MIRI was rolling their own metaethics (deploying novel or controversial philosophy), which is not a good idea even if alignment turned out not to be that hard in a technical sense.
What specifically is this referring to? The Mere Goodness sequences?
I read your recent post about not rolling your own metaethics as aimed mostly at current AGI or safety researchers who are trying to build or align AIs today. I had thought what you were saying was that those researchers would be better served by stopping what they are doing with AI research and instead spending their time carefully studying / thinking about / debating / writing about philosophy and metaethics. If someone asked me, I would point to Eliezer’s metaethics sequences (and some of your posts and comments, among others) as a good place to start with that.
I don’t think Eliezer got everything right about philosophy, morality, decision theory, etc. in 2008, but I don’t know of a better / more accessible foundation, and he (and you) definitely got some important and basic ideas right, which are worth accepting and building on (as opposed to endlessly rehashing or recursively going meta on).
Is your view that it was a mistake to even try writing about metaethics while also doing technical alignment research in 2008? Or that the specific way Eliezer wrote those particular sequences is so bad / mistaken / overconfident that it’s a central example of what you want to caution against with “rolling your own metaethics”? Or merely that Eliezer did not “solve” metaethics sufficiently well, and therefore he (and others) were mistaken to move ahead and / or turn their attention elsewhere? (Either way, I still don’t really know what you are concretely recommending people do instead, even after reading this thread.)
My position is a combination of:
1. Eliezer was too confident in his own metaethics, and to a lesser degree in his decision theory (unlike metaethics, he never considered decision theory a solved problem, but he was still willing to draw stronger practical conclusions from it than I think was justified), and probably in other philosophical positions that aren’t as salient in my mind (e.g., altruism and identity).
2. Trying to solve philosophical problems like these on a deadline, with intent to deploy the solutions into AI, is not a good plan, especially if you’re planning to deploy them even while they’re still highly controversial (i.e., a majority of professional philosophers think you are wrong). This includes Eliezer’s effort as well as everyone else’s.
A couple of posts arguing for point 1 above:
https://www.lesswrong.com/posts/QvYKSFmsBX3QhgQvF/morality-isn-t-logical
https://www.lesswrong.com/posts/orhEa4wuRJHPmHFsR/six-plausible-meta-ethical-alternatives
Did the above help you figure it out? If not, could you be more specific about what’s confusing you about that thread?