What are you supposed to do other than roll your own metaethics?
“More research needed” but here are some ideas to start with:
1. Try to design alignment/safety schemes that are agnostic to, or don’t depend on, controversial philosophical ideas. For certain areas that seem highly relevant and where there could potentially be hidden dependencies (such as metaethics), explicitly understand and explain why, under each plausible position that people currently hold, the alignment/safety scheme will result in a good or ok outcome. (E.g., why it leads to a good outcome regardless of whether moral realism or anti-realism is true, or any one of the other positions.)
2. Try to solve metaphilosophy, where potentially someone could make a breakthrough that everyone can agree is correct (after extensive review), which can then be used to speed up progress in all other philosophical fields. (This could also happen in another philosophical field, but seems a lot less likely due to prior efforts/history. I don’t think it’s very likely in metaphilosophy either, but perhaps worth a try for those who may have a very strong comparative advantage in this.)
3. If 1 and 2 look hard or impossible, make this clear to non-experts (your boss, company leaders/board, government officials, the public), and don’t let them accept a “roll your own metaethics” solution, or a solution with implicit/hidden philosophical assumptions.
4. Support AI pause/stop.
Hmm, I like #1.
#2 feels like it’s injecting some frame that’s a bit weird to inject here (don’t roll your own metaethics… but rolling your own metaphilosophy is okay?)
But also, I’m suddenly confused about who this post is trying to warn. Is it more like labs, or more like EA-ish people doing a wider variety of meta-work?
> #2 feels like it’s injecting some frame that’s a bit weird to inject here (don’t roll your own metaethics… but rolling your own metaphilosophy is okay?)
Maybe you missed my footnote?
> To preempt a possible misunderstanding, I don’t mean “don’t try to think up new metaethical ideas”, but instead “don’t be so confident in your ideas that you’d be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way”. Similarly “don’t roll your own crypto” doesn’t mean never try to invent new cryptography, but rather don’t deploy it unless there has been extensive review, and consensus that it is likely to be secure.
and/or this part of my answer (emphasis added):
> Try to solve metaphilosophy, where potentially someone could make a breakthrough *that everyone can agree is correct (after extensive review)*
> But also, I’m suddenly confused about who this post is trying to warn. Is it more like labs, or more like EA-ish people doing a wider variety of meta-work?
I think I mostly had alignment researchers (in and out of labs) as the target audience in mind, but it does seem relevant to others, so perhaps I should expand the target audience?
> To preempt a possible misunderstanding, I don’t mean “don’t try to think up new metaethical ideas”, but instead “don’t be so confident in your ideas that you’d be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way”.
I think I had missed this, but it doesn’t resolve the confusion in my #2 note. (Like, it still seems like something is weird about saying that “solve metaphilosophy such that everyone can agree is correct” is more worth considering than “solve metaethics such that everyone can agree is correct”. I can totally buy that they’re qualitatively different, and maybe I have some guesses for why you think that. But I don’t think the post spells out why, and it doesn’t seem that obvious to me.)
I hinted at it with “prior efforts/history”, but to spell it out more: a lot more effort seems to have gone into metaethics in the past, so there’s less likely to be some kind of low-hanging fruit in idea space that, once picked, everyone will agree is the right solution.