Wei Dai comments on The stakes of AI moral status

Wei Dai 22 May 2025 18:04 UTC
3 points
1

It seems clear enough to me that pretty much everybody is hopelessly confused about these issues, and sees no promising avenues for quick progress.

If that’s the case, why aren’t they at least raising the alarm for this additional AI risk?

“What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions, the places where we do best and get the least drawn astray is exactly those areas where we can have as much feedback from reality in as tight loops as possible, and so if we are trying to tackle ever more lofty problems, it becomes ever more important to get exactly that feedback wherever we can get it!”

It seems to me that we’re able to make progress on questions “without constant grounding and dialogue with reality”, just very slowly. (If this isn’t possible, then what are philosophers doing? Are they all just wasting their time?) I also think it’s worth working on metaphilosophy, even if we don’t expect to solve it in time or make much progress, if only to provide evidence to policymakers that it really is a hard problem (and therefore an additional reason to pause/stop AI development). But I would be happier even if nobody worked on this, but just more people publicly/prominently stated that this is an additional concern for them about AGI.
- xpym 22 May 2025 18:30 UTC
  3 points
  2
  Parent
  
  If that’s the case, why aren’t they at least raising the alarm for this additional AI risk?
  
  My impression is that those few who at least understand that they’re confused do that, whereas most are also meta-confused.
  
  If this isn’t possible, then what are philosophers doing? Are they all just wasting their time?
  
  Not exactly an unheard of position.
  
  provide evidence to policymakers that it really is a hard problem
  
  I don’t think that philosophy/metaphilosophy has a good track record of providing strong evidence for anything, so policymakers aren’t predisposed to taking arguments from those quarters seriously. I expect that only a really dramatic warning shot can change the AI trajectory (and even then it’s not a sure bet — Covid was plenty dramatic, and yet no significant opposition to gain-of-function seems to have materialized).
  - Wei Dai 22 May 2025 19:04 UTC
    5 points
    −2
    Parent
    
    My impression is that those few who at least understand that they’re confused do that
    
    Who else is doing this?
    
    Not exactly an unheard of position.
    
    All of your links are to people proposing better ways of doing philosophy, which contradicts that it’s impossible to make progress in philosophy.
    
    policymakers aren’t predisposed to taking arguments from those quarters seriously
    
    There are various historical instances of philosophy having large effects on policy (not always in a good way), e.g., abolition of slavery, rise of liberalism (“the Enlightenment”), Communism (“historical materialism”).
    - xpym 23 May 2025 4:33 UTC
      3 points
      2
      Parent
      
      Who else is doing this?
      
      MacAskill is probably the most prominent, with his “value lock-in” and “long reflection”, but in general the notion of philosophical confusion/inadequacy seems a common component of various AI risk cases. I’ve been particularly impressed by John Wentworth. (1, 2, 3)
      
      All of your links are to people proposing better ways of doing philosophy, which contradicts that it’s impossible to make progress in philosophy.
      
      The point is that it’s impossible to do useful philosophy without close and constant contact with reality. Your examples of influential philosophical ideas (abolition of slavery, the Enlightenment, Communism) were coincidentally all responses to clear and major observable problems (the horrors of slavery, sectarian wars and early industrial working conditions, respectively).
      - Wei Dai 23 May 2025 19:32 UTC
        4 points
        0
        Parent
        
        MacAskill is probably the most prominent, with his “value lock-in” and “long reflection”, but in general the notion of philosophical confusion/inadequacy seems a common component of various AI risk cases. I’ve been particularly impressed by John Wentworth.
        
        That’s true, but neither of them have talked about the more general problem “maybe humans/AIs won’t be philosophically competent enough, so we need to figure out how to improve human/AI philosophically competence” or at least haven’t said this publicly or framed their positions this way.
        
        The point is that it’s impossible to do useful philosophy without close and constant contact with reality.
        
        I see, but what if there are certain problems which by their nature just don’t have clear and quick feedback from reality? One of my ideas about metaphilosophy is that this is a defining feature of philosophical problems or what makes a problem more “philosophical”. Like for example, what should my intrinsic (as opposed to instrumental) values be? How would I get feedback from reality about this? I think we can probably still make progress on these types of questions, just very slowly. If your position is that we can’t make any progress at all, then 1) how do you know we’re not just making progress slowly and 2) what should we do? Just ignore them? Try to live our lives and not think about them?
        xpym 24 May 2025 6:54 UTC
        2 points
        0
        Parent
        
        what if there are certain problems which by their nature just don’t have clear and quick feedback from reality?
        
        Seems overwhelmingly likely to me that those problems will remain unsolved, until such time as we figure out how that feedback can be acquired. An example of a long-standing philosophical problem that could eventually be solved in this way is the problem of consciousness: if we’re eventually able to build artificial brains and “upload” ourselves, by testing different designs we’d be able to figure out which material features give rise to qualia experiences, and by what mechanisms.
        
        Like for example, what should my intrinsic (as opposed to instrumental) values be?
        
        We do receive feedback on this from reality, albeit slowly — through cultural evolution/natural selection. To the extent that this filter isn’t particularly strict, within the range it allows variation will probably remain arbitrary.
        
        how do you know we’re not just making progress slowly
        
        Because there’s no consensus that any major long-standing philosophical problem has ever been solved through philosophical methods.
        
        what should we do?
        
        Figure out where we’re confused and stop making same old mistakes/walking in circles. Build better tools which expand the range of experiments we can do. Try not to kill ourselves in the meantime (hard mode).
        RogerDearnaley 25 May 2025 6:19 UTC
        6 points
        −3
        Parent
        People often seem to confuse Philosophy with a science. It’s not. The only way you can disprove any philosophical viewpoint is by conclusively demonstrating, to the satisfaction of almost all other philosophers, that it inherently contains some irreconcilable internal logical inconsistency with itself (a relatively rare outcome). Other than that, philosophy is an exercise in enumerating, naming, and classifying, in the absence of any actual evidence on the subject, all the possible answers that could be true to interesting questions on subjects that we know nothing about, and agreeing to disagree about which of them seems more plausible. Philosophical progress thus normally increases the number of possible answers to a question, rather than decreasing it. Anyone criticizing human philosophers for not making enough progress in decreasing the number of answers to important questions has fundamentally misunderstood what philosophers actually do.
        
        Once we have actual evidence about something, such that you can do the Bayesian thing, falsify some theories and this finally reduce the number of plausible answers, then it becomes a science, and (gradually, as scientific process is made and the range of plausible answers decreases) stops being interesting to philosophers. There is a border between Philosophy and Science, and it only moves in one direction: Science expands and Philosophy loses interest and retreats. If we’re eventually able to build artificial brains and “upload” ourselves, the resulting knowledge about consciousness will be a science of consciousness, and philosophers will gradually stop being interested in discussing consciousness (and presumably find something more obscure that we still have no evidence about to discuss instead).
        Morality is part way through this process of retreat. We do have a science of morality: it’s called evolutionary ethics, and is a perfectly good subfield of evolutionary psychology (albeit one where doing experiments is rather challenging). There are even some philosophers who have noticed this, and are saying “hey, guys, here are the answers to all those questions about where human moral intuitions and beliefs come from that we’ve been discussing for the last 2500 years-or-so”. However, a fair number of moral philosophers don’t seem to have yet acknowledged this, and are still discussing things like moral realism and moral relativism (issues on which evolutionary ethics gives very clear and simple answers).
        Wei Dai 26 May 2025 17:05 UTC
        3 points
        0
        Parent
        
        An example of a long-standing philosophical problem that could eventually be solved in this way is the problem of consciousness: if we’re eventually able to build artificial brains and “upload” ourselves, by testing different designs we’d be able to figure out which material features give rise to qualia experiences, and by what mechanisms.
        
        I think this will help, but won’t solve the whole problem by itself, and we’ll still need to decide between competing answers without direct feedback from reality to help us choose. Like today, there are people who deny the existence of qualia altogether, and think it’s an illusion or some such, so I imagine there will also be people in the future who claim that the material features you claim to give rise to qualia experiences, merely give rise to reports of qualia experiences.
        
        We do receive feedback on this from reality, albeit slowly — through cultural evolution/natural selection. To the extent that this filter isn’t particularly strict, within the range it allows variation will probably remain arbitrary.
        
        So within this range, I still have to figure out what my values should be, right? Is your position that it’s entirely arbitrary, and any answer is as good as another (within the range)? How do I know this is true? What feedback from reality can I use to decide between “questions without feedback from reality can only be answered arbitrarily” and “there’s another way to (very slowly) answer such questions, by doing what most philosophers do”, or is this meta question also arbitrary (in which case your position seems to be self-undermining, in a way similar to logical positivism)?
        xpym 27 May 2025 5:29 UTC
        1 point
        0
        Parent
        
        Like today, there are people who deny the existence of qualia altogether, and think it’s an illusion or some such, so I imagine there will also be people in the future who claim that the material features you claim to give rise to qualia experiences, merely give rise to reports of qualia experiences.
        
        I mean, there are still people claiming that Earth is flat, and that evolution is an absurd lie. But insofar as consensus on anything is ever reached, it basically always requires both detailed tangible evidence and abstract reasoning. I’m not denying that abstract reasoning is necessary, it’s just far less sufficient by itself than mainstream philosophy admits.
        
        I still have to figure out what my values should be, right? Is your position that it’s entirely arbitrary, and any answer is as good as another (within the range)?
        
        We do have meta-preferences about our preferences, and of course with regard to our meta-preferences our values aren’t arbitrary. But this just escalates the issue one level higher—when the whole values + meta-values structure is considered, there’s no objective criterion for determining the best one (found so far).
        
        How do I know this is true? What feedback from reality can I use to decide between “questions without feedback from reality can only be answered arbitrarily” and “there’s another way to (very slowly) answer such questions, by doing what most philosophers do”
        
        You can evaluate philosophical progress achieved so far, for one thing. I’m not saying that my assessment of it is inarguably correct (indeed, given that mainstream philosophy isn’t seriously discredited yet, reasonable people clearly can disagree), but if your conclusions are different, I’d like to know why.
        Wei Dai 27 May 2025 6:31 UTC
        5 points
        0
        Parent
        
        I’m not saying that my assessment of it is inarguably correct (indeed, given that mainstream philosophy isn’t seriously discredited yet, reasonable people clearly can disagree), but if your conclusions are different, I’d like to know why.
        
        It’s mainly because when I’m (seemingly) making philosophical progress myself, e.g., this and this, or when I see other people making apparent philosophical progress, it looks more like “doing what most philosophers do” than “getting feedback from reality”.
        xpym 27 May 2025 8:26 UTC
        1 point
        0
        Parent
        
        Humanity has been collectively trying to solve some philosophical problems for hundreds or even thousands of years, without arriving at final solutions.
        
        Instead of using philosophy to solve individual scientific problems (natural philosophy) we use it to solve science as a methodological problem (philosophy of science).
        
        But humans seemingly do have indexical values, so what to do about that?
        
        But humans don’t have this, so how are humans supposed to reason about such correlations?
        
        I would categorize this as incorporating feedback from reality, so perhaps we don’t really disagree much.
        cubefox 24 May 2025 15:55 UTC
        −2 points
        −4
        Parent
        
        what should we do?
        
        Figure out where we’re confused
        
        Congratulations, you just reinvented philosophy. :)