Some potential risks stemming from trying to increase philosophical competence of humans and AIs, or doing metaphilosophy research. (1 and 2 seem almost too obvious to write down, but I think I should probably write them down anyway.)
1. Philosophical competence is dual-use, like much else in AI safety. It may, for example, allow a misaligned AI to make better decisions (by developing a better decision theory) and thereby take more power in this universe or cause greater harm in the multiverse.
2. Some researchers/proponents may be overconfident and cause flawed metaphilosophical solutions to be deployed or spread, which in turn derails our civilization's overall philosophical progress.
3. Increased philosophical competence may cause many humans and AIs to realize that various socially useful beliefs have weak philosophical justifications (e.g., that all humans are created equal, have equal moral worth, or have natural inalienable rights; moral codes based on theism; etc.). In many cases the only justifiable philosophical positions in the short to medium run may be states of high uncertainty and confusion, and it seems unpredictable what effects will come from many people adopting such positions.
4. Maybe the nature of philosophy is very different from my current guesses, such that greater philosophical competence or orientation is harmful even in aligned humans/AIs, and even in the long run. For example, maybe philosophical reflection, even if done right, causes a kind of value drift, and by the time you've clearly figured that out, it's too late because you've become a different person with different values.
This is pretty related to 2–4, especially 3 and 4, but also: you can induce ontological crises in yourself, and this can be pretty fraught. Two subclasses:
You now think of the world in a fundamentally different way. Example: before, you thought of "one real world"; now you think in terms of Everett branches, the mathematical multiverse, counterlogicals, simulation, reality fluid, attention juice, etc. Example: before, a conscious being was a flesh-and-blood human; now it is a computational pattern. Example: before, you took a background moral perspective for granted; now you see that everything producing your sense of values and morals is just algorithms, put there by evolution and training. This can disconnect previously functional flows from values through beliefs to actions. E.g., now you think it's fine to suppress or disengage some moral intuition or worry you have, because it's just a neurological tic. Or, now that you think of morality as "what successfully exists", you think it's fine to harm other people for your own advantage. Or, having noticed that some things you thought were deep-seated, truthful beliefs were actually just status-seeking simulacra, you now treat everything as status-seeking simulacra. Or something, idk.
You set off a self-sustaining chain reaction of reevaluating, which degrades your ability to control your decision to continue expanding the scope of reevaluation, which degrades your value judgements and general sanity. See: https://www.lesswrong.com/posts/n299hFwqBxqwJfZyN/adele-lopez-s-shortform?commentId=RZkduRGJAdFgtgZD5 , https://www.lesswrong.com/posts/n299hFwqBxqwJfZyN/adele-lopez-s-shortform?commentId=zWyC9mDQ9FTxKEqnT
These can also spread to other people (even if it doesn’t happen to the philosopher who comes up with the instigating thoughts).
Thanks; I updated down a bit on risks from increasing philosophical competence based on this (as all of these seem very weak).
(Relevant to some stuff I’m doing as I’m writing about work in this area.)
IMO, the biggest risk isn't on your list: increased salience of, and reasoning about, infohazards in general, and certain aspects of acausal interactions in particular. Of course, we need to reason about how to handle these risks eventually, but broader salience too early (relative to overall capabilities and various research directions) could be quite harmful. Perhaps this motivates suddenly increasing philosophical competence, so that we quickly move through the regime where AIs aren't smart enough to be careful but are smart enough to discover infohazards.
I think the most dangerous version of 3 is a sort of Chesterton's fence, where people get rid of seemingly unjustified social norms without realizing that they were socially beneficial. (Decline in high-g birthrates might be an example.) Though social norms are instrumental values, not beliefs, and when a norm was originally motivated by a mistaken belief, it can still be motivated by recognizing that the norm is useful, which doesn't require holding on to the mistaken belief.
Do you have an example for 4? It seems rather abstract and contrived.
Generally, I think the value of believing true things is almost always positive. Counterexamples seem mostly contrived (basilisk-like infohazards) or relatively rare. (E.g., believing a falsehood can make you more convincing when repeating it, since you don't technically have to lie, but lying is mostly bad or not very useful anyway.)
Overall, I think the risks from philosophical progress aren’t overly serious while the opportunities are quite large, so the overall EV looks comfortably positive.
I think that makes sense, but sometimes you can't motivate a useful norm "by recognizing that the norm is useful" to the same degree that you can with a false belief. For example, there may be situations where someone has an opportunity to violate a social norm in an unobservable way, and they could be more motivated to comply by the idea of potential punishment from God than by just following the norm for the greater (social) good.
Hard not to sound abstract and contrived here, but to say a bit more, maybe there is no such thing as philosophical progress (outside of some narrow domains), so by doing philosophical reflection you’re essentially just taking a random walk through idea space. Or philosophy is a memetic parasite that exploits bug(s) in human minds to spread itself, perhaps similar to (some) religions.
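To make the "random walk through idea space" worry slightly more concrete, here is a toy simulation (all parameters made up for illustration, not a claim about actual philosophy): if reflection has even a small systematic pull toward the truth, beliefs end up near it; if there is no such pull, reflection just drifts.

```python
import random

def reflect(steps: int, progress_bias: float, rng: random.Random) -> float:
    """Toy model: a 'belief' is a position on a line, with 'truth' at 0.

    Each reflection step moves the belief by 1 unit. With probability
    `progress_bias` the step points toward the truth; otherwise the
    direction is uniformly random (a pure random walk when
    progress_bias == 0).
    """
    position = 50.0  # start some distance from the truth
    for _ in range(steps):
        if rng.random() < progress_bias:
            step = -1.0 if position > 0 else 1.0  # move toward truth
        else:
            step = rng.choice([-1.0, 1.0])  # undirected drift
        position += step
    return abs(position)  # final distance from the truth

rng = random.Random(0)
trials = 200
# Average distance from the truth after 1000 reflection steps:
walk = sum(reflect(1000, 0.0, rng) for _ in range(trials)) / trials
progress = sum(reflect(1000, 0.2, rng) for _ in range(trials)) / trials
print(walk, progress)  # the biased walker ends up much closer to the truth
```

The unbiased walker stays roughly as far from the truth as it started (diffusion alone doesn't home in on anything), while even a 20% bias is enough to converge, which is one way to frame why the existence or nonexistence of "philosophical progress" matters so much here.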
I think the EV is positive if done carefully, which I think I had previously been assuming, but I’m a bit worried now that most people I can attract to the field might not be as careful as I had assumed, so I’ve become less certain about this.
I would expect higher competence in philosophy to reduce overconfidence, not increase it? The more you learn, the more you realize how much you don't know.