(ETA: Sorry, upon reviewing the whole thread, I think I misinterpreted your comment and thus the following reply is probably off point.)
We have to actually implement/align-the-AI-to the correct decision theory.
I think the best way to end up with an AI that has the correct decision theory is to make sure the AI can competently reason philosophically about decision theory and is motivated to follow the conclusions of such reasoning. In other words, it doesn’t judge a candidate successor decision theory by its current decision theory (CDT changing into Son-of-CDT), but by “doing philosophy”, just like humans do. Because given the slow pace of progress in decision theory, what are the chances that we correctly solve all of the relevant problems before AI takes off?
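To make the contrast concrete, here is a toy sketch (all names and numbers are made-up stand-ins, not any real system) of the two ways a self-modifying agent might score candidate successor decision theories: by its current decision theory’s expected utility, versus by deferring to open-ended philosophical deliberation.

```python
# Toy illustration only; the decision theories and scores here are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CandidateDT:
    name: str
    # Expected utility of self-modifying into this successor, as judged by the
    # agent's *current* decision theory (e.g. CDT evaluating "Son-of-CDT").
    utility_by_current_dt: float
    # How well the candidate fares under open-ended philosophical deliberation:
    # thought experiments, reflective consistency, handling of known paradoxes.
    philosophical_plausibility: float

def choose_successor(candidates: list[CandidateDT],
                     judge: Callable[[CandidateDT], float]) -> CandidateDT:
    return max(candidates, key=judge)

candidates = [
    CandidateDT("Son-of-CDT", utility_by_current_dt=1.0, philosophical_plausibility=0.3),
    CandidateDT("hypothetical-better-theory", utility_by_current_dt=0.6, philosophical_plausibility=0.9),
]

# Failure mode: judging successors by the current decision theory locks in its flaws.
print(choose_successor(candidates, lambda dt: dt.utility_by_current_dt).name)
# Hoped-for alternative: judging successors by "doing philosophy", as humans do.
print(choose_successor(candidates, lambda dt: dt.philosophical_plausibility).name)
```

The point of the toy is only the structural difference in the `judge` function, not any claim about how such a deliberation process would actually be built.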
Do you have thoughts on how to encode “doing philosophy” in a way that we would expect to be strongly convergent, such that if it were implemented on the last AI humans ever control, we could trust the process after disempowerment to keep doing philosophy usefully, in some nailed-down way?
I think we’re really far from having a good enough understanding of what “philosophy” is, or what “doing philosophy” consists of, to be able to do that. (Aside from “indirect” methods that pass the buck to simulated humans, which Pi Rogers also mentioned in another reply to you.)
Here is my current best understanding of what philosophy is, so you can have some idea of how far we are from what you’re asking.
Maybe some kind of simulated long-reflection-type thing, like QACI, where “doing philosophy” basically becomes “predicting how humans would do philosophy if given lots of time and resources”.
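As a rough structural sketch of that idea (purely illustrative, with hypothetical names, and not QACI’s actual formalism), “doing philosophy” reduces to a single prediction query about an idealized, long-running human deliberation:

```python
# Toy structural sketch only; not QACI's actual formalism, and all names are hypothetical.
from typing import Callable

def predict_deliberation_output(predictor: Callable[[str], str],
                                question: str, budget_years: int) -> str:
    """Ask a predictive model what answer (simulated) humans would converge on,
    given a huge budget of subjective time and resources to do philosophy."""
    prompt = (
        f"Counterfactual query: careful human thinkers spend {budget_years} years "
        f"of subjective time, with ample resources, on the question below.\n"
        f"Question: {question}\n"
        f"Their final considered answer:"
    )
    return predictor(prompt)

def do_philosophy(predictor: Callable[[str], str], question: str) -> str:
    # The whole of "doing philosophy" is delegated to the prediction above.
    return predict_deliberation_output(predictor, question, budget_years=1_000_000)

# Trivial stand-in predictor, just to show the call structure.
stub_predictor = lambda prompt: "<whatever the long reflection would conclude>"
print(do_philosophy(stub_predictor, "What is the correct decision theory?"))
```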
That would be a philosophical problem...
Currently, I think this is a big crux in how to “do alignment research at all”. Debatably “the biggest” or even “the only real” crux.
(As you can tell, I’m still uncertain about it.)