My own best hope at this point is that someone will actually solve the “civilizational superalignment” problem of CEV, i.e. learning how to imbue autonomous AI with the full set of values (whatever they are) required to “govern” a transhuman civilization in a way that follows from the best in humanity, etc. - and that this solution will be taken into account by whoever actually wins the race to superintelligence.
Sounds like a post-hoc justification for not even trying to stop something bad: picking a plan with zero percent chance of success rather than thinking further and actually trying to do the impossible.
Do you perceive the irony in telling me my hope has “zero percent chance” of happening, then turning around and telling me to do the impossible? I guess some impossibles are more impossible than others.
In fact, I’ve spent 30 years attempting various forms of “the impossible” (i.e. things of unknown difficulty that aren’t getting done); it’s part of why I’m chronically unemployed and rarely have more than $2000 in the bank. I know how to be audaciously ambitious in unpromising circumstances, and I know how to be stubborn about it.
You like to emphasize the contingency of history as a source of hope. Fine. Let me tell you that the same applies to the world of intellectual discovery, which I know a lot better than I know the world of politics. Revolutionary advances in understanding can and do occur, and sometimes on the basis of very simple but potent insights.
Sorry if this was unclear, but there’s a difference between plans which only work conditional on an impossibility and actually trying to do the impossible. For example, building a proof that works only if P=NP is true is silly in ways that trying to prove P=NP is not. The second is trying to do the impossible; the first is what I was dismissive of.
So what’s the impossible thing: identifying an adequate set of values? Instilling them in a superintelligence?
Yes, doing those things in ways that a capable alignment researcher can’t find obvious failure modes for. (Which may not be enough, given that they aren’t superintelligences, but it is still a bar which no proposed plan comes close to passing.)
Is there someone you regard as the authority on why it can’t be done? (Yudkowsky? Yampolskiy?)
Because what I see are not problems that we know to be unsolvable, but rather problems that the human race is not seriously trying to solve.
I think that basically everyone at MIRI, Yampolskiy, and a dozen other people all have related and strong views on this. You’re posting on LessWrong, and I don’t want to be rude, but I don’t know why I’d need to explain this instead of asking you to read the relevant work.
I asked because I’m talking with you and I wanted to know *your* reasoning as to why a technical solution to the alignment of superintelligence is impossible. It seems to be “lots of people see lots of challenges and they are too many to overcome, take it up with them”.
But it’s just a hard problem, and the foundations are not utterly mysterious. Humanity understands quite a lot about the physical and computational nature of our reality by now.
Maybe it would be more constructive to ask how you envisage achieving the politically impossible goal of stopping the worldwide AI race, since that’s something you do advocate.