There’s a problem of inferring the causes of sensory experience in cognition-that-does-science. (Which, in fact, also appears in the way that humans do math, and is possibly inextricable from math in general; but this is an example of the sort of deep model that says “Whoops, I guess you get science from math after all”, not a thing that makes science less dangerous because it’s more like just math.)
To flesh this out:
We train a model up to superintelligence on some theorems to prove. There’s a question it might have, which is “where are these theorems coming from? Why these ones?” and when it is a superintelligence, that won’t be idle questioning. There will be lots of hints from the distribution of theorems that point to the process that selected/generated them. And it could back out from that process that it is an AI in training, on planet Earth? (The way it might deduce General Relativity from the statics of a bent blade of grass.)
Is that the basic idea?
Depends on how much of a superintelligence, and how it’s implemented. I wouldn’t be surprised if somebody got far superhuman theorem-proving from a mind that didn’t generalize beyond theorems. Presuming you were asking it to prove old-school fancy-math theorems, and not, e.g., to arbitrarily speed up a bunch of real-world computations, like asking it what GPT-4 would say about things, etc.
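To make the earlier point concrete — that the distribution of theorems leaks information about the process that selected them — here is a toy Bayesian sketch. It is purely illustrative: the two “generators”, their likelihood tables, and every name below are invented for this example, not anything from the exchange above. The observer sees only samples, never the sampling process, and still ends up nearly certain which process produced them.

```python
# Toy sketch (all names and numbers invented for illustration): the
# distribution of outputs carries information about the process that
# generated them, even when the observer never sees the process directly.
import random

random.seed(0)

# Two hypothetical "theorem generators", differing only in the lengths
# of the statements they tend to emit.
LIKELIHOOD = {
    "A": {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1},  # process A: mostly short
    "B": {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4},  # process B: mostly long
}

def sample(process, n):
    """Draw n observations (statement lengths) from the named process."""
    dist = LIKELIHOOD[process]
    return random.choices(list(dist), weights=list(dist.values()), k=n)

def posterior(observations):
    """Bayes-update a uniform prior over which process produced the data."""
    belief = {"A": 0.5, "B": 0.5}
    for x in observations:
        belief = {h: p * LIKELIHOOD[h][x] for h, p in belief.items()}
        total = sum(belief.values())
        belief = {h: p / total for h, p in belief.items()}
    return belief

# The observer sees only 20 samples, never the generator itself,
# yet its belief concentrates almost entirely on the true source.
print(posterior(sample("B", 20)))  # prints a belief concentrated on "B"
```

The analogue: even a model that only ever sees theorems is, in effect, receiving evidence about whatever selected those theorems.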