In the first case, think of chess; superhuman chess still plays chess. You can watch AlphaZero’s games and nod along—even if it’s alien, you get what it’s doing: the structure of the chess “universe” is such that unbounded intelligence still leads to mostly understandable moves.
I guess the question here is how much is ‘mostly’? We can point to areas of chess like the endgame databases, which are just plain inscrutable: when the databases play out some mate-in-50 game because that is what is provably optimal by checking every possible move, any human understanding is largely illusory. They are brute facts determined by the totality of the game tree, not any small self-contained explanation like ‘knight forking’. (There is probably no ‘understandability’ even in principle for arbitrarily intelligent agents, similar to asking why the billionth digit of pi or Chaitin’s omega is what it is.)
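As a minimal sketch of what ‘provably optimal by checking every possible move’ means, here is a toy subtraction game solved exhaustively (the game is an arbitrary stand-in; real tablebases do the same thing by retrograde analysis over every legal chess position):

```python
from functools import lru_cache

# Toy game: players alternate taking 1-3 stones; taking the last stone wins.
# "Optimal play" falls out of enumerating the entire game tree, exactly as
# endgame tablebases do for chess, just at a vastly smaller scale.

@lru_cache(maxsize=None)
def is_win(stones: int) -> bool:
    """True iff the player to move wins with perfect play."""
    # A position is winning iff some move reaches a losing position for
    # the opponent. The answer is a brute fact about the whole tree.
    return any(not is_win(stones - take) for take in (1, 2, 3) if take <= stones)

def best_move(stones: int):
    """A provably optimal move, or None if every move loses."""
    for take in (1, 2, 3):
        if take <= stones and not is_win(stones - take):
            return take
    return None

print(is_win(17), best_move(17))  # the only 'justification' is the enumeration
```

For this toy game a compact rule happens to exist (you lose iff the count is a multiple of 4), but nothing in the solver knows or needs it; for the chess endgames above, no such compact rule appears to exist at all.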
And if we want to expand it out to more realistic settings, we don’t even get that: ‘chess’ doesn’t exist in the real world—only specific implementations of chess. With an actual implementation in software, maybe we get something closer to a TAS speedrun, where the chess player twitches for a while and then a buffer overflow instantly wins without the opponent getting to even move a pawn.
But a superintelligence might instead write music that sounds to us like static, full of some brilliant structure, with no ability for human brains to comprehend it. Humans might be unable to tell whether it’s genius or gibberish—but are such heights of genius a real thing? I am unsure.
But what part are you unsure about? There are surely many pragmatic ways to tell if there is a structure in the apparent static (even if it cannot be explained to us the way that, say, cryptographic algorithms can be explained to us and demonstrated by simply decrypting the ‘static’ into a very meaningful message): for example, simply see if other superintelligences or algorithms can predict/compress the static. You and I can’t see ‘non-robust features’ in images that neural networks do, but we can observe them indirectly by looking at the performance of neural networks dropping when we erase them, and see that they are really real.
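As a sketch of the predict/compress test: structure that nobody can see by eye still shows up as compressibility. Here the ‘static’ is an arbitrary deterministic byte stream standing in for whatever structure was embedded (the generator is an assumption for illustration):

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size over original size; ~1.0 means no structure found."""
    return len(zlib.compress(data, level=9)) / len(data)

noise = os.urandom(1 << 16)                            # genuinely random bytes
static = bytes((i * i) % 251 for i in range(1 << 16))  # looks random, isn't

print(f"noise:  {compression_ratio(noise):.3f}")    # ~1.0
print(f"static: {compression_ratio(static):.3f}")   # well below 1.0
```

Both streams look equally meaningless to a human, but the compressor certifies that one of them has structure, without ever explaining what that structure is.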
We can point to areas of chess like the endgame databases, which are just plain inscrutable
I think there is a key difference in places where the answers come from exhaustive search rather than from more intelligence—AI isn’t better at that than humans, and from the little I understand, AI doesn’t outperform in endgames (compared to its overperformance in general) via better policy networks; it does it via direct memorization or longer lookahead.
The difference matters even more for other domains with far larger action spaces, since the exponential increase makes intelligence marginally less valuable at finding increasingly rare solutions. The design space for viruses is huge, and the design space for nanomachines using arbitrary configurations is even larger. If move-37-like intuitions are common, AIs will be able to do things humans cannot understand, whereas if it’s more like chess endgames, they will need to search an exponential space in ways that are infeasible even for them.
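Some back-of-envelope arithmetic on that gap (the evaluation rate and space sizes below are illustrative assumptions, not measured figures):

```python
SECONDS_PER_YEAR = 3.15e7
evals_per_second = 1e18   # assume an absurdly generous planet-scale computer

spaces = {
    "7-piece chess endgames": 4e14,        # roughly the Syzygy tablebase scale
    "100-residue proteins": 20.0 ** 100,   # 20 amino acids per position
}

for name, size in spaces.items():
    years = size / evals_per_second / SECONDS_PER_YEAR
    print(f"{name}: {size:.1e} states, ~{years:.1e} years to enumerate")
```

The first space yields to brute force plus storage; the second does not yield to any amount of brute force, so only move-37-style insight (if it exists there) would help.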
This relates closely to a folk theorem about NP-complete problems: many exponential problems are approximately solvable with greedy algorithms in O(n log n) or O(n^2) time, and TSP is NP-complete, but actual salesmen find sufficiently efficient routes easily.
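A minimal sketch of that folk theorem, with random cities as a made-up instance:

```python
import math
import random

def tour_length(cities, order):
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def greedy_tour(cities):
    """Nearest-neighbor heuristic: O(n^2), no optimality guarantee."""
    unvisited = set(range(1, len(cities)))
    order = [0]
    while unvisited:
        here = cities[order[-1]]
        nearest = min(unvisited, key=lambda j: math.dist(here, cities[j]))
        unvisited.remove(nearest)
        order.append(nearest)
    return order

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(200)]
print(f"greedy tour length: {tour_length(cities, greedy_tour(cities)):.2f}")
```

Exact TSP over 200 cities is hopeless (roughly 199!/2 tours), yet the greedy pass typically lands within a modest factor of optimal, which is all the salesman needs.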
But what part are you unsure about?
Yeah, on reflection, the music analogy wasn’t a great one. I am not concerned that pattern creation that we can’t intuit could exist—humans can do that as well. (For example, it’s easy to make puzzles no-one can solve.) The question is whether important domains are amenable to kinds of solutions that ASI can understand robustly in ways humans cannot. That is, can ASI solve “impossible” problems?
One specific concerning difference is whether ASI could play perfect social 12-D chess by being a better manipulator, despite all of the human-experienced uncertainties, and engineer arbitrary outcomes in social domains. There clearly isn’t a feasible search strategy with exact evaluation, but if it is far smarter than “human-legible ranges” of thinking, it might be possible.
This isn’t just relevant for AI risk, of course. Another area is biological therapies where, for example, it seems likely that curing or reversing aging requires the same sort of brilliant insight into insane complexity: figuring out whether there would be long-term or unexpected out-of-distribution impacts years later, without actually conducting multi-decade, large-scale trials.
Why does this matter? To borrow a Yudkowsky-ish example: maybe you can take a 16th-century human (before Newtonian physics was invented, after guns were invented) and explain to him how a nuclear bomb works. This doesn’t matter for predicting the outcome of a hypothetical war between 16th-century Britain and the 21st-century USA.
ASI inventions can be big surprises and yet be things that you could understand if someone taught you.
We could probably understand how a von Neumann probe or an anti-aging cure worked too, if someone taught us.
This doesn’t matter for predicting the outcome of a hypothetical war between 16th-century Britain and the 21st-century USA.
If AI systems can make 500 years of progress before we notice they’re uncontrolled, that already assumes an insanely strong superintelligence.
We could probably understand how a von Neumann probe or an anti-aging cure worked too, if someone taught us.
Probably, if it’s of a type we can imagine and is comprehensible in those terms—but that’s assuming the conclusion! As Gwern noted, we can’t understand chess endgames. Similarly, in the case of a strong ASI, the ASI-created probe or cure could look less like an engineered, purpose-driven system that is explainable at all, and more like a random set of actions, not explainable in our terms, that nonetheless cause the outcome.
As Gwern noted, we can’t understand chess endgames.
On this example specifically: a) it’s possible the AI is too stupid to have a theory of mind of humans good enough to write good chess textbooks on these endgames; maybe there is an elegant way of looking at them that isn’t brute force. b) Chess endgames are amenable to brute force in a way that “invent a microscope” is not. Scientific discovery is a search through an exponential space, so you need a good heuristic or model for every major step you take; you can’t brute-force it.
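A sketch of point (b), why a per-step heuristic beats enumeration in an exponential space; this is the classic ‘weasel’ hill-climbing demo, with a made-up target string:

```python
import random
import string

TARGET = "methinks it is like a weasel"   # 27**28 possible strings this long
ALPHABET = string.ascii_lowercase + " "

def score(s: str) -> int:
    """Heuristic signal: how many positions already match the target."""
    return sum(a == b for a, b in zip(s, TARGET))

random.seed(0)
current = "".join(random.choice(ALPHABET) for _ in TARGET)
steps = 0
while current != TARGET:
    i = random.randrange(len(TARGET))
    mutant = current[:i] + random.choice(ALPHABET) + current[i + 1:]
    if score(mutant) >= score(current):   # keep any non-worsening step
        current = mutant
    steps += 1
print(f"reached target in {steps} mutations, not 27**28 evaluations")
```

Without the per-step signal the search is blind and the exponent wins; with it, a few thousand evaluations suffice. The open question is whether real scientific domains hand an ASI a usable signal at every major step.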
I agree tech beyond human comprehension is possible. I’m just giving an intuition as to why a lot of radically powerful tech likely still lies within human comprehension. 500[1] years of progress is likely to still be within comprehension, and so are 50 years or 5 years.
The most complex tech that exists in the universe is arguably the human brain itself, and we could probably understand a good fraction of its workings too, if someone explained it.
An important point here is that the AI has to want to explain it to us in simple terms.
If you get a 16th-century human to visit a nuclear facility for a day, that’s not enough information for them to figure out what it does or how it works. You need to provide them with textbooks that break down each of the important concepts.
[1] Society in 2000 is explainable to society in 1500, but society in 2500 may or may not be explainable to society in 2000, because of acceleration.
I think you are fooling yourself about how similar people in 1600 are to people today. The average person at the time was illiterate, superstitious, and could maybe do single-digit addition and subtraction. You’re going to explain nuclear physics?
There is a similar hypothesis that is testable: find someone who is illiterate and superstitious today and fund their education up to university level.
Edit: Bonus points if they are selected from an isolated tribal community existing today