I think we are getting some information. For example, we can see that token-level attention is actually quite powerful for understanding language, and also images. We have some understanding of scaling laws. I think the next step is a deeper understanding of how world modeling fits in with action generation: how much can you get with just world modeling, versus world modeling plus reward/action combined?
If the transformer architecture is enough to get us there, that gives us a sort of null hypothesis for intelligence: that the structure of predicting sequences by comparing all pairs of elements of a bounded-length sequence is general.
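To make the "comparing all pairs of elements" structure concrete, here is a minimal pure-Python sketch of scaled dot-product attention. It is deliberately stripped down: a real transformer adds learned query/key/value projections, multiple heads, and position information, none of which appear here.

```python
import math

def attention(seq, d):
    # seq: list of d-dimensional vectors (lists of floats).
    # For illustration, each vector serves as its own query, key, and value;
    # a real transformer learns separate Q/K/V projections.
    out = []
    for q in seq:
        # Compare this element against every element of the sequence
        # via scaled dot products -- the "all pairs" comparison.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        # Softmax the comparison scores into attention weights.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is the attention-weighted average of the sequence.
        out.append([sum(w * v[i] for w, v in zip(weights, seq))
                    for i in range(d)])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(seq, 2)
```

Since each output is a convex combination of the inputs, every output coordinate lands inside the range spanned by the input coordinates.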
Not rhetorically: what kind of questions do you think would better lead to understanding how AGI works?
I think teaching a transformer with an internal thought process (predicting the next tokens over a part of the sequence that's "showing its work") would be an interesting window into how intelligence might work. I thought of this a little while back, but then discovered it is also a long-standing MIRI research direction into transparency. I wouldn't be surprised if Google took it up at this point.
Suppose I’m designing an engine. I try out a new design, and it surprises me—it works much worse or much better than expected. That’s a few bits of information. That’s basically the sort of information we get from AI experiments today.
What we’d really like is to open up that surprising engine, stick thermometers all over the place, stick pressure sensors all over the place, measure friction between the parts, measure vibration, measure fluid flow and concentrations and mixing, measure heat conduction, etc, etc. We want to be able to open that black box, see what’s going on, figure out where that surprising performance is coming from. That would give us far more information, and far more useful information, than just “huh, that worked surprisingly well/poorly”. And in particular, there’s no way in hell we’re going to understand how an engine works without opening it up like that.
The same idea carries over to AI: there’s no way in hell we’re going to understand how intelligence works without opening the black box. If we can open it up, see what’s going on, figure out where surprises come from and why, then we get orders of magnitude more information and more useful information. (Of course, this also means that we need to figure out what things to look at inside the black box and how—the analogues of temperatures, pressures, friction, mixing, etc in an engine.)
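As a toy version of "sticking sensors inside the engine": run a tiny hand-wired two-layer network and record every intermediate activation, the analogue of reading temperatures and pressures at each point rather than only watching the output shaft. The network and its weights here are made up purely for illustration.

```python
import math

def layer(x, weights):
    # One dense layer with tanh nonlinearity.
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)))
            for row in weights]

def forward_with_probes(x, all_weights):
    probes = {"input": x}          # our "sensor" readings
    for i, weights in enumerate(all_weights):
        x = layer(x, weights)
        probes[f"layer_{i}"] = x   # record the activation at this point
    return x, probes

w1 = [[0.5, -0.2], [0.1, 0.9]]     # illustrative weights
w2 = [[1.0, -1.0]]
output, probes = forward_with_probes([1.0, 2.0], [w1, w2])
# probes now holds every internal activation, not just the final output.
```

The hard part, as the parenthetical above notes, isn't collecting these readings (frameworks expose activations readily); it's knowing which derived quantities are the analogues of temperature and pressure.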
You can build a good engine without any sensors inside, and indeed people did—i.e. back in the 19th century when sensors of that sort didn’t exist yet. (They had thermometers and pressure gauges, but they couldn’t just get any information from any point inside the engine block, like we can by looking at activations in a NN.) What the engineers of the 19th century had, and what we need, is a general theory. For engines, that was thermodynamics. For AI, we need some kind of Theory of Intelligence. The scaling laws might be pointing the way to a kind of thermodynamics of intelligence.
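The scaling laws in question have a strikingly simple power-law shape: loss falls as a power of model size, roughly L(N) = (N_c / N)^alpha. A minimal sketch, with constants chosen for illustration rather than taken as the published empirical fits:

```python
# Rough sketch of the power-law shape of neural scaling laws:
# loss falls as a power of parameter count, L(N) = (N_c / N) ** alpha.
# The constants below are illustrative placeholders, not published fits.
def predicted_loss(n_params, n_c=8.8e13, alpha=0.076):
    return (n_c / n_params) ** alpha

# Loss improves smoothly and predictably as models grow:
losses = [predicted_loss(n) for n in (1e6, 1e8, 1e10, 1e12)]
```

The thermodynamics analogy is that a law this simple, holding across many orders of magnitude, hints at a small set of underlying state variables, even before we know what they are.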