Most People Don’t Realize We Have No Idea How Our AIs Work

This point feels fairly obvious, yet seems worth stating explicitly.

Those of us familiar with the field of AI after the deep-learning revolution know perfectly well that we have no idea how our ML models work. Sure, we understand the dynamics of training loops and the properties of SGD, and we know how ML models’ architectures are put together. But we don’t know what specific algorithms ML models’ forward passes implement. We have some guesses, and some insights painstakingly mined by interpretability research, but nothing even remotely like a full understanding.
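
To make that concrete, here’s a toy sketch (using PyTorch and a made-up regression task, purely for illustration): every line of the architecture and the training loop below is human-written and legible, but the artifact the loop produces is a pile of floating-point numbers that doesn’t explain what the trained network is computing.

```python
# Illustrative sketch: the *code* is transparent, the *learned model* is not.
import torch
import torch.nn as nn

# Every line of the architecture is human-written and human-readable.
model = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# The training loop is equally transparent: compute loss, backpropagate, step.
for _ in range(1000):
    x = torch.randn(32, 16)          # stand-in for real training data
    y = x.sum(dim=1, keepdim=True)   # stand-in for real labels
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# But the thing the loop spits out is just numbers. Inspecting the weights
# tells you essentially nothing about what algorithm the trained model now
# implements to map inputs to outputs.
print(model[0].weight)  # a 64x16 block of floats, not an explanation
```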

And we most certainly wouldn’t automatically know how a fresh model of a novel architecture works the moment the training loop spits it out.

We’re all used to this state of affairs. It’s implicitly assumed shared background knowledge. But it’s actually a pretty unusual fact when you first learn of it.

And...

Relevant XKCD.

I’m pretty sure that the general public doesn’t actually know that. I don’t have hard data, but it’s my strong impression, based on reading AI-related online discussions in communities not focused on tech, talking to people uninterested in AI advances, and so on.[1]

They still think in GOFAI (Good Old-Fashioned AI) terms. They still believe that all of an AI’s functionality has been deliberately programmed, not trained, into it. That behind every single thing ChatGPT can do, there’s a human who implemented that functionality and understands it.

Or, at the very least, that it’s written in a legible, human-readable and human-understandable format, and that we can intervene on it to cause precise, predictable changes.

Polls already show concern about AGI. If the fact that we don’t know what these systems are actually thinking were widely known and properly appreciated? If there weren’t the implicit assurance of “someone understands how it works and why it can’t go catastrophically wrong”?

Well, I expect much more concern. Which might serve as a pretty good foundation for further pro-AI-regulation messaging. A way to acquire some political currency you can spend.

So if you’re doing any sort of public outreach, I suggest putting the spread of this information on the agenda. You get about five words (per message) to the public, and “Powerful AIs Are Black Boxes” seems like a message worth sending out.[2]

  1. ^

    If you do have some hard data on that, that would be welcome.

  2. ^

    There’s been some pushback on the “black box” terminology. I maintain that it’s correct: ML models are black boxes relative to us, in the sense that by default, we don’t have much more insight into what algorithms they execute than we’d have by looking at a homomorphically encrypted computation to which we don’t have the key, or by looking at the activity of a human brain through neuroimaging. There’s been a nonzero amount of interpretability research, but this is still largely the case, and it would be almost fully the case for models produced by novel architectures.

    ML models are not black boxes relative to SGD, yes: the training algorithm can “see” all the computations happening inside them, and can tightly intervene on them. But that seems like a fairly counterintuitive use of the term, and I maintain that “AIs are black boxes” conveys all the correct intuitions.