Zach Furman

Karma: 548

zachfurman.com

Zach Furman 30 Jun 2025 10:47 UTC
42 points
7
on: Zach Furman’s Shortform
I’ve been trying to understand modules for a long time. They’re a particular algebraic structure in commutative algebra which seems to show up everywhere any time you get anywhere close to talking about rings—and I could never figure out why. Any time I have some simple question about algebraic geometry, for instance, it almost invariably terminates in some completely obtuse property of some module. This confused me. It was never particularly clear to me from their definition why modules should be so central, or so “deep.”
I’m going to try to explain the intuition I have now, mostly to clarify this for myself, but also incidentally in the hope of clarifying this for other people. I’m just a student when it comes to commutative algebra, so inevitably this is going to be rather amateur-ish and belabor obvious points, but hopefully that leads to something more understandable to beginners. This will assume familiarity with basic abstract algebra. Unless stated otherwise, I’ll restrict to commutative rings because I don’t understand much about non-commutative ones.
The typical motivation for modules: “vector spaces but with rings”
The typical way modules are motivated is simple: they’re just vector spaces, but you relax the definition so that you can replace the underlying field with a ring. That is, an $R$ -module $M$ is a ring $R$ and an abelian group $M$ , and an operation $\cdot : R \times M \to M$ that respects the ring structure of $R$ , i.e. for all $r, s \in R$ and $x, y \in M$ :
- $r \cdot (x + y) = r \cdot x + r \cdot y$
- $(r + s) \cdot x = r \cdot x + s \cdot x$
- $(r s) \cdot x = r \cdot (s \cdot x)$
- $1 \cdot x = x$
This is literally the definition of a vector space, except we haven’t required our scalars $R$ to be a field, only a ring (i.e. multiplicative inverses don’t have to always exist). So, like, instead of multiplying vectors by real numbers or something, you multiply vectors by integers, or a polynomial—those are your scalars now. Sounds simple, right? Vector spaces are pretty easy to understand, and nobody really thinks about the underlying field of a vector space anyway, so swapping the field out shouldn’t be much of a big deal, right?
As you might guess from my leading questions, this intuition is profoundly wrong and misleading. Modules are far more rich and interesting than vector spaces. Vector spaces are quite boring; two vector spaces over the same field are isomorphic if and only if they have the same dimension. This means vector spaces are typically just thought of as background objects, a setting which you quickly establish before building the interesting things on top of. This is not the role modules play in commutative algebra—they carry far more structure intrinsically.
A better motivation for modules: “ring actions”
Instead, the way you should think about a module is as a representation of a ring or a ring action, directly analogous to group actions or group representations.^[1] Modules are a manifestation of how the ring acts on objects in the “real world.”^[2] Think about how group actions tell you how groups manifest in the real world: the dihedral group $D_{4}$ is a totally abstract object, but it manifests concretely in things like “mirroring/rotating my photo in Photoshop,” where implicitly there is now an action of $D_{4}$ on the pixels of my photo.
In fact, we can define modules just by generalizing the idea of group actions to ring actions. How might we do this? Well, remember that a group action of a group $G$ on a set $M$ is just a group homomorphism $G \to End (M)$ , where $End (M)$ is the set of all endomorphisms on $M$ (maps from $M$ to itself). In other words, for every element of $G$ , we just have to specify how it transforms elements of $M$ . Can we get a “ring action” of a ring $R$ on a set $M$ just by looking at a ring homomorphism $R \to End (M)$ ?
That doesn’t quite work, because $End (M)$ is in general a group, not a ring—what would the “addition” operation be? But if we insist that $M$ is a group, giving it an addition operation, that turns $End (M)$ into a ring, with the operation $(f + g) (x) = f (x) + g (x)$ . (This addition rule only works to produce another endomorphism if $M$ is abelian, which is a neat little proof in itself!). Now our original idea works: we can define a “ring action” of a ring $R$ on an abelian group $M$ as a ring homomorphism $R \to End (M)$ .
It turns out that this is equivalent to our earlier definition of a module: our “ring action” of $R$ on $M$ is literally just the definition of an $R$ -module $M$ . We got to the definition of a module just by taking the definition of a group action and generalizing to rings in the most straightforward possible way.
To make this more concrete, let’s look at a simple illustrative example, which connects modules directly to linear algebra.
- Let our ring be $R = C [x]$ , the ring of polynomials with complex coefficients.
- Let our abelian group be $M = C^{2}$ , the standard 2D plane.
A module structure is a ring homomorphism $ϕ : C [x] \to End (C^{2})$ . The ring $End (C^{2})$ is just the ring of 2x2 complex matrices, $M_{2} (C)$ .
Now, a homomorphism out of a polynomial ring like $C [x]$ is completely determined by one thing: where you send the variable $x$ . So, to define a whole module structure, all we have to do is choose a single 2x2 matrix $T$ to be the image of $x$ .
$T = ϕ (x)$
The action of any other polynomial, say $p (x) = a_{n} x^{n} + \dots + a_{0}$ , on a vector $v$ is then automatically defined: $p (x) \cdot v = (a_{n} T^{n} + \dots + a_{0} I) v$ . In short, the study of $C [x]$ -modules on $C^{2}$ is literally the same thing as the study of 2x2 matrices.
Thinking this way immediately clears up some confusing “module” properties. For instance, in a vector space, if a scalar $c$ times a vector $v$ is zero ( $c \cdot v = 0$ ), then either $c = 0$ or $v = 0$ . This fails for modules. In our $C [x]$ -module defined by the matrix $T = (\begin{matrix} 0 & 1 0 & 0 \end{matrix})$ , the polynomial $x^{2}$ is not zero, but for any vector $v$ , $x^{2} \cdot v = T^{2} v = 0 v = 0$ . This phenomenon, called torsion, feels strange from a vector space view, but from the “ring action” view, it’s natural: some of our operators (like $T^{2}$ ) can just happen to be the zero transformation.
Why are modules so deep?
Seeing modules as “ring actions” starts to explain why you should care about them, but it still doesn’t explain why the theory of modules is so deep. After all, if we compare to group actions, the theory of group actions is in some sense just the theory of groups—there’s really not much to say about a faithful group action beyond the underlying group itself. Group theorists don’t spend much time studying group actions in their own right—why do commutative algebraists seem so completely obsessed with modules? Why are modules more interesting?
The short answer is that the theory of modules is far richer: even when it comes to “faithful” modules, there may be multiple, non-isomorphic modules for the same ring $R$ and abelian group $M$ . In other words, unlike group actions, the ways in which a given ring can manifest concretely are not unique, even when acting faithfully on the same object.
To show this concretely, let’s return to our $C [x]$ module example. Remember that a $C [x]$ -module with structure $ϕ : C [x] \to End (C^{2})$ is determined just by $ϕ (x)$ , where we choose to send the element $x$ . We can define two different modules here which are not isomorphic:
- First, suppose $ϕ (x) = T_{1} = (\begin{matrix} 2 & 0 0 & 3 \end{matrix})$ . In this module, the action of the polynomial $x$ on a vector $v = (v_{1}, v_{2})$ is: $x \cdot v = T_{1} v = (2 v_{1}, 3 v_{2})$ . The action of $x$ is to scale the first coordinate by 2 and the second by 3.
- Alternatively, suppose $ϕ (x) = T_{2} = (\begin{matrix} 0 & 1 0 & 0 \end{matrix})$ . In this module, the action of $x$ on a vector $v = (v_{1}, v_{2})$ is: $x \cdot v = T_{2} v = (v_{2}, 0)$ . The action of $x$ sends the vector to $(v_{2}, 0)$ . The action of $x^{2}$ sends everything to the zero vector because $T_{2}^{2}$ is the zero matrix.
These are radically different modules.
- In module $M_{1}$ (defined by $T_{1}$ ), the action of $x$ is diagonalizable. The module can be broken down into two simple submodules (the x-axis and the y-axis). This is a semisimple module.
- In module $M_{2}$ (defined by $T_{2}$ ), the action of $x$ is not diagonalizable. There is a submodule (the x-axis, spanned by (1,0)) but it has no complementary submodule. $M_{2}$ is indecomposable but not simple.
The underlying abelian group $C^{2}$ is the same in both cases. The ring $C [x]$ is the same. But the “action”—which is determined by our choice of the matrix $ϕ (x)$ - creates completely different algebraic objects.
The analogy: a group is a person, a ring is an actor
This was quite a perspective shift for me, and something I had trouble thinking about intuitively. A metaphor I found helpful is to think of it like this: a group is like a person, while a ring is like an actor.
- A group is a person: A group $G$ has a fixed identity. Its essence is its internal structure of symmetries. When a group $G$ acts on a set $X$ , it’s like a person interacting with their environment. The person doesn’t change; the action simply reveals their inherent nature. If the action is trivial, it means that person has no effect on that particular environment. If it’s faithful, the environment perfectly reflects the person’s structure. But the person remains the same entity. The question is: “What does this person do to this set?”
- A ring is an actor: A ring $R$ is more like a professional actor. An actor has an identity, skills, and a range (the ring’s internal structure). But their entire purpose is to manifest as different characters in different plays.
  - The module $M$ is the performance.
  - The underlying abelian group of $M$ is the stage.
  - The specific module structure (the homomorphism $ϕ : R \to End (M)$ ) is the role or the script the actor is performing.
Using our $C [x]$ example: The actor is the polynomial ring, $C [x]$ . This actor has immense range.
- Performance 1: The actor is cast in the role $T_{1} = (\begin{matrix} 2 & 0 0 & 3 \end{matrix})$ . The performance is straightforward, stable, and easily understood (semisimple).
- Performance 2: The same actor is cast in the role $T_{2} = (\begin{matrix} 0 & 1 0 & 0 \end{matrix})$ . The performance is a complex tragedy. The character’s actions are subtle, entangled, and ultimately lead to nothingness (nilpotent).
The “actor” is the same. The “stage” is the same. But the “performances” are fundamentally different. To understand the actor $C [x]$ , you cannot just watch one performance. You gain a more complete appreciation for their range, subtlety, and power by seeing all the different roles they can inhabit.
Why can rings manifest in multiple ways?
It’s worth pausing for a moment to notice the suprising fact that modules can have this sort of intricate structure, when group actions don’t. What is it about rings that lets them act in so many different “roles”, in a way that groups can’t?
In short, the ring axioms force the action of a ring’s elements to behave like linear transformations rather than simple permutations.
- A group only has one operation. Its action only needs to preserve that one operation: $(g_{1} g_{2}) \cdot x = g_{1} \cdot (g_{2} \cdot x)$ . It’s a rigid structural mapping.
- A ring has two operations ( $+$ and $\cdot$ ) that are linked by distributivity. A ring action (module) must preserve both. It must be a homomorphism of rings. This extra constraint (linearity with respect to addition) is what paradoxically opens the door to variety. It demands that the ring’s elements behave not just like permutations, but like linear transformations. And there are many, many ways to represent a system of abstract operators as concrete linear transformations on a space. The vast gap between group actions and group representations can be seen as a microcosm of this idea.
The object is its category of actions
Part of the reason why I like this metaphor is because it points towards a deeper understanding of how we should think about rings in the first place. This leads to, from my position, one of the most fascinating ideas in modern algebra:
You do not truly understand a ring $R$ by looking at $R$ itself. You understand $R$ by studying the category of all possible $R$ -modules.
The “identity” of the ring is not its list of elements, but the totality of all the ways it can manifest—all the performances it can give.
- A simple ring like a field $k$ is a “typecast actor.” Every performance (every vector space) looks basically the same (it’s determined by one number: the dimension).
- A complex ring like $C [x]$ or the ring of differential operators is a “virtuoso character actor.” It has an incredibly rich and varied body of work, and its true nature is only revealed by studying the patterns and structures across all of its possible roles (the classification of its modules).
So, why care?
Alright, so hopefully some of that makes sense. We started with the (in my opinion) unhelpful idea of a module as a “vector space over a ring” and saw that it’s much more illuminating to see it as a “ring action”—the way a ring shows up in the world. The story got deeper when we realized that, unlike groups, rings can show up in many fundamentally different ways, analogous to an actor playing a vast range of roles.
So where does this leave us? For me, at least, modules don’t feel as alien anymore. At the risk of being overly glib, the reason modules seem to show up everywhere is because they are everywhere. They are the language we use to talk about the manifestations of rings. When a property of a geometric space comes down to a property of a module, it’s because the ring of functions on that space is performing one of its many roles, and the module is that performance.
There’s further you could take this (structure of modules over PIDs, homological algebra, etc), but this seems like a good place to stop for now. Hopefully this was helpful!
1. ^
  In fact, modules literally generalize group representation theory!
2. ^
  Sometimes in a quite literal way. In quantum mechanics, the ring is an algebra of observables, and a specific physical system is defined by choosing a specific operator for the Hamiltonian, $H$ . This is directly analogous to how we chose a specific matrix $T$ for $x$ to define our module, as in the example of $C [x]$ -modules we study later.

Zach Furman 8 May 2025 8:43 UTC
10 points
6
in reply to: Thomas Kwa’s comment on: Zach Furman’s Shortform
Yeah, I completely agree this is a good research direction! My only caveat is I don’t think this is a silver bullet in the same way capabilities benchmarks are (not sure if you’re arguing this, just explaining my position here). The inevitable problem with interpretability benchmarks (which to be clear, your paper appears to make a serious effort to address) is that you either:
1. Train the model in a realistic way—but then you don’t know if the model really learned the algorithm you expected it to
2. Train the model to force it to learn a particular algorithm—but them you have to worry about how realistic that training method was or the algorithm you forced it to learn
This doesn’t seem like an unsolvable problem to me, but it does mean that (unlike capabilities benchmarks) you can’t have the level of trust in your benchmarks that allow you to bypass traditional scientific hygiene. In other words, there are enough subtle strings attached here that you can’t just naively try to “make number go up” on InterpBench in the same way you can with MMLU.
I probably should emphasize I think trying to bring interpretability into contact with various “ground truth” settings seems like a really high value research direction, whether that be via modified training methods, toy models where possible algorithms are fairly simple (e.g. modular addition), etc. I just don’t think it changes my point about methodological standards.

Zach Furman 7 May 2025 8:47 UTC
76 points
33
on: Zach Furman’s Shortform
Understanding deep learning isn’t a leaderboard sport—handle with care.
Saliency maps, neuron dissection, sparse autoencoders—each surged on hype, then stalled^[1] when follow‑up work showed the insight was mostly noise, easily spoofed, or valid only in cherry‑picked settings. That risks being negative progress: we spend cycles debunking ghosts instead of building cumulative understanding.
The root mismatch is methodological. Mainstream ML capabilities research enjoys a scientific luxury almost no other field gets: public, quantitative benchmarks that tie effort to ground truth. ImageNet accuracy, MMLU, SWE‑bench—one number silently kills bad ideas. With that safety net, you can iterate fast on weak statistics and still converge on something useful. Mechanistic interpretability has no scoreboard for “the network’s internals now make sense.” Implicitly inheriting benchmark‑reliant habits from mainstream ML therefore swaps a ruthless filter for a fog of self‑deception.
How easy is it to fool ourselves? Recall the “Could a neuroscientist understand a microprocessor?” study: standard neuroscience toolkits—ablation tests, tuning curves, dimensionality reduction—were applied to a 6502 chip whose ground truth is fully known. The analyses produced plausible‑looking stories that entirely missed how the processor works. Interpretability faces the same trap: shapely clusters or sharp heat‑maps can look profound until a stronger test dissolves them.
What methodological standard should replace the leaderboard? Reasonable researchers will disagree^[2]. Borrowing from mature natural sciences like physics or neuroscience seems like a sensible default, but a proper discussion is beyond this note. The narrow claim is simpler:
Because no external benchmark will catch your mistakes, you must design your own guardrails. Methodology for understanding deep learning is an open problem, not a hand‑me‑down from capabilities work.
So, before shipping the next clever probe, pause and ask: Where could I be fooling myself, and what concrete test would reveal it? If you don’t have a clear answer, you may be sprinting without the safety net this methodology assumes—and that’s precisely when caution matters most.
1. ^
  This is perhaps a bit harsh—I think SAEs for instance still might hold some promise, and neuron-based analysis still has its place, but I think it’s fair to say the hype got quite ahead of itself.
2. ^
  Exactly how much methodological caution is warranted here will obviously be a point of contention. Everyone thinks the people going faster than them are reckless and the people going slower are needlessly worried. My point here is just to think actively about the question—don’t just blindly inherit standards from ML.

Zach Furman 5 May 2025 4:59 UTC
3 points
0
in reply to: Jeremy Gillen’s comment on: RA x ControlAI video: What if AI just keeps getting smarter?
It’s not relevant to predictions about how well the learning algorithm generalises. And that’s the vastly more important factor for general capabilities.

Quite tangential to your point, but the problem with the universal approximation theorem is not just “it doesn’t address generalization” but that it doesn’t even fulfill its stated purpose: it doesn’t answer the question of why neural networks can space-efficiently approximate real-world functions, even with arbitrarily many training samples. The statement “given arbitrary resources, a neural network can approximate any function” is actually kind of trivial—it’s true not only of polynomials, sinusoids, etc, but even just a literal interpolated lookup table (if you have an astronomical space budget). It turns out the universal approximation theorem requires exponentially many neurons (in the size of the input dimension) to work, far too much to be practical—in fact this is actually the same amount of resources a lookup table would cost. This is fine if you want to approximate a 2D function or something, but this goes nowhere to explaining why, like, even a space-efficient MNIST classifier is possible. The interesting question is, why can neural networks efficiently approximate the functions we see in practice?

(It’s a bit out of scope to fully dig into this, but I think a more sensible answer is something in the direction of “well, anything you can do efficiently on a computer, you can do efficiently in a neural network”—i.e. you can always encode polynomial-size Boolean circuits into a polynomial-size neural network. Though there are some subtleties here that make this a little more complicated than that.)

Zach Furman 17 Mar 2025 2:15 UTC
6 points
3
in reply to: rpglover64’s comment on: Interpreting Complexity
After convergence, the samples should be viewed as drawn from the stationary distribution, and ideally they have low autocorrelation, so it doesn’t seem to make sense to treat them as a vector, since there should be many equivalent traces.

This is a very subtle point theoretically, so I’m glad you highlighted this. Max may be able to give you a better answer here, but I’ll try my best to attempt one myself.

I think you may be (understandably) confused about a key aspect of the approach. The analysis isn’t focused on autocorrelation within individual traces, but rather correlations between different input traces evaluated on the same parameter samples from SGLD.
What this approach is actually doing, from a theoretical perspective, is estimating the expected correlation of per-datapoint losses over the posterior distribution of parameters. SGLD serves purely as a mechanism for sampling from this posterior (as it ordinarily does).
When examining correlations between $T (x_{1})_{t}$ and $T (x_{2})_{t}$ across different inputs $x_{1}, x_{2}$ but identical parameter samples $θ_{t}$ , the method approximates the posterior expectation
$E_{θ \sim p (θ | D)} [(L (θ | x_{1}) - L (θ_{0} | x_{1})) (L (θ | x_{2}) - L (θ_{0} | x_{2}))]$ .
Assuming that SGLD is taking unbiased IID samples from the posterior, the dot product of traces $T (x_{1})_{t} \cdot T (x_{2})_{t}$ is an unbiased estimator of this correlation. The vectorization of traces is therefore an efficient mechanism for computing these expectations in parallel, and representing the correlation structure geometrically.
At the risk of being repetitive, at an intuitive level, this is designed to detect when different inputs respond similarly to the same parameter perturbations. When inputs $x_{1}$ and $x_{2}$ share functional circuitry within the model, they’ll likely show correlated loss patterns when that shared circuitry is disrupted at parameter $θ_{t}$ . When two trace vectors $T (x_{1})_{t}, T (x_{2})_{t}$ are orthogonal, that means the per-sample losses for $x_{1}$ and $x_{2}$ are uncorrelated across the posterior. The presence or absence of synchronization reveals functional similarities between inputs that might not be apparent through other means.

What all this means is that we do in fact want to use SGLD for plain old MCMC sampling—we are genuinely attempting to use random samples from the posterior rather than e.g. examining the temporal behavior of SGLD. Ideally, we want all the usual desiderata of MCMC sampling algorithms here, like convergence, unbiasedness, low autocorrelation, etc. You’re completely correct that if SGLD is properly converged and sampling ideally, it has fairly boring autocorrelation within a trace—but this is exactly what we want.

Zach Furman 13 Dec 2024 10:01 UTC
3 points
0
in reply to: Alexander Gietelink Oldenziel’s comment on: Stan van Wingerden’s Shortform
IIRC @jake_mendel and @Kaarel have thought about this more, but my rough recollection is: a simple story about the regularization seems sufficient to explain the training dynamics, so a fancier SLT story isn’t obviously necessary. My guess is that there’s probably something interesting you could say using SLT, but nothing that simpler arguments about the regularization wouldn’t tell you also. But I haven’t thought about this enough.

Zach Furman 24 Sep 2024 6:08 UTC
1 point
0
in reply to: JamesH’s comment on: Singular learning theory: exercises
Good catch, thanks! Fixed now.

Zach Furman 4 Dec 2023 18:46 UTC
3 points
0
in reply to: Noosphere89’s comment on: Generalization, from thermodynamics to statistical physics
It’s worth noting that Jesse is mostly following the traditional “approximation, generalization, optimization” error decomposition from learning theory here—where “generalization” specifically refers to finite-sample generalization (gap between train/test loss), rather than something like OOD generalization. So e.g. a failure of transformers to solve recursive problems would be a failure of approximation, rather than a failure of generalization. Unless I misunderstood you?

Zach Furman 1 Dec 2023 18:06 UTC
LW: 3 AF: 1
0
AF
on: Generalization, from thermodynamics to statistical physics
Repeating a question I asked Jesse earlier, since others might be interested in the answer: how come we tend to hear more about PAC bounds than MAC bounds?

Zach Furman 2 Nov 2023 3:15 UTC
2 points
1
in reply to: DanielFilan’s comment on: Singular learning theory and bridging from ML to brain emulations
Note that in the SLT setting, “brains” or “neural networks” are not the sorts of things that can be singular (or really, have a certain $λ$ ) on their own—instead they’re singular for certain distributions of data.
This is a good point I often see neglected. Though there’s some sense in which a model $p (x | w)$ can “be singular” independent of data: if the parameter-to-function map $w \mapsto p (x | w)$ is not locally injective. Then, if a distribution $p (x)$ minimizes the loss, the preimage of $p (x)$ in parameter space can have non-trivial geometry.
These are called “degeneracies,” and they can be understood for a particular model without talking about data. Though the actual $p (x)$ that minimizes the loss is determined by data, so it’s sort of like the “menu” of degeneracies are data-independent, and the data “selects one off the menu.” Degeneracies imply singularities, but not necessarily vice-versa, so they aren’t everything. But we do think that degeneracies will be fairly important in practice.

Zach Furman 22 Oct 2023 15:51 UTC
10 points
5
in reply to: wolajacy’s comment on: TOMORROW: the largest AI Safety protest ever!
A possible counterpoint, that you are mostly advocating for awareness as opssosed to specific points is null, since pretty much everyone is aware of the problem now—both society as a whole, policymakers in particular, and people in AI research and alignment.
I think this specific point is false, especially outside of tech circles. My experience has been that while people are concerned about AI in general, and very open to X-risk when they hear about it, there is zero awareness of X-risk beyond popular fiction. It’s possible that my sample isn’t representative here, but I would expect that to swing in the other direction, given that the folks I interact with are often well-educated New-York-Times-reading types, who are going to be more informed than average.

Even among those aware, there’s also a difference between far-mode “awareness” in the sense of X-risk as some far away academic problem, and near-mode “awareness” in the sense of “oh shit, maybe this could actually impact me.” Hearing a bunch of academic arguments, but never seeing anybody actually getting fired up or protesting, will implicitly cause people to put X-risk in the first bucket. Because if they personally believed it to be big a near-term risk, they’d certainly be angry and protesting, and if other people aren’t, that’s a signal other people don’t really take it seriously. People sense a missing mood here and update on it.

Zach Furman 16 Oct 2023 22:43 UTC
18 points
13
in reply to: iceman’s comment on: Arguments for radical optimism on AI Alignment
In the cybersecurity analogy, it seems like there are two distinct scenarios being conflated here:
1) Person A says to Person B, “I think your software has X vulnerability in it.” Person B says, “This is a highly specific scenario, and I suspect you don’t have enough evidence to come to that conclusion. In a world where X vulnerability exists, you should be able to come up with a proof-of-concept, so do that and come back to me.”
2) Person B says to Person A, “Given XYZ reasoning, my software almost certainly has no critical vulnerabilities of any kind. I’m so confident, I give it a 99.99999%+ chance.” Person A says, “I can’t specify the exact vulnerability your software might have without it in front of me, but I’m fairly sure this confidence is unwarranted. In general it’s easy to underestimate how your security story can fail under adversarial pressure. If you want, I could name X hypothetical vulnerability, but this isn’t because I think X will actually be the vulnerability, I’m just trying to be illustrative.”
Story 1 seems to be the case where “POC or GTFO” is justified. Story 2 seems to be the case where “security mindset” is justified.
It’s very different to suppose a particular vulnerability exists (not just as an example, but as the scenario that will happen), than it is to suppose that some vulnerability exists. Of course in practice someone simply saying “your code probably has vulnerabilities,” while true, isn’t very helpful, so you may still want to say “POC or GTFO”—but this isn’t because you think they’re wrong, it’s because they haven’t given you any new information.
Curious what others have to say, but it seems to me like this post is more analogous to story 2 than story 1.

Zach Furman 9 Oct 2023 23:28 UTC
3 points
2
in reply to: zrezzed’s comment on: Thomas Kwa’s MIRI research experience
I wish I had a more short-form reference here, but for anyone who wants to learn more about this, Rocket Propulsion Elements is the gold standard intro textbook. We used in my university rocketry group, and it’s a common reference to see in industry. Fairly well written, and you should only need to know high school physics and calculus.

Zach Furman 28 Sep 2023 23:15 UTC
1 point
0
in reply to: Alexander Gietelink Oldenziel’s comment on: Alexander Gietelink Oldenziel’s Shortform
Obviously this is all speculation but maybe I’m saying that the universal approximation theorem implies that neural architectures are fractal in space of all distributtions (or some restricted subset thereof)?

Oh I actually don’t think this is speculation, if (big if) you satisfy the conditions for universal approximation then this is just true (specifically that the image of $W$ is dense in function space). Like, for example, you can state Stone-Weierstrass as: for a Hausdorff space X, and the continuous functions under the sup norm $C (X, R)$ , the Banach subalgebra of polynomials is dense in $C (X, R)$ . In practice you’d only have a finite-dimensional subset of the polynomials, so this obviously can’t hold exactly, but as you increase the size of the polynomials, they’ll be more space-filling and the error bound will decrease.
Curious what’s your beef with universal approximation? Stone-weierstrass isn’t quantitative—is that the reason?
The problem is that the dimension of $W$ required to achieve a given $ϵ$ error bound grows exponentially with the dimension $d$ of your underlying space $X$ . For instance, if you assume that weights depend continuously on the target function, $ϵ$ -approximating all $C^{n}$ functions on $[0, 1]^{d}$ with Sobolev norm $\leq 1$ provably takes at least $O (ϵ^{- d / n})$ parameters (DeVore et al.). This is a lower bound.
So for any realistic $d$ universal approximation is basically useless—the number of parameters required is enormous. Which makes sense because approximation by basis functions is basically the continuous version of a lookup table.
Because neural networks actually work in practice, without requiring exponentially many parameters, this also tells you that the space of realistic target functions can’t just be some generic function space (even with smoothness conditions), it has to have some non-generic properties to escape the lower bound.

Zach Furman 28 Sep 2023 21:42 UTC
1 point
0
in reply to: Alexander Gietelink Oldenziel’s comment on: Alexander Gietelink Oldenziel’s Shortform
Sorry, I realized that you’re mostly talking about the space of true distributions and I was mainly talking about the “data manifold” (related to the structure of the map $x \mapsto p (x ∣ w^{*})$ for fixed $w^{*}$ ). You can disregard most of that.
Though, even in the case where we’re talking about the space of true distributions, I’m still not convinced that the image of $W$ under $p (x ∣ w)$ needs to be fractal. Like, a space-filling assumption sounds to me like basically a universal approximation argument—you’re assuming that the image of $W$ densely (or almost densely) fills the space of all probability distributions of a given dimension. But of course we know that universal approximation is problematic and can’t explain what neural nets are actually doing for realistic data.

Zach Furman 28 Sep 2023 20:11 UTC
1 point
0
in reply to: Alexander Gietelink Oldenziel’s comment on: Alexander Gietelink Oldenziel’s Shortform
Very interesting, glad to see this written up! Not sure I totally agree that it’s necessary for $W$ to be a fractal? But I do think you’re onto something.

In particular you say that “there are points $y$ in the larger dimensional space that are very (even arbitrarily) far from $W$ ,” but in the case of GPT-4 the input space is discrete, and even in the case of e.g. vision models the input space is compact. So the distance must be bounded.
Plus if you e.g. sample a random image, you’ll find there’s usually a finite distance you need to travel in the input space (in L1, L2, etc) until you get something that’s human interpretable (i.e. lies on the data manifold). So that would point against the data manifold being dense in the input space.
But there is something here, I think. The distance usually isn’t that large until you reach a human interpretable image, and it’s quite easy to perturb images slightly to have completely different interpretations (both to humans and ML systems). A fairly smooth data manifold wouldn’t do this. So my guess is that the data “manifold” is in fact not a manifold globally, but instead has many self-intersections and is singular. That would let it be close to large portions of input space without being literally dense in it. This also makes sense from an SLT perspective. And IIRC there’s some empirical evidence that the dimension of the data “manifold” is not globally constant.

Zach Furman 21 Aug 2023 17:35 UTC
1 point
0
in reply to: interstice’s comment on: Against Almost Every Theory of Impact of Interpretability
if the distribution of intermediate neurons shifts so that Othello-board-state-detectors have a reasonably high probability of being instantiated
Yeah, this “if” was the part I was claiming permutation invariance causes problems for—that identically distributed neurons probably couldn’t express something as complicated as a board-state-detector. As soon as that’s true (plus assuming the board-state-detector is implemented linearly), agreed, you can recover it with a linear probe regardless of permutation-invariance.
This is a more reasonable objection(although actually, I’m not sure if independence does hold in the tensor programs framework—probably?)
I probably should’ve just gone with that one, since the independence barrier is the one I usually think about, and harder to get around (related to non-free-field theories, perturbation theory, etc).
My impression from reading through one of the tensor program papers a while back was that it still makes the IID assumption, but there could be some subtlety about that I missed.

Zach Furman 21 Aug 2023 6:43 UTC
1 point
0
in reply to: interstice’s comment on: Against Almost Every Theory of Impact of Interpretability
The reason the Othello result is surprising to the NTK is that neurons implementing an “Othello board state detector” would be vanishingly rare in the initial distribution, and the NTK thinks that the neuron function distribution does not change during training.
Yeah, that’s probably the best way to explain why this is surprising from the NTK perspective. I was trying to include mean-field and tensor programs as well (where that explanation doesn’t work anymore).
As an example, imagine that our input space consisted of five pixels, and at initialization neurons were randomly sensitive to one of the pixels. You would easily be able to construct linear probes sensitive to individual pixels even though the distribution over neurons is invariant over all the pixels.
Yeah, this is a good point. What I meant to specify wasn’t that you can’t recover any permutation-sensitive data at all (trivially, you can recover data about the input), but that any learned structures must be invariant to neuron permutation. (Though I’m feeling sketchy about the details of this claim). For the case of NTK, this is sort of trivial, since (as you pointed out) it doesn’t really learn features anyway.
By the way, there are actually two separate problems that come from the IID assumption: the “independent” part, and the “identically-distributed” part. For space I only really mentioned the second one. But even if you deal with the identically distributed assumption, the independence assumption still causes problems.This prevents a lot of structure from being representable—for example, a layer where “at most two neurons are activated on any input from some set” can’t be represented with independently distributed neurons. More generally a lot of circuit-style constructions require this joint structure. IMO this is actually the more fundamental limitation, though takes longer to dig into.

Zach Furman 19 Aug 2023 21:31 UTC
1 point
−2
in reply to: kave’s comment on: Against Almost Every Theory of Impact of Interpretability
I think the core surprising thing is the fact that the model learns a representation of the board state. The causal / linear probe parts are there to ensure that you’ve defined “learns a representation of the board state” correctly—otherwise the probe could just be computing the board state itself, without that knowledge being used in the original model.
This is surprising to some older theories like statistical learning, because the model is usually treated as effectively a black box function approximator. It’s also surprising to theories like NTK, mean-field, and tensor programs, because they view model activations as IID samples from a single-neuron probability distribution—but you can’t reconstruct the board state via a permutation-invariant linear probe. The question of “which neuron is which” actually matters, so this form of feature learning is beyond them. (Though there may be e.g. perturbative modifications to these theories to allow this in a limited way).

Zach Furman 18 Aug 2023 23:39 UTC
2 points
0
in reply to: kave’s comment on: Against Almost Every Theory of Impact of Interpretability
Yeah, that was what I was referring to. Maybe “algorithmic model” isn’t the most precise—what we know is that the NN has an internal model of the board state that’s causal (i.e. the NN actually uses it to make predictions, as verified by interventions). Theoretically it could just be forming this internal model via a big lookup table / function approximation, rather than via a more sophisticated algorithm. Though we’ve seen from modular addition work, transformer induction heads, etc that at least some of the time NNs learn genuine algorithms.

Zach Furman

The typical motivation for modules: “vector spaces but with rings”

A better motivation for modules: “ring actions”

Why are modules so deep?

The analogy: a group is a person, a ring is an actor

Why can rings manifest in multiple ways?

The object is its category of actions

So, why care?