Very cool and well-presented—thanks for taking the time to write this down. I thought about this question at some point and ended up deciding that the compressed sensing picture isn’t very well shaped for it, but I didn’t have a complete argument—it’s nice to have confirmation.
Dmitry Vaintrob
On the friendship fallacy and Owen Barfield
I just finished reading the book “The Fellowship: The Literary Lives of the Inklings”, by Philip and Carol Zaleski. It’s a book about an intellectually appealing and socially cohesive group of writers in Oxford who met weekly and critiqued each other’s work, a group that included JRR Tolkien and CS Lewis. The book is very centered on Christianity (the writers also wrote Christian apologetics), but this works well, as understanding either Lewis or Tolkien or the Inklings in general without the lens of their deeply held, thoughtful Christianity is about as silly as trying to analyze the Lion King without reading Hamlet.
But there is a core character in the book who is treated sympathetically and who I really hate: Owen Barfield, the “founding” Inkling. From his youth, he is a follower of Rudolf Steiner and a devoted Anthroposophist (a particularly benign group of Christian Occultists). Barfield was Lewis’s friend, existing always in his shadow (Lewis was very famous in his lifetime as a philosopher and Christian apologist, a kind of Jordan Peterson of his time if you imagine Jordan Peterson had brains and real literary/academic credentials). He worked in a law firm and consistently saw himself as a thwarted philosopher/writer/poet, and he found recognition late in life after he wrote a Lewis biography and after his woo-adjacent ideas became more popular in the 60s.
Throughout his life, Barfield created a personal philosophy of “all the things I like/think are interesting are kind of the same thing”, and he was very sad when people he liked disapproved of the different things he mixed into his philosophy, or failed to identify them as “sort of the same thing”. While he generally is a bit of an “intellectual klutz”, his fundamental failure is the “Friendship Fallacy”: the idea of treating ideas as friends, as something deserving of loyalty. When he encounters different ideas he likes, he “wants them to get along,” and when ideas fail to convince skeptics or produce results or interface with reality (or indeed, with faith), he simply fails to impose any kind of falsifiability requirement and treats this as a loyalty test he must pass. He totally lacks the kind of internal courage needed to kill one’s darlings (whether philosophical or literary) and to treat his own ideas with skepticism and a view towards falsification—perhaps the core trait of a good thinker (Feynman’s “You must not fool yourself—and you are the easiest person to fool”).
Interestingly, I don’t extend this antipathy to the Christianity of the group’s other famous members. Unlike Barfield, Tolkien and Chesterton largely succeed (imo) in separating the domains of the literary, the psychological, and the religious. They don’t pretend to be scientific authorities or predict things “in the world”. Tolkien in particular is very anti-progress and a bit of a Luddite, but in my understanding his work as a linguist is very good for his time. In fact, it’s funny that his deeply Christian mentality produced one of the most “atheist nerd”-like behaviors: creating thoroughly crafted fictional languages for fantasy cultures. I’ve been surprised to learn from reading a couple of his biographies that his linguistic worldbuilding in fact preceded his fantasy work: he designed Elvish before writing any work in his canon, and wrote the fiction to flesh out the mythology behind its expressions and poems. He famously said about his work, “The making of language and mythology are related functions”. In fact, he viewed the work of producing plausible cultures and languages (in my view an admirable, though non-academic, kind of secular scholarship, analogous to studying alternative physical systems, etc.) as an explicitly Christian task of “subcreation”, a sort of worship-by-imitation of God.
It’s a bit hard to formulate an exact razor between the kind of “lazy scientism” of Barfield and various other forms of “pseudoscientific woo”, and the serious and purely mystical/inspirational deep religiosity of people like Tolkien (and to a lesser extent Lewis—another interesting thing I learned was that he started out as a devoted atheist in a world where this was actually socially fraught, and was converted through a philosophical struggle involving Barfield and Tolkien in particular). But maybe the idea of a “philosophy without struggle” (a tendency towards confirmation and a total lack of earnest self-questioning) goes part of the way towards explaining this distinction. Another part is the difference between a purely metaphysical personal religion and a more woo idea of a religion that “makes predictions about the world”. I think the thing that really took me aback a bit was the level of academic embrace of Barfield late in his life, not just as a Lewis biographer but as a respected academic philosopher with honorary professorships and the works—a confirmation (if any more were needed) that lazy pseudointellectualism and confirmation bias are very much not incompatible with academic success. Another theme that I think is interesting is the fact that Lewis and Tolkien were at times genuinely interested in and even somewhat inspired by his ideas (though they had no time for occultism or 60s-esque woo). The extent to which this happened is hard to gauge (he outlived them and wrote a lot about how he influenced them in his biographies/reminiscences, and this was then picked up by scholars). But unquestionably, this did occur to some extent. And whether or not you class Tolkien/Lewis as “valuable thinkers”, the history of science and philosophy does seem to abound with examples of clear and robust thinkers whose good ideas were to some extent inspired by charismatic charlatans and woo.
Below are my personal notes on Barfield that I wrote after reading the book.
I despise Barfield. Not in the visceral sense that the first syllable of his name may (Anthroposophically) evoke. Indeed I identify with the underdog/late-bloomer shape of his biography, with his striving towards a higher calling. I readily adopt the book’s sympathy towards him as a literary character with fortunes tied to an idea deeply espoused, a thwarted writer with some modicum of undiscovered talent. My antipathy isn’t even in the specifics of what he espouses: a mild but virulently wrong view of science and philosophy adjacent to all the stupid of my parents’ generation of “anthroposophy” (Atlantis, Consciousness and Quantum Mechanics, anti-Evolutionism, Vibes). But I despise him as the embodiment of a Fundamental Mistake. That of confusing science and personality. Being loyal to a scientific or philosophical discipline isn’t like being loyal to a person: if it’s consistently fucking up and you need to make excuses for its behavior to all your reasonable smart friends, you’re not being a good friend but rather a bad scientist. Barfield is almost an archetype of Bad Science if you project out the crazy/dogmatic/political/evil-Nazi component. He really is a nice man. But within his mild-mannered Christian friendliness which I respect, he is inflexible and unscientific. He doesn’t update. He glows when people endorse his preferred view (Anthroposophy and Steiner) and sadly laments when they disagree with him—because he can’t help but feel like “there’s something there”. He wants to seamlessly draw parallels between all the nice things he and other nice people believe. He draws lines of identification back and forth between all the things he likes (Coleridge <> Himself <> Quantum Mechanics <> Anthroposophy <> Steiner <> Religion <> Consciousness <> Complementary dualism/“polarity”). He has “nothing but symbols” in his brain, and the symbols in his brain aren’t strong enough to notice that they fail to signify. A person without significance, with a philosophy without significance, possessed of a brain without the capacity to grasp the concept of what it means to signify. The first of these is a tragedy (people should matter) and his late-found fame mediated through famous friends is a sweet story, maybe one he even deserves as the first-mover of the Inklings, the reason for the Lewis-Tolkien friendship, etc. The second is neutral: theories that fail to achieve significance “in their lifetime” may be bunk but may have value: Greek Atomism, various prescient ideas about physics and computers (Babbage/Lovelace), etc. But the third is a profound personal failing, and it’s only through luck and through (mostly well-placed) trust in much smarter and more rigorous friends that he avoided attaching this vapid form of mentation to something truly vile: Nazism (which he very briefly flirted with, charmed by its interest in magic and the occult), various fundamentalisms (including an anti-evolutionary fundamentalism: his friends believed in evolution but he didn’t really buy it “on vibes”; he was never a fundamentalist), Communism, etc.
Steelmanning heuristic arguments
Thanks for this post. I would argue that part of an explanation here could also be economic: modernity brings specialization and a move from the artisan economy of objects as uncommon, expensive, multipurpose, and with a narrow user base (illuminated manuscripts, decorative furniture) to a more utilitarian and targeted economy. Early artisans need to compete for a small number of rich clients by being the most impressive, artistic, etc., whereas more modern suppliers follow more standard laws of supply and demand and track a wider range of costs (cost-effectiveness, readability, and the reader’s time vs. beauty and remarkableness). And consumers similarly can decouple their needs: art as separate from furniture and architecture, poetry and drama as separate from information and literature. I think another aspect of this shift, one I’m sad we’ve lost, is the old multipurpose scientific/philosophical treatise with illustrations or poems (my favorite being De Rerum Natura, though you could argue that Nietzsche and Wagner tried to revive this with their attempts at Gesamtkunstwerke).
I’m managing to get verve and probity, but having issues with wiles
I really liked the post—I was confused by the meaning and purpose of the no-coincidence principle when I was at ARC, and this post clarifies it well. I like that this is asking for something weaker than a proof (or a probabilistic weakening of a proof), since (as in the example of using the Riemann hypothesis) in general you expect from incompleteness that there are true results leading to “surprising” families of circuits which are not provable by logic. I can also see Paul’s point about how this statement is sort of like P vs. BPP but not quite.
More specifically, this feels like a sort of 2nd-order boolean/polynomial hierarchy statement whose first-order version is P vs. BPP. Are there analogues of this for other orders?
Memorization-generalization in practice
Efficiency spectra and “bucket of circuits” cartoons
Looks like a conspiracy of pigeons posing as LW commenters has downvoted your post
The memorization-generalization spectrum and learning coefficients
Thanks!
I haven’t grokked your loss scales explanation (the “interpretability insights” section) without reading your other post though.
Not saying anything deep here. The point is just that you might have two cartoon pictures:
1. Every correctly classified input is either the result of a memorizing circuit or of a single coherent generalizing circuit behavior. If you remove a single generalizing circuit, your accuracy will degrade additively.
2. A correctly classified input is the result of a “combined” circuit consisting of multiple parallel generalizing “subprocesses” giving independent predictions, and if you remove any of these subprocesses, your accuracy will degrade multiplicatively.
A lot of ML work only thinks about picture #1 (which is the natural picture to look at if you only have one generalizing circuit and every other circuit is a memorization). But the thing I’m saying is that picture #2 also occurs, and in some sense is “the info-theoretic default” (though both occur simultaneously—this is also related to the ideas in this post)
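As a toy illustration of the difference between the two cartoons (my own made-up numbers, purely to fix ideas):

```python
# Toy numbers illustrating the two cartoon pictures (purely illustrative).

# Picture #1: each input is handled either by the single generalizing circuit
# (covering a fraction p_gen of inputs) or by memorizing circuits (fraction p_mem).
p_gen, p_mem = 0.70, 0.25
acc_full = p_gen + p_mem
acc_without_generalizer = p_mem  # ablating the generalizing circuit: additive drop
print(f"picture #1: {acc_full:.2f} -> {acc_without_generalizer:.2f} (drop of {p_gen:.2f})")

# Picture #2: a correct classification is the joint product of several parallel
# generalizing subprocesses; ablating any one of them multiplies accuracy by f < 1.
acc_full, f = 0.95, 0.8
acc_minus_one = acc_full * f
acc_minus_two = acc_full * f * f  # ablations compound multiplicatively
print(f"picture #2: {acc_full:.2f} -> {acc_minus_one:.2f} -> {acc_minus_two:.3f}")
```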
Thanks for the questions!
You first introduce the SLT argument that tells us which loss scale to choose (the “Watanabe scale”, derived from the Watanabe critical temperature).
Sorry, I think the context of the Watanabe scale is a bit confusing. I’m saying that in fact it’s the wrong scale to use as a “natural scale”. The Watanabe scale depends only on the number of training datapoints, and doesn’t notice any other properties of your NN or your phenomenon of interest.
Roughly, the Watanabe scale is the scale on which loss improves if you memorize a single datapoint (so memorizing improves accuracy by 1/n with n = #(training set) and, in a suitable operationalization, improves loss by something on the order of 1/n, and this is the Watanabe scale).
It’s used in SLT roughly because it’s the minimal temperature scale where “memorization doesn’t count as relevant”, and so relevant measurements become independent of the n-point sample. However, in most interp experiments, the realistic loss reconstruction is much rougher (i.e., further from optimal loss) than the 1/n scale where memorization becomes an issue (even if you conceptualize #(training set) as some small synthetic training set that you were running the experiment on).
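To make the scale concrete, a back-of-envelope example (my numbers, purely illustrative; assuming cross-entropy loss measured in nats):

$$n = 10^6, \qquad \Delta(\text{accuracy}) \approx \frac{1}{n} = 10^{-6}, \qquad \Delta(\text{loss}) \approx \frac{\ell_{\text{point}}}{n} \approx \frac{3}{10^6} = 3 \times 10^{-6}\ \text{nats},$$

where $\ell_{\text{point}} \approx 3$ nats is a made-up per-datapoint loss for the memorized point. Memorizing one more point thus moves the average training loss by a few millionths of a nat, far finer than the loss gap left by a realistic reconstruction in the sense above.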
For your second question: again, what I wrote is confusing and I really want to rewrite it more clearly later. I tried to clarify what I think you’re asking about in this shortform. Roughly, the point here is that to avoid having your results messed up by spurious behaviors, you might want to degrade as much as possible while still observing the effect of your experiment. The idea is that if you found any degradation that wasn’t explicitly designed with your experiment in mind (i.e., is natural), but where you see your experimental results hold, then you have “found a phenomenon”. The hope is that if you look at the roughest such scale, you might kill enough confounders and interactions to make your result be “clean” (or at least cleaner): so for example optimistically you might hope to explain all the loss of the degraded model at the degradation scale you chose (whereas at other scales, there are a bunch of other effects improving the loss on the dataset you’re looking at that you’re not capturing in the explanation).
The question now is, when degrading, what order you want to “kill confounders” in so as to optimally purify the effect you’re considering. The “natural degradation” idea seems like a good place to look, since it kills the “small but annoying” confounders: things like memorization, weird specific connotations of the test sentences you used for your experiment, etc. Another reasonable place to look is training checkpoints, as these correspond to killing “hard to learn” effects. Ideally you’d perform several kinds of degradation to “maximally purify” your effect. Here the “natural scales” (loss at the level of, e.g., Claude 1 or BERT) are much too fine for most modern experiments, and I’m envisioning something much rougher.
The intuition here comes from physics. Like if you want to study properties of a hydrogen atom that you don’t see either in water or in hydrogen gas, a natural thing to do is to heat up hydrogen gas to extreme temperatures where the molecules degrade but the atoms are still present, now in “pure” form. Of course not all phenomena can be purified in this way (some are confounded by effects both at higher and at lower temperature, etc.).
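Concretely, the search for “the roughest degradation at which the effect survives” might look something like the following sketch (`degrade` and `effect_holds` are hypothetical placeholders for whatever degradation and measurement a given experiment uses, not a real API):

```python
# Hypothetical sketch: find the roughest degradation scale at which an
# experimental effect still holds. degrade() and effect_holds() are placeholders
# (e.g. weight noising, quantization, or rolling back to an earlier checkpoint).

def roughest_valid_scale(model, scales, degrade, effect_holds):
    """Return the largest degradation scale whose degraded model still shows the effect."""
    best = None
    for s in sorted(scales):           # from mild to severe degradation
        degraded = degrade(model, s)
        if effect_holds(degraded):
            best = s                   # the effect survives this much degradation
        else:
            break                      # effect killed; keep the previous scale
    return best
```

The hope, as above, is that at this roughest scale enough confounders (memorization, incidental connotations of the test sentences, etc.) have been killed that the remaining loss is mostly explained by the phenomenon you care about.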
Thanks! Yes the temperature picture is the direction I’m going in. I had heard the term “rate distortion”, but didn’t realize the connection with this picture. Might have to change the language for my next post
My supervillain origin story
This seems overstated
In some sense this is the definition of the complexity of an ML algorithm; more precisely, the direct analog of complexity in information theory, which is the “entropy” or “Solomonoff complexity” measurement, is the free energy (I’m writing a distillation on this but it is a standard result). The relevant question then becomes whether the “SGLD” sampling techniques used in SLT for measuring the free energy (or technically its derivative) actually converge to reasonable values in polynomial time. This is checked pretty extensively in this paper for example.
A possibly more interesting question is whether notions of complexity in interpretations of programs agree with the inherent complexity as measured by free energy. The place I’m aware of where this is operationalized and checked is our project with Nina on modular addition: there we do have a clear understanding of the platonic complexity, and the local learning coefficient does a very good job of capturing it asymptotically, with very good precision (both for memorizing and generalizing algorithms, where the complexity difference is very significant).
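For concreteness, here is a rough sketch of the kind of SGLD-based local learning coefficient estimate I have in mind (plain PyTorch, following the standard localized tempered-posterior estimator with inverse temperature beta = 1/log n; hyperparameters and the burn-in handling are placeholders, and this is not necessarily exactly what the linked paper or our modular addition project implements):

```python
import copy
import math
import torch

def estimate_llc(model, loss_fn, data_loader, n_train,
                 steps=2000, lr=1e-4, gamma=100.0, device="cpu"):
    """Crude SGLD estimate of the local learning coefficient at the model's current weights.

    Samples from the localized tempered posterior
        p(w) ~ exp(-n * beta * L_n(w) - (gamma / 2) * ||w - w_0||^2),  beta = 1 / log(n),
    and returns the estimate  n * beta * (E[L_n(w)] - L_n(w_0)).
    """
    model = copy.deepcopy(model).to(device)  # don't perturb the caller's weights
    beta = 1.0 / math.log(n_train)
    w0 = [p.detach().clone() for p in model.parameters()]

    sampled_losses = []
    data_iter = iter(data_loader)
    for _ in range(steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(data_loader)
            x, y = next(data_iter)
        x, y = x.to(device), y.to(device)

        loss = loss_fn(model(x), y)      # minibatch estimate of L_n
        model.zero_grad()
        loss.backward()
        sampled_losses.append(loss.item())

        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w0):
                # localized, tempered gradient plus Gaussian noise (SGLD step)
                drift = n_train * beta * p.grad + gamma * (p - p0)
                p.add_(-0.5 * lr * drift + math.sqrt(lr) * torch.randn_like(p))

    burn_in = steps // 4
    mean_loss = sum(sampled_losses[burn_in:]) / (steps - burn_in)
    loss_at_w0 = sampled_losses[0]       # crude: minibatch loss before any SGLD step
    return n_train * beta * (mean_loss - loss_at_w0)
```

Whether estimates like this converge to reasonable values in polynomial time for realistic models is exactly the question above.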
Citation? [for Apollo]
Look at this paper (note I haven’t read it yet). I think their LIB work is also promising (at least it separates circuits of small algorithms)
Thanks for the reference, and thanks for providing an informed point of view here. I would love to have more of a debate here, and would quite like being wrong as I like tropical geometry.
First, about your concrete question:
As I understand it, here the notion of “density of polygons” is used as a kind of proxy for the derivative of a PL function?
Density is a proxy for the second derivative: indeed, the closer a function is to linear (i.e., the smaller its second derivative), the fewer linear pieces you need to approximate it well. I think a similar idea occurs in 3D graphics, in mesh optimization, where you can improve performance by reducing the number of cells in flatter regions (I don’t understand this field, but this is done in this paper according to some curvature-related energy functional). The question of “derivative change when crossing walls” seems similar. In general, glancing at the paper you sent, it looks like polyhedral currents are a locally polynomial PL generalization of currents of ordinary functions (and it seems that there is some interesting connection made to intersection theory/analogues of Chow theory, though I don’t have nearly enough background to read this part carefully). Since the purpose of PL functions in ML is to approximate some (approximately smooth, but fractally messy and stochastic) “true classification”, I don’t see why one wouldn’t just use ordinary currents here (currents on a PL manifold can be made sense of after smoothing, or in a distribution-valued sense, etc.).
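To spell out the density/second-derivative point in the simplest setting (a standard 1D piecewise-linear interpolation bound, included here just as a sketch): for linear interpolation of a smooth $f$ on cells of local width $h_i$,

$$\sup_{x \in [x_i, x_{i+1}]} |f(x) - \hat{f}(x)| \;\le\; \frac{h_i^2}{8} \sup_{x \in [x_i, x_{i+1}]} |f''(x)|,$$

so hitting a uniform tolerance $\varepsilon$ wants local cell width $h(x) \approx \sqrt{8\varepsilon/|f''(x)|}$, i.e. a density of linear pieces growing like $\sqrt{|f''(x)|}$: nearly linear regions need few pieces, highly curved regions need many.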
In general, I think the central crux between us is whether or not this is true:
tropical geometry might be relevant to ML, for the simple reason that the functions coming up in ML with ReLU activation are PL
I’m not sure I agree with this argument. The use of PL functions is by no means central to ML theory, and is an incidental aspect of early algorithms. The most efficient activation functions for most problems tend to not be ReLUs, though the question of activation functions is often somewhat moot due to the universal approximation theorem (and the fact that, in practice, at least for shallow NNs anything implementable by one reasonable activation tends to be easily implementable, with similar macroscopic properties, by any other). So the reason that PL functions come up is that they’re “good enough to approximate any function” (and also “asymptotic linearity” seems genuinely useful to avoid some explosion behaviors). But by the same token, you might expect people who think deeply about polynomial functions to be good at doing analysis because of the Stone-Weierstrass theorem.
More concretely, I think there are two core “type mismatches” between tropical geometry and the kinds of questions that appear in ML:
Algebraic geometry in general (including tropical geometry) isn’t good at dealing with deep compositions of functions, and especially approximate compositions.
(More specific to TG): the polytopes that appear in neural nets are, as I explained, inherently random (the typical interpretation we have of even combinatorial algorithms like modular addition is that the PL functions produce some random sharding of some polynomial function). This is a very strange thing to consider from the point of view of a tropical geometer: as an algebraic geometer, it’s hard for me to imagine a case where “this polynomial has degree approximately 5… it might be 4 or 6, but the difference between them is small”. I simply can’t think of any behavior that is at all meaningful from an AG-like perspective once the questions of fan combinatorics and degrees of polynomials are replaced by questions of approximate equality.
I can see myself changing my view if I see some nontrivial concrete prediction or idea that tropical geometry can provide in this context. I think a “relaxed” form of this question (where I genuinely haven’t looked at the literature) is whether tropical geometry has ever been useful (either in proving something or at least in reconceptualizing something in an interesting way) in linear programming. I think if I see a convincing affirmative answer to this relaxed question, I would be a little more sympathetic here. However, the type signature here really does seem off to me.
I think it can be a problem if you recommend a book and expect the other person to have a social obligation to read it (and to need to make an effortful excuse or pay social capital if it’s not read). It might be hard to fully get rid of this, but I think the utility comparison that should be made is “social friction from someone not following a book recommendation” vs. “utility to the other person from you recommending a book based on knowledge of the book and the person’s preferences/interests”. I suspect that in most contexts this is both an EV-positive exchange and the person correctly decides not to read/finish the book. Maybe a good social norm would be to not get upset if someone doesn’t read your book rec, and also to not feel pressured to read a book that was recommended if you started it / read a summary and decided it’s not for you.