This MO thread initiated by Bill Thurston on the varied ways mathematicians think about math has always made me wonder how theoretical researchers in other fields think about their domains. I think of this as complementary to Mumford’s tribes of mathematicians, and (much more tangentially) to Eliezer’s remark on how sparse thinkers are at the intellectual frontiers.
Here are some of my favorite quotes.
Terry Tao talks about an “adversarial perspective” which I’m guessing is the closest match to how alignment researchers think:
One specific mental image that I can communicate easily with collaborators, but not always to more general audiences, is to think of quantifiers in game theoretic terms. Do we need to show that for every epsilon there exists a delta? Then imagine that you have a bag of deltas in your hand, but you can wait until your opponent (or some malicious force of nature) produces an epsilon to bother you, at which point you can reach into your bag and find the right delta to deal with the problem. Somehow, anthropomorphising the “enemy” (as well as one’s “allies”) can focus one’s thoughts quite well. This intuition also combines well with probabilistic methods, in which case in addition to you and the adversary, there is also a Random player who spits out mathematical quantities in a way that is neither maximally helpful nor maximally adverse to your cause, but just some randomly chosen quantity in between. The trick is then to harness this randomness to let you evade and confuse your adversary.
Is there a quantity in one’s PDE or dynamical system that one can bound, but not otherwise estimate very well? Then imagine that it is controlled by an adversary or by Murphy’s law, and will always push things in the most unfavorable direction for whatever you are trying to accomplish. Sometimes this will make that term “win” the game, in which case one either gives up (or starts hunting for negative results), or looks for additional ways to “tame” or “constrain” that troublesome term, for instance by exploiting some conservation law structure of the PDE.
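To make the quantifier game concrete (my gloss, not Tao's): the ε–δ definition of continuity of $f$ at a point $a$ reads

$$\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x: \quad |x - a| < \delta \implies |f(x) - f(a)| < \varepsilon.$$

Read adversarially: the opponent moves first and names an $\varepsilon$, you answer with a $\delta$ from your bag, the opponent then picks any $x$ within $\delta$ of $a$, and you win if $f(x)$ lands within $\varepsilon$ of $f(a)$. Each alternation of quantifiers is one more round of the game.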
There’s the “economic” mindset; Tao again:
Another mode of thought that I and many others use routinely, but which I realised only recently was not as ubiquitous as I believed, is to use an “economic” mindset to prove inequalities such as 𝑋≤𝑌 or 𝑋≤𝐶𝑌 for various positive quantities 𝑋,𝑌, interpreting them in the form “If I can afford Y, can I therefore afford X?” or “If I can afford lots of Y, can I therefore afford X?” respectively. This frame of reference starts one thinking about what types of quantities are “cheap” and what are “expensive”, and whether the use of various standard inequalities constitutes a “good deal” or not. It also helps one understand the role of weights, which make things more expensive when the weight is large, and cheaper when the weight is small.
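A standard example of the kind of “deal” Tao is describing (my example, not his) is a weighted form of the AM–GM inequality,

$$ab \;\le\; \frac{\varepsilon}{2}a^2 + \frac{1}{2\varepsilon}b^2 \qquad (a, b \ge 0,\ \varepsilon > 0),$$

which follows from expanding $(\sqrt{\varepsilon}\,a - b/\sqrt{\varepsilon})^2 \ge 0$. In the economic reading, you purchase the cross term $ab$ by paying in $a^2$ and $b^2$, and the weight $\varepsilon$ sets the exchange rate: a small $\varepsilon$ makes $a^2$ cheap but $b^2$ expensive, which is exactly the trade-off one haggles over when closing an estimate.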
Physical analogies; Tao again:
For evolutionary PDEs in particular, I find there is a rich zoo of colourful physical analogies that one can use to get a grip on a problem. I’ve used the metaphor of an egg yolk frying in a pool of oil, or a jetski riding ocean waves, to understand the behaviour of a fine-scaled or high-frequency component of a wave when under the influence of a lower frequency field, and how it exchanges mass, energy, or momentum with its environment. In one extreme case, I ended up rolling around on the floor with my eyes closed in order to understand the effect of a gauge transformation that was based on this type of interaction between different frequencies. (Incidentally, that particular gauge transformation won me a Bocher prize, once I understood how it worked.) I guess this last example is one that I would have difficulty communicating to even my closest collaborators. Needless to say, none of these analogies show up in my published papers, although I did try to convey some of them in my PDE book eventually.
Visualisation techniques; Tao again:
One visualisation technique that I have found very helpful is to incorporate the ambient symmetries of the problem (a la Klein) as little “wobbles” to the objects being visualised. This is most familiarly done in topology (“rubber sheet mathematics”), where every object considered is a bit “rubbery” and thus deforming all the time by infinitesimal homeomorphisms. But geometric objects in a scale-invariant problem could be thought of as being viewed through a camera with a slightly wobbly zoom lens, so that one’s mental image of these objects is always varying a little in size. Similarly, if one is in a translation-invariant setting, one’s mental camera should be sliding back and forth just a little to remind you of this, if one is working in a Euclidean space then the camera might be jiggling through all the rigid motions, and so forth. A more advanced example: if the problem is invariant under tensor products, as per the tensor product trick, then one’s low dimensional objects should have a tiny bit of shadowing (or perhaps look like one of these 3D images when one doesn’t have the polarised glasses, with the slightly separated red and blue components) that suggest that they are projections of a higher dimensional Cartesian product.
One reason why one wants to do this is that it helps suggest useful normalisations. If one is viewing a situation with a wobbly zoom lens and there is some length that appears all over one’s analysis, one is reminded that one can spend the scale invariance of the problem to zoom up or down as appropriate to normalise this scale to equal 1. Similarly for other ambient symmetries.
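A small worked instance of “spending” scale invariance (my example, not Tao's): to prove an estimate like

$$\int_{|x| \le R} |f(x)|\,dx \;\le\; C_n\, R^n \sup_{|x| \le R} |f(x)| \qquad \text{for all } R > 0,$$

it suffices to prove the case $R = 1$, because applying that case to the rescaled function $g(y) = f(Ry)$ and substituting $x = Ry$ recovers the general statement with the same constant $C_n$. The wobbly zoom lens is a standing reminder that this kind of normalisation is always available.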
Another take on visual thinking, by François G. Dorais:
I am a visual thinker and I often try to describe what I see to my students. I’ve been known to say things like “everyone knows that HF looks like a rectangle” as I proceed to draw a rectangle on the board. (By the way, HF is the set of all hereditarily finite sets.) I find that I naturally associate different shapes with different properties of objects. Angular shapes correspond to well-defined objects whereas rounded shapes correspond to variable objects. The number of angles or curves is a measure of how complex an object is. I don’t explain my scheme to my students, but I suspect the consistency of the presentation becomes transparent over time.
I recall one instance where I deliberately concealed the true nature of my illustration to my students. I was describing a complex construction on infinite trees. I began the description by drawing five vertical lines that I promptly explained were “infinite trees viewed sideways.” It so happens that the simplest case of the construction was when the trees consisted of single branches in which case the picture was completely accurate. This is the case I secretly had in mind for the entire description but I never said that since the result was utterly trivial in that case. This was a subtle way to reduce the complex construction to the trivial case.
Benson Farb on Thurston’s visual-geometric way of thinking about higher dimensions – Thurston was widely considered the best geometric thinker in the history of math:
Being a Thurston student was inspiring and frustrating – often both at once. At our second meeting I told Bill that I had decided to work on understanding fundamental groups of negatively curved manifolds with cusps. In response I was introduced to the famous “Thurston squint”, whereby he looked at you, squinted his eyes, gave you a puzzled look, then gazed into the distance (still with the squint). After two minutes of this he turned to me and said: “Oh, I see, it’s like a froth of bubbles, and the bubbles have a bounded amount of interaction.”
Being a diligent graduate student, I dutifully wrote down in my notes: “Froth of bubbles. Bounded interaction.” After our meeting I ran to the library to begin work on the problem. I looked at the notes. Froth? Bubbles? Is that what he said? What does that mean? I was stuck.
Three agonizing years of work later I solved the problem. It’s a lot to explain in detail, but if I were forced to summarize my thesis in five words or less, I’d go with: “Froth of bubbles. Bounded interaction.”
A Thurston lecture would typically begin by Bill drawing a genus 4 surface, slowly erasing a hole, adding it back in, futzing with the lines, and generally delaying things while he quickly thought up the lecture he hadn’t prepared. Why did we all still attend? The answer is that once in a while we would receive a beautiful insight that was absolutely unavailable via any other source.
… Bill was probably the best geometric thinker in the history of mathematics. Thus it came as a surprise when I found out that he had no stereoscopic vision, that is, no depth perception. Perhaps the latter was responsible somehow for the former? I once mentioned this theory to Bill. He disagreed with it, claiming that all of his skill arose from his decision, apparently as a first grader, to “practice visualizing things” every day.
… In interacting with other mathematical greats, one gets the feeling that these people are like us but just 100 (ok, 500) times better. In contrast, Thurston was a singular mind. He was an alien. There is no multiplicative factor here; Thurston was simply orthogonal to everyone. Mathematics loses a dimension with his death.
At a more elementary level, here’s Phil Isett on geometric thinking:
I feel some pressure not to convey just how often I rely on geometric modes of thought, especially when they go against the usual way of explaining things, or the background of a typical student, and are not completely necessary.
Example 1: When you row-reduce a matrix, you make a bunch of changes (most importantly some “transvections”) in the basis of the image space until a few of your basis vectors (say 𝑣₁ = 𝑇𝑒₁, 𝑣₂ = 𝑇𝑒₂) span the image of the matrix 𝑇. When you picture the domain of 𝑇 foliated by level sets (which are parallel to the null space of 𝑇), you know that the remaining basis vectors 𝑒₃, 𝑒₄, … can be translated by some element in the span of 𝑒₁, 𝑒₂ (i.e. whichever one lies on the same level set) in order to obtain a basis for the null space. Now, this is how we visualize the situation, but is it how we compute and explain? Or do we just do the algebra, which at this point is quite easy? If the algebra is easy and the geometry takes a while to explain and is not “necessary” for the computation, why explain it? This is a dilemma because once algebra is sufficiently well-developed it’s possible that the necessity of (completely equivalent) geometric thinking may become more and more rare; and algebra seems to be more “robust” in that you can explore things you can’t see very well. But then, when students learn the implicit function theorem, somehow I feel like having relied on that kind of foliation much more often would help understand its geometric content. Still, even if it’s in your head and very important, are you going to draw a foliation every time you do row operations? We know the geometry, know the algebra, but it would take a while to repeatedly explain how to rely on the geometry while executing computations.
Example 2: (Things that aren’t graphs)
Another problem geometric thinking faces is that modern math often seems to regard pictures as not being proofs, even if they are more convincing, so there is a bias regarding how to choose to spend class time. Let’s say you want to differentiate x³. You can draw a cube, and a slightly larger cube, and then look at the difference of the cubes and subdivide it into a bunch of small regions, three larger slabs taking up most of the volume. Algebraically, this subdivision corresponds to multiplying out (x+h)³; collecting the terms uses the commutativity, which corresponds to rotating the various identical pieces. It is no different to write this proof out algebraically, the difference is that the algebraic one is a “proof” but the geometric one is… not? Even if it’s more convincing. So it’s like the picture is only there for culture.
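For concreteness, here is my gloss on both examples (mine, not Isett's). In Example 1: if $Te_1, Te_2$ span the image of $T$, then each remaining basis vector $e_j$ satisfies $Te_j = c_1 Te_1 + c_2 Te_2$ for some scalars $c_1, c_2$, so translating it by the corresponding element of $\mathrm{span}(e_1, e_2)$ gives a null vector,

$$T\bigl(e_j - c_1 e_1 - c_2 e_2\bigr) = 0,$$

and these vectors, one for each non-pivot index $j$, form a basis of the null space. In Example 2, the subdivided shell between the two cubes is just the expansion

$$(x+h)^3 - x^3 = \underbrace{3x^2 h}_{\text{three slabs}} + \underbrace{3xh^2}_{\text{three edge pieces}} + \underbrace{h^3}_{\text{corner cube}},$$

so dividing by $h$ and letting $h \to 0$ leaves the derivative $3x^2$.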
Qiaochu Yuan’s way of thinking about determinants isn’t one I’ve seen written up before:
When I talk about determinants, I generally talk about something on the spectrum between “it measures how much volume scales” and “it’s the induced action on the top exterior power.” But the way I think about determinants (especially in combinatorics) is the picture coming from the Lindström–Gessel–Viennot lemma: I imagine that the entries of the matrix describe transition amplitudes and that the determinant is an alternating sum over transition amplitudes in which “histories” of n particles can constructively or destructively interfere. I have a hard time making this picture precise so I rarely talk about it, but for me it gives some intuition for why determinants should be useful in combinatorics (which the elegant basis-free definition, at least for me, does not).
Edit: Let me also mention that something I really like about this perspective is that it makes intuitive not only the multiplicativity of the determinant but even the Cauchy-Binet formula.
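My attempt at making that picture slightly more concrete (a paraphrase, not Qiaochu's formulation): the Leibniz formula

$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i, \sigma(i)}$$

reads as a sum over “histories” in which particle $i$ hops to site $\sigma(i)$ with amplitude $a_{i,\sigma(i)}$; two histories that differ by swapping the destinations of two particles carry opposite signs, so they can interfere destructively. On this reading the Cauchy–Binet formula, for an $n \times m$ matrix $A$ and an $m \times n$ matrix $B$,

$$\det(AB) = \sum_{\substack{S \subseteq \{1, \dots, m\} \\ |S| = n}} \det(A_{[n], S}) \, \det(B_{S, [n]}),$$

says that routing $n$ particles through an intermediate layer of $m$ sites decomposes according to which $n$ intermediate sites get used, much in the spirit of the Lindström–Gessel–Viennot argument.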
Subconscious thought processing “masticating” tons of examples; Vivek Shende:
I have a worse problem than having unspoken thought processes: some of my best thought processes are simply beneath the level of consciousness and I don’t notice them at all until they’re finished. Even then, I often get only an answer and not an explanation out of them. Surely this happens to everyone: the problem solved during sleep, the idea on a walk in the woods, the conviction that a conjecture is true on utterly minimal evidence, the argument that pops up fully formed in the middle of a conversation.
My mathematical process is roughly this: consciously, I try a lot of stupid things which essentially have no chance of working but do have the benefit of exposing me to lots of examples; these examples pile up and are subconsciously masticated for days, weeks, months—I’m not old enough mathematically to put “years” here yet—and eventually by some inner and unobservable process I just have a feeling about what to do.
Shende’s mastication remark reminds me of Michael Nielsen’s “exhaust, bad [Anki] cards that seem to be necessary to get to good cards”:
As described, this deep Ankification process can feel rather wasteful. Inevitably, over time my understanding of the proof changes. When that happens it’s often useful to rewrite (and sometimes discard or replace) cards to reflect my improved understanding. And some of the cards written along the way have the flavor of exhaust, bad cards that seem to be necessary to get to good cards. I wish I had a good way of characterizing these, but I haven’t gone through this often enough to have more than fuzzy ideas about it.
Nielsen himself has interesting remarks on how he thinks about doing math in the essay above, which is mainly about using Anki to deepen mathematical understanding:
Typically, my mathematical work begins with paper-and-pen and messing about, often in a rather ad hoc way. But over time if I really get into something my thinking starts to change. I gradually internalize the mathematical objects I’m dealing with. It becomes easier and easier to conduct (most of) my work in my head. I will go on long walks, and simply think intensively about the objects of concern. Those are no longer symbolic or verbal or visual in the conventional way, though they have some secondary aspects of this nature. Rather, the sense is somehow of working directly with the objects of concern, without any direct symbolic or verbal or visual referents. Furthermore, as my understanding of the objects change – as I learn more about their nature, and correct my own misconceptions – my sense of what I can do with the objects changes as well. It’s as though they sprout new affordances, in the language of user interface design, and I get much practice in learning to fluidly apply those affordances in multiple ways.
… This [exhaust] is especially true of many of the cards generated early in the process, when I’m still scratching around, trying to get purchase on the proof. Unfortunately, also as mentioned above, I don’t yet have much clarity on which cards are exhaust, and which are crucial.
… my informal pop-psychology explanation is that when I’m doing mathematics really well, in the deeply internalized state I described earlier, I’m mostly using such higher-level chunks, and that’s why it no longer seems symbolic or verbal or even visual. I’m not entirely conscious of what’s going on – it’s more a sense of just playing around a lot with the various objects, trying things out, trying to find unexpected connections. But, presumably, what’s underlying the process is these chunked patterns.
Now, the only way I’ve reliably found to get to this point is to get obsessed with some mathematical problem. I will start out thinking symbolically about the problem as I become familiar with the relevant ideas, but eventually I internalize those ideas and their patterns of use, and can carry out a lot (not all) of operations inside my head.
Sometimes the ways of thinking seem too personal to be useful. Richard Feynman, in The Pleasure of Finding Things Out, explained how counting is a verbal process for him, and then ended with:
I often think about that, especially when I’m teaching some esoteric technique such as integrating Bessel functions. When I see equations, I see the letters in colors — I don’t know why. As I’m talking, I see vague pictures of Bessel functions from Jahnke and Emde’s book, with light-tan j’s, slightly violet-bluish n’s, and dark brown x’s flying around. And I wonder what the hell it must look like to the students.
Sam Derbyshire concurs:
The issue seems, to me, that a lot of these mental pictures are very personal. … Because of this, I think there might not always be a significant value in trying to pass those mental pictures over—the real aim is to provoke the student into developing his own mental pictures, that he can strongly relate to.
Some words such as “homological” or “homotopical” spark up very distinctive feelings in me, in a similar way as hearing “mountain” would make me visualise various mountains, hills, cliffs, etc. But whereas the meaning of “mountain” came to me through vision (mainly, but also other senses), the origin of my mental images of mathematical ideas comes through the practice of mathematics. As such, it seems harder to convey these mathematical pictures: they must be backed up by precise mathematical understanding, which at any rate should end up conjuring these mental pictures.
as does Mariano Suárez-Álvarez:
I think the root of the phenomenon is that we can only communicate to others what we know, not what we understand.
Also, it is not unreasonable to think that one’s mental images are not going to be of any help to others (In fact, they may well make things more complicated, or confusing for others: I have been told mental images by others—sometimes indirectly, by the choice of the word introduced in a definition—and been thereby misled; here «misled» means «led in a direction different to the one I personally would follow in order to form my own mental image of the concept».)
For example, for me resolving the singularities of algebraic varieties makes a clicking (or clacking) sound: this is quite significant for me in a way, but when talking to others I doubt I’d make any mention of this, for I seriously doubt it would help :)
I think this is too pessimistic, and not necessarily reflective of collaborative problem-solving. Tao again:
I find there is a world of difference between explaining things to a colleague, and explaining things to a close collaborator. With the latter, one really can communicate at the intuitive level, because one already has a reasonable idea of what the other person’s mental model of the problem is. In some ways, I find that throwing out things to a collaborator is closer to the mathematical thought process than just thinking about maths on one’s own, if that makes any sense.
… I think one reason why one cannot communicate most of one’s internal mathematical thoughts is that one’s internal mathematical model is very much a function of one’s mathematical upbringing. For instance, my background is in harmonic analysis, and so I try to visualise as much as possible in terms of things like interactions between frequencies, or contests between different quantitative bounds. This is probably quite a different perspective from someone brought up from, say, an algebraic, geometric, or logical background. I can appreciate these other perspectives, but still tend to revert to the ones I am most personally comfortable with when I am thinking about these things on my own.
But Terry Tao is an extremely social collaborative mathematician; his option seems somewhat foreclosed to truly ground-up independent thinkers. The best they can do is to spend thousands (or tens of thousands) of hours trying to convey how they think. That’s what Thurston realised and did later in his career, or what Grothendieck essentially did his whole life, etc. In the best case scenario they revolutionize or obsolete entire fields; otherwise they’re just ignored as adjacent intellectual communities judge the expected reward not worth the effort needed to cross the too-large inferential gap.
Shinichi Mochizuki is an interesting middle-of-the-road case here (purely anthropologically speaking – I have no hope of following the object level). There’s been considerable activity at Kyoto University’s Research Institute for Mathematical Sciences (RIMS) around the ideas Mochizuki developed in the course of (purportedly) proving the abc conjecture, while to a first approximation everywhere else his proof isn’t recognised as correct and nobody understands his ideas. The situation is made worse by Mochizuki savagely chastising the few people in the wild who’ve tried to distill his ideas (e.g. Kirti Joshi, James D. Boyd, etc.) as incompetent cranks – and I’m severely understating his responses, which are unsummarizably unique in the level and color of their vitriol. Mochizuki’s ideas are so original that world-leading mathematicians in adjacent fields can convene a week-long workshop to understand what his four papers are saying and still bounce off by day 3; cf. Brian Conrad back in 2015, when the rest of the mathematical community was still trying:
I attended the workshop, and among those attending were leading experts in arithmetic or anabelian geometry such as Alexander Beilinson, Gerd Faltings, Kiran Kedlaya, Minhyong Kim, Laurent Lafforgue, Florian Pop, Jakob Stix, Andrew Wiles, and Shou-Wu Zhang. …
It was not the purpose of the workshop to evaluate the correctness of the proof. The aim as I (and many other participants) understood it was to help participants from across many parts of arithmetic geometry to become more familiar with some key ideas involved in the overall work so as to (among other things) reduce the sense of discouragement many have experienced when trying to dig into the material. …
The workshop did not provide the “aha!” moment that many were hoping would take place. I am glad that I attended the Oxford workshop, despite serious frustrations which arose towards the end. …
There was substantial audience frustration in the final 2 days. Here is an example.
We kept being told many variations of “consider two objects that are isomorphic,” or even something as vacuous-sounding as “consider two copies of the category D, but label them differently.” Despite repeated requests with mounting degrees of exasperation, we were never told a compelling example of an interesting situation of such things with evident relevance to the goal.
We were often reminded that absolute Galois groups of p-adic fields admit automorphisms not arising from field theory, but we were never told in a clear manner why the existence of such exotic automorphisms is relevant to the task of proving Szpiro’s Conjecture; perhaps the reason is a simple one, but it was never clearly explained despite multiple requests. (Sometimes we were told it would become clearer later, but that never happened either.)
This got surreal, in a funny way:
After a certain amount of this, we were told (much to general surprise) variations of “you have been given examples.” (Really? Interesting ones? Where?) It felt like taking a course in linear algebra in which one is repeatedly told “Consider a pair of isomorphic vector spaces” but is never given an interesting example (of which there are many) despite repeated requests and eventually one is told “you have been given examples.”
Persistent questions from the audience didn’t help to remove the cloud of fog that overcame many lectures in the final two days. The audience kept asking for examples (in some instructive sense, even if entirely about mathematical structures), but nothing satisfactory to much of the audience along such lines was provided.
For instance, we were shown (at high speed) the definition of a rather elaborate notion called a “Hodge theater,” but were never told in clear succinct terms why such an elaborate structure is entirely needed. (Perhaps this was said at some point, but nobody I spoke with during the breaks caught it.) Much as it turns out that the very general theory of Frobenioids is ultimately unnecessary for the purpose of proving Szpiro’s Conjecture, it was natural to wonder if the same might be true of the huge amount of data involved in the general definition of Hodge theaters; being told in clearer terms what the point is and what goes wrong if one drops part of the structure would have clarified many matters immensely.
The fact that the audience was interrupting with so many basic questions caused the lectures to fall behind schedule, which caused some talks to go even faster to try to catch up with the intended schedule, leading to a feedback loop of even more audience confusion, but it was the initial “too much information” problem that caused the many basic questions to arise in the first place.
“Needless to say, none of these analogies show up in my published papers”
This is kind of wild. The analogies clearly helped Tao a lot, but his readers don’t get to see them! This has got me thinking about a broader kind of perverse incentive in academia: if you explain something really well, your idea seems obvious or your problem seems easy, and so your paper is more likely to get rejected by reviewers.
To be honest, this makes me quite worried. Suppose that someone working with mathematical methods proves something of dire importance to society (let’s say he comes up with a definitive formula for measuring the probability of disaster in a given year, or the minimum conditions for AI takeoff). How will this be communicated to other mathematicians, much less the public?
All the mathematicians quoted above can successfully write proofs that convince experts that something is true and why something is true; the quotes are about the difficulty of conveying the way the mathematician found that truth. All those mathematicians can convey the that and the why — except for Mochizuki and his circle.
The matter of Mochizuki’s work is intriguing because the broader research community has neither accepted his proof nor refuted it. The way to bet now is that his proof is wrong:
Professional mathematicians have not and will not publicly declare that “Mochizuki’s proof is X% likely to be correct”. Why? I’d guess one reason is that it’s their job to provide a definitive verdict that serves as the source of truth for probabilistic forecasts. If the experts gave subjective probabilities, it would confuse judgments of different kinds.
Most people with an opinion regard Mochizuki as refuted by Scholze and Stix. They simplified his theory to do it and Mochizuki says they oversimplified, but no one has managed to understand how the details of the full theory would make any difference.
If I were trying to resolve the issue, I might start by formalizing (in Lean) Kirti Joshi’s claimed proof of abc, which is inspired by Mochizuki but which uses more familiar mathematics.
Yeah the next level of the question is something like “we can prove something to a small circle of experts, now how do we communicate the reasoning and the implications to policymakers/interested parties/the public in general”