Came for the Harry Potter fanfic, a positive vision of what to do with atheism, and the anti-death ethos. Stayed for the good thinking.
I really like math and physics. I can program but am not a programmer: I’d have to learn how to make a website, and I struggle to handle more than a couple hundred LOC. Learning a little Haskell made me enjoy coding again; I’d like to get better at it.
I’m currently very bad at following through on commitments. I hope to fix this by tomorrow. I’ll probably fail, but I shoot for getting it down before my fifty thousandth tomorrow.
I think that if you’re trying to give your AI ethics at the outset, or to do something like writing down a CEV utility function, something’s already gone deeply wrong[1], in the same way that, capabilities-wise, you shouldn’t be hardcoding quantum mechanics into the AI. It’s supposed to be superintelligent; it should learn what you/humans care about. The thing you need to figure out is how to get the AI to care about what humans care about, so that it then learns and optimizes for that.
Under this model, most of the difficulty would also be present in trying to get an AI that cares about what some other agent cares about, e.g. aliens, or possibly chimpanzees and other animals, to the extent that they have goals.
(In fact, I think that somewhere between half and most of the problem is getting an AI that cares about any not-super-easy real-world goal at all, like the classic “maximize the number of diamond atoms”. If you knew how to actually build a literal paperclip maximizer, I would expect that you’ve figured out much of alignment.)
This includes it being the last-resort alternative to other actors doing even dumber stuff. I count corrigibility as an example here; hopefully it’s easier, albeit worse.