I am quite surprised that this happened 3 years ago! This seems really impressive for the GPT series of 3 years ago, and I expect the models to have gotten better since. Yes, it might be a fluke, but wouldn’t we expect current models to have a higher chance of producing a fluke this good?
Though others have essentially made the point, I feel like a simple answer bears a simple explanation: just imagine that your hypothesis Turing machines output (approximate) probabilistic predictions. For example, imagine each one outputs probabilities that are fractions, over a finite portion of the input space, so that you don’t have to worry about the messy infinite continuous stuff.
Note: I’m not sure the exact form has the same nice properties, but the approximate form should be workable, I think.
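For concreteness, here’s a minimal sketch of the kind of thing I mean (my own toy illustration, with made-up hypotheses): a finite class of hypotheses, each emitting an exact rational probability for the next bit, mixed together by Bayes. Everything stays a Fraction, so no continuous machinery is needed.

```python
from fractions import Fraction

# Each "hypothesis" maps a finite bit-string history to an exact rational
# probability that the next bit is 1 (a stand-in for a Turing machine
# emitting approximate probabilistic predictions).
hypotheses = {
    "always-half": lambda history: Fraction(1, 2),
    "mostly-ones": lambda history: Fraction(3, 4),
    "copy-last":   lambda history: Fraction(9, 10) if history and history[-1] == 1 else Fraction(1, 10),
}

# Uniform rational prior over the finite hypothesis class.
weights = {name: Fraction(1, len(hypotheses)) for name in hypotheses}

def update(weights, history, observed_bit):
    """Bayes-update the mixture weights on one observed bit; stays exact."""
    new = {}
    for name, w in weights.items():
        p_one = hypotheses[name](history)
        new[name] = w * (p_one if observed_bit == 1 else 1 - p_one)
    total = sum(new.values())
    return {name: w / total for name, w in new.items()}

history = []
for bit in [1, 1, 0, 1]:
    weights = update(weights, history, bit)
    history.append(bit)

print(weights)  # exact Fractions throughout
```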
But the clockwise rule only tells you anything when more than two people are there at the same time: with only two people, A < B < A in clockwise order, so the rule carries no information.
Possible example: Laennec’s invention of the stethoscope in 1816. Of course we would’ve come up with it eventually. But note that Laennec got his inspiration from kids playing with sticks and from his prudishness about putting his ear to a woman’s chest.
Consider that people have been using sound in diagnosis for millennia. But even something as simple as tapping one finger on another (to, e.g., feel and hear liquid in your lungs, which you don’t want there) was only introduced in the mid-1700s by Auenbrugger (though some medieval guy had it too? I’m not going to count it, since it seems not to have been advanced further), and the method also influenced Laennec. Auenbrugger was inspired by his father’s wine business—you tap the barrel to see how much fluid is in it!
So, consider: anyone ‘could have’ come up with either of these for… literal millennia? But they didn’t? And the main inspiration was stuff most medical practitioners weren’t looking at? Note that Laennec had some experience in flute making that helped him make his stethoscopes.
Lastly, [Corvisart](https://en.wikipedia.org/wiki/Jean-Nicolas_Corvisart) appears to have helped keep Auenbrugger’s percussion technique alive. Laennec learned of percussion from Corvisart’s translation of Auenbrugger—and Corvisart expanded on Auenbrugger’s findings about how to use the sound information. This isn’t a fundamental discovery, but it looks like he did have significant impact.
Semmelweis, Lister, and Pasteur are great examples. Early adopters of germ theory and related practices like sanitation and antiseptics, disbelieved by everyone around them. But you can’t say the disbelief kept them from having impact—Pasteur was definitely influenced by Lister and Semmelweis, and Pasteur really got purposely made vaccines working (whereas with smallpox we lucked out with cowpox happening to already exist). So unlike others whose ideas were sufficiently strange as to be rejected (thus giving good evidence of counterfactual discovery, e.g. Mendel, whose work was only rediscovered around the time its actual content was being refigured out), they managed to create huge counterfactual impact.
So I guess, if you can’t convince most people, at least manage to convince a handful of early adopters well positioned to reap the rewards of your ideas?
Additionally, the neural nets (afaik) are used to evaluate a position, but not for stateful “strategy”. That is, the overall algorithm has a heuristic evaluation function (potentially incorporating a neural network), and then chooses a move by doing the sort of “future calculation” that humans do, just in fancy ways to make it fast.
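As a rough illustration of that division of labor (my own sketch, not how any particular engine actually implements it; real engines add alpha-beta pruning, move ordering, caching, and so on), here is the basic shape: a search over future moves, with a swappable, stateless evaluation function at the leaves. `Position` and its methods are a hypothetical interface, not a real library.

```python
# Minimal negamax sketch: the "strategy" is just tree search; the evaluation
# function at the leaves is stateless, and could be a hand-tuned heuristic
# or a neural network scoring a single position.
# `Position` (with .is_terminal(), .legal_moves(), .play(move)) is a
# hypothetical interface used only for this sketch.

def negamax(position, depth, evaluate):
    """Best achievable score for the side to move, searching `depth` plies."""
    if depth == 0 or position.is_terminal():
        return evaluate(position)  # stateless evaluation of one position
    best = float("-inf")
    for move in position.legal_moves():
        # A good score for the opponent is a bad score for us, hence the minus.
        best = max(best, -negamax(position.play(move), depth - 1, evaluate))
    return best

def choose_move(position, depth, evaluate):
    """Pick the move whose resulting position searches out best for us."""
    return max(
        position.legal_moves(),
        key=lambda move: -negamax(position.play(move), depth - 1, evaluate),
    )
```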
An example, to point out that this isn’t necessarily a market failure caused by imperfect information/biases: fiction. Something new has a lower bar than something old. You can’t surprise me with the same plot twists again, or give me the same novel speculation (especially for the most important parts of the work, which I forget less).
Likewise, if I have a way of detecting errors in e.g. code, I may want a completely-different-paradigm tester even if it’s on average worse, in the hope of catching the places where my first tester fails—likewise for emergency preparedness and backup techniques generally, where you want to minimize positive correlation in errors so that at least something is very likely to work.
Sub-likewise, generally, if you are willing to take a hit to the mean in exchange for increasing variance (because you care about the positive heavy tails more than the negative ones, e.g. if you can take the max of your attempts, or if you need a Hail Mary in football to win), you will have an example of wanting worse but different.
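A quick numerical illustration of that last point (my own toy example, with made-up distributions): if you get to keep the best of several attempts, a strategy with a lower mean but higher variance can beat the one that is better on average.

```python
import random

def best_of(n_attempts, sample, trials=20_000):
    """Average value of the max over n_attempts i.i.d. draws from `sample()`."""
    return sum(
        max(sample() for _ in range(n_attempts)) for _ in range(trials)
    ) / trials

safe = lambda: random.gauss(1.0, 0.1)  # higher mean, low variance
wild = lambda: random.gauss(0.5, 2.0)  # lower mean, high variance

for n in (1, 5, 20):
    print(n, round(best_of(n, safe), 2), round(best_of(n, wild), 2))
# With a single attempt the safe strategy wins; once you can keep the best of
# several attempts, the high-variance strategy's right tail dominates the max.
```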
Can a computer do this? That is, take in the footage and output a drawing or a 3D model that is that accurate? I don’t know what the SOTA of that sort of image processing is (and of course nowadays we have better ML models).
Noether’s theorem is an interesting one. The evidence was there, but it’s the sort of discovery that’s incredibly nonobvious even if you have a pile of evidence staring right at you. Perhaps Einstein would’ve gotten it. That she figured it out while working with Hilbert and Einstein on relativity suggests that the ideas that led to relativity help you think of the ideas behind Noether’s theorem. But I think it’s pretty likely she was quite counterfactual here.
I think they’re talking about a formulation with the same essential point having come up earlier? I’m personally not familiar with Schwinger’s formulation, so I can’t comment intelligently on it. I’ll also note that the true significance of path integrals took a while to be recognized (at least going by a comment in Shankar’s Principles of Quantum Mechanics, a standard QM textbook, where the preface to the 2nd edition says something like “In the first edition I put a chapter on path integrals because I thought they were important even though most people don’t include them. Boy, they became really important. I’ve added 100 extra pages on path integrals.”).
However, I’ll note that Feynman diagrams are another example of a conceptual advance that was huge. Though it seems like the mathematical development of the perturbation series, and the fundamental concept, were already around. Furthermore, Stueckelberg came up with something similar, but didn’t provide as good a way of mechanically translating perturbation expansion terms into diagrams, and didn’t have the path integral (which is additional evidence for the counterfactualness of the path integral, if you can apparently get halfway to Feynman diagrams without coming up with it). Likewise, the diagrams took a while to become standard.
Thus it seems likely that Feynman was pretty counterfactual here. Plausibly, others who might have come up with the notation would have dismissed it, like the people who dismissed Feynman.
Feynman was also famously good at this sort of conceptual insight, and so I am willing to believe that his unique abilities were actually important here.
The CMB seems not counterfactual. The discoverers did have to notice it, remain puzzled that it couldn’t be explained away as a problem with their equipment, and then be receptive to being told about a paper on how there might be radiation from the early universe. But the discoverers were just looking at a sensitive radio detector meant to detect radio waves reflecting off balloon satellites. Anyone who developed similarly sensitive equipment and then tried to see faint signals would’ve noticed the mysterious noise.
Given the sheer importance of radio technology, I think there would’ve been many instances of people developing a similarly sensitive device and noticing the noise. It surprised me to learn that a paper about the possibility of radiation from the early universe already existed at the time, which plausibly sped up the discovery. Note also that some astrophysicists nearby were about to look for a signal in the right region with the explicit intent of finding background radiation (independently of the first discoverers, though not independently of the paper, since some of them wrote it).
So, if anything here is counterfactual, it would be Dicke and Peebles predicting the CMB. But I still don’t buy it, because even if nobody had predicted it, people would’ve seen it not long afterward. In fact, before the main discovery in 1964, McKellar in 1941 had observed a background appearing like a blackbody at the right temperature while observing the spectrum of a star. He even guessed it had some significance.
I agree, but he seems to have had rather low counterfactual impact. His discovery was definitely very counterfactual, but it seems like his work was only recognized around the time it would’ve been rediscovered anyway.
Langmuir’s adsorption isotherm is a little bit of statistical mechanics that, given my understanding of what you know already, I think you’d find really easy to understand. Undergrad classes derive it nowadays.
If it’s counterfactual, it would have to be due to spurring some development of statistical mechanics, because once some of the basics were developed, someone would’ve derived it. I think it was actually a homework problem! All you have to do is consider a two-state system (gas molecule attached to the substrate / not attached), use the grand partition function (the version of the partition function that includes the chemical potential), and then substitute in the value the chemical potential takes for an ideal gas. You’ll then get something that tells you the fraction of the substrate that will have an attached gas molecule. A neat application is hemoglobin and myoglobin binding oxygen.
For a reference, see Chapter 5, pages 140-143, of Kittel’s “Thermal Physics”, a standard book on undergrad-level statistical mechanics.
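For concreteness, here’s a sketch of that derivation (my reconstruction of the standard textbook route, in Kittel-style notation where \tau = k_B T, \lambda = e^{\mu/\tau} is the activity, and n_Q is the quantum concentration). A site on the substrate is either empty (energy 0) or holds one adsorbed molecule (energy \varepsilon), so its grand partition function is \mathcal{Z} = 1 + \lambda e^{-\varepsilon/\tau}, and the fraction of occupied sites is f = \lambda e^{-\varepsilon/\tau} / (1 + \lambda e^{-\varepsilon/\tau}). For an ideal gas in contact with the substrate, \lambda = n/n_Q = p/(\tau n_Q); substituting gives the Langmuir form f = p/(p_0 + p), with p_0(\tau) = n_Q \tau e^{\varepsilon/\tau}.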
Onnes’s discovery seems clearly not counterfactual. My understanding is that multiple people were quite interested in the question of what happens to resistance when you cool something down using the new tech of Dewar flasks (invented by Dewar) and liquefied helium. For example, Dewar himself was looking into it! Onnes was motivated by an ongoing research agenda, with multiple researchers trying to do the thing he was trying to do. Note also that very little time passed between when the tech to cool things down far enough was invented and when Onnes made his discovery.
Onnes was the first to liquefy helium, but he bought the device he used (which had the novel innovation of exploiting the Joule-Thomson effect to liquefy gases) from its inventors (a Linde machine, using the Hampson-Linde cycle). Onnes had performed resistance-measuring experiments earlier; this time, with mercury, he observed the superconductivity. Both of these steps seem like they would’ve been taken pretty soon by someone else.
Surely others would’ve tried cooling a bunch more metals in the already-ongoing quest to understand resistance at cold temperatures, and then noticed the superconductivity in some of them. Mercury, lead, and niobium superconduct at low temperatures—surely someone would’ve tried metals as obvious as mercury and lead. At the very least, the observation of the superfluidity of liquid helium should’ve spurred people into cooling random stuff and seeing if anything weird happened.
I meant in terms of the way people use the word “SPR”—of course, if a linear model performs better than experts, then I would expect a linear model for the logit to do so as well, and if it doesn’t, that doesn’t change the point of the argument, because you can just use the linear model.
It seems like you could do better with a logit model:
p = logistic(\sum_i w_i c_i); that is, logit(p) = log-odds(p) = \sum_i w_i c_i.
Are these also called SPRs?
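A minimal sketch of what I mean (the cue values c_i and weights w_i below are made-up numbers for illustration); unlike a plain linear SPR, the output is guaranteed to be a probability in (0, 1):

```python
import math

def logit_spr(cues, weights):
    """Probability from a weighted sum of cues passed through the logistic."""
    score = sum(w * c for w, c in zip(weights, cues))  # \sum_i w_i c_i
    return 1 / (1 + math.exp(-score))                  # logistic(score)

# Made-up example: three cues with hand-picked weights.
print(logit_spr(cues=[1.0, 0.0, 2.5], weights=[0.8, -1.2, 0.4]))  # ~0.86
```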
I think you could make the first theorem in the post (the simplified fundamental theorem on two variables) easier for a novice to understand if you explicitly clarified that the conclusion diagram Λ′ → Λ → X is the same as Λ′ ← Λ → X by the chain rerooting rule, and perhaps used the latter diagram in the picture, as it more directly conveys the idea of mediation/inducing independence.
I also think this about the redundancy condition X_1 → X_2 → Λ′ & X_2 → X_1 → Λ′. Until I realized that these diagrams are the same as X_1 ← X_2 → Λ′ and X_2 ← X_1 → Λ′, the condition seemed mysterious to me, and because you didn’t describe them using the same English words, it took me a while to realize that it makes sense if I think of it as X_2 mediating between X_1 and Λ′ (so learning about X_1 doesn’t tell me anything new about Λ′) and vice versa.
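To spell out the mediation reading with a toy numerical check (my own example, with made-up probabilities): in a chain X_1 → X_2 → Λ′, conditioning on X_1 in addition to X_2 doesn’t change the distribution of Λ′.

```python
from itertools import product

# Toy chain X1 -> X2 -> L with made-up conditional probabilities, just to
# illustrate the mediation reading: X2 screens off X1 from L.
p_x1 = {0: 0.5, 1: 0.5}
p_x2_given_x1 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}
p_l_given_x2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}

# Joint distribution implied by the chain structure.
joint = {
    (x1, x2, l): p_x1[x1] * p_x2_given_x1[x1][x2] * p_l_given_x2[x2][l]
    for x1, x2, l in product((0, 1), repeat=3)
}

def p_l1_given(x2, x1=None):
    """P(L = 1 | X2 = x2), optionally also conditioning on X1 = x1."""
    kept = {k: v for k, v in joint.items()
            if k[1] == x2 and (x1 is None or k[0] == x1)}
    return sum(v for k, v in kept.items() if k[2] == 1) / sum(kept.values())

for x2 in (0, 1):
    # All three numbers per row agree: X1 adds nothing once X2 is known.
    print(x2, p_l1_given(x2), p_l1_given(x2, x1=0), p_l1_given(x2, x1=1))
```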
I, too, have had the same objection you have to people who claim that the problem with intransitive preferences is that you can be money pumped: the real objection is just that it’d be really weird to be able to move through a sequence of stepwise-preferred states and yet end up in a state dispreferred to the start (which isn’t a money pump, because the agent can just choose not to take that path).
Though, “it’s really weird” is a pretty good objection—it, in fact, would be extremely weird to have intransitive preferences, and so I think it is fine to assume that the “true” (Coherent Extrapolated Volition) preferences of e.g. humans are transitive.
You can use the intuition that a greedy optimizer shouldn’t ever end up worse off than it started, even if it doesn’t end up in the best place.
Well, since you quite literally begged the question: what are your sexual kinks?