I maintain a reading list on Goodreads. I have a personal website with some blog posts, mostly technical stuff about math research. I am also on github, twitter, and mastodon.
Eigil Rischel(Eigil Fjeldgren Rischel)
I think we’re basically in agreement here (and I think your summary of the results is fair, I was mostly pushing back on a toohyped tone coming from OpenAI, not from you)
Most of the heavy lifting in these proofs seem to be done by the Lean tactics. The comment “arguments to
nlinarith
are fully invented by our model” above a proof which is literally the single linenlinarith [sq_nonneg (b  a), sq_nonneg (c  b), sq_nonneg (c  a)]
makes me feel like they’re trying too hard to convince me this is impressive.The other proof involving multiple steps is more impressive, but this still feels like a testament to the power of “traditional” search methods for proving algebraic inequalities, rather than an impressive AI milestone. People on twitter have claimed that some of the other problems are also oneliners using existing proof assistant strategies—I find this claim totally plausible.
I would be much more impressed with an AIgenerated proof of a combinatorics or number theory IMO problem (eg problem 1 or 5 from 2021). Someone with more experience in proof assistants probably has a better intuition for which problems are hard to solve with “dumb” searching like
nlinarith
, but this is my guess.
Yes, that’s right. It’s the same basic issue that leads to the Anvil Problem
Compare two views of “the universal prior”
AIXI: The external world is a Turing machine that receives our actions as input and produces our sensory impressions as output. Our prior belief about this Turing machine should be that it’s simple, i.e. the Solomonoff prior
“The embedded prior”: The “entire” world is a sort of Turing machine, which we happen to be one component of in some sense. Our prior for this Turing machine should be that it’s simple (again, the Solomonoff prior), but we have to condition on the observation that it’s complicated enough to contain observers (“Descartes’ update”). (This is essentially Naturalized induction
I think of the difference between these as “solipsism”—AIXI gives its own existence a distinguished role in reality.
Importantly, the laws of physics seem fairly complicated in an absolute sense—clearly they require tens^{[1]} or hundreds of bits to specify. This is evidence against solipsism, because on the solipsistic prior, we expect to interact with a largely empty universe. But they don’t seem much more complicated than necessary for a universe that contains at least one observer, since the minimal source code for an observer is probably also fairly long.
More evidence against solipsism:
The laws of physics don’t seem to privilege my frame of reference. This is a pretty astounding coincidence on the solipsistic viewpoint—it means we randomly picked a universe which simulates some observerindependent laws of physics, then picks out a specific point inside it, depending on some fairly complex parameters, to show me.
When I look out into the universe external to my mind, one of the things I find there is my brain, which really seems to contain a copy of my mind. This is another pretty startling coincidence on the solipsistic prior, that the external universe being run happens to contain this kind of representation of the Cartesian observe
 ↩︎
This is obviously a very small number but I’m trying to be maximally conservative here.
Right, that’s essentially what I mean. You’re of course right that this doesn’t let you get around the existence of nonmeasurepreserving automorphisms. I guess what I’m saying is that, if you’re trying to find a prior on , you should try to think about what system of finite measurements this is idealizing, and see if you can apply a symmetry argument to those bits. Which isn’t always the case! You can only apply the principle of indifference if you’re actually indifferent. But if it’s the case that is generated in a way where “there’s no reason to suspect that any bit should be rather than , or that there should be correlations between the bits”, then it’s of course not the case that has this same property. But of course you still need to look at the actual situation to see what symmetry exists or doesn’t exist.
Haar measures are a highpowered way of doing this, I was just thinking about taking the iterated product measure of the uniform probability measure on (justified by symmetry considerations). You can of course find maps from to all sorts of spaces but it seems harder to transport symmetry considerations along these maps.
I strongly recommend Escardó′s Seemingly impossible functional programs, which constructs a function
search : ((Nat > Bool) > Bool) > (Nat > Bool)
which, given a predicate on infinite bitstrings, finds an infinite bitstring satisfying the predicate if one exists. (In other words, ifp : (Nat > Bool) > Bool
and any bitstring at all satisfiesp
, thenp (search p) == True
(Here I’m intending
Nat
to be the type of natural numbers, of course) .
Ha, I was just about to write this post. To add something, I think you can justify the uniform measure on bounded intervals of reals (for illustration purposes, say ) by the following argument: “Measuring a real number ” is obviously simply impossible if interpreted literally, containing an infinite amount of data. Instead this is supposed to be some sort of idealization of a situation where you can observe “as many bits as you want” of the binary expansion of the number (choosing another base gives the same measure). If you now apply the principle of indifference to each measured bit, you’re left with Lebesgue measure.
It’s not clear that there’s a “right” way to apply this type of thinking to produce “the correct” prior on (or or any other noncompact space.
This seems somewhat connected to this previous argument. Basically, coherent agents can be modeled as utilityoptimizers, yes, but what this really proves is that almost any behavior fits into the model “utilityoptimizer”, not that coherent agents must necessarily look like our intuitive picture of a utilityoptimizer.
Paraphrasing Rohin’s arguments somewhat, the arguments for universal convergence say something like “for “most” “natural” utility functions, optimizing that function will mean acquiring power, killing off adversaries, acquiring resources, etc”. We know that all coherent behavior comes from a utility function, but it doesn’t follow that most coherent behavior exhibits this sort of powerseeking.
My impression from skimming a few AI ETFs is that they are more or less just generic technology ETFs with different branding and a few random stocks thrown in. So they’re not catastrophically worse than the baseline “Google, Microsoft and Facebook” strategy you outlined, but I don’t think they’re better in any real way either.
This is really cool!
The example of inferring from the independence of and reminds me of some techniques discussed in Elements of Causal Inference. They discuss a few different techniques for 2variable causal inference.
One of them, which seems to be essentially analogous to this example, is that if are realvalued variables, then if the regression error (i.e for some constant ) is independent of , it’s highly likely that is downstream of . It sounds like factored sets (or some extension to capture continuousvalued variables) might be the right general framework to accommodate this class of examples.
Thanks (to both of you), this was confusing for me as well.
At least one explanation for the fact that the Fall of Rome is the only period of decline on the graph could be this: data becomes more scarce the further back in history you go. This has the effect of smoothing the historical graph as you extrapolate between the few datapoints you have. Thus the overall positive trend can more easily mask any shortterm period of decay.
Lsusr ran a survey here a little while ago, asking people for things that “almost nobody agrees with you on”. There’s a summary here
This argument proves that
Along a given timepath, the average change in entropy is zero
Over the whole space of configurations of the universe, the average difference in entropy between a given state and the next state (according to the laws of physics) is zero. (Really this should be formulated in terms of derivatives, not differences, but you get the point).
This is definitely true, and this is an inescapable feature of any (compact) dynamical system. However, somewhat paradoxically, it’s consistent with the statement that, conditional on any given (nonmaximal) level of entropy, the vast majority of states have increasing entropy.
In your timediagrams, this might look something like this:
I.e when you occasionally swing down into a somewhat lowentropy state, it’s much more likely that you’ll go back to highentropy than that you’ll go further down. So once you observe that you’re not in the maxentropy state, it’s more likely that you’ll increase than that you’ll decrease.
(It’s impossible for half of the midentropy states to continue to lowentropy states, because there are much more than twice as many midentropy states as lowentropy states, and the dynamics are measurepreserving).
This argument doesn’t work because limits don’t commute with integrals (including expected values). (Since practical situations are finite, this just tells you that the limiting situation is not a good model).
To the extent that the experiment with infinite bets makes sense, it definitely has EV 0. We can equip the space with a probability measure corresponding to independent coinflips, then describe the payout using naive EV maximization as a function  it is on the point and everywhere else. The expected value/integral of this function is zero.
EDIT: To make the “limit” thing clear, we can describe the payout after bets using naive EV maximization as a function , which is if the first values are , and otherwise. Then , and (pointwise), but .
The corresponding functions corresponding to the EV using a Kelly strategy have for all , but
 4 Mar 2021 18:48 UTC; 8 points) 's comment on A nonlogarithmic argument for Kelly by (
The source of disagreement seems to be about how to compute the EV “in the limit of infinite bets”. I.e given bets with a chance of winning each, where you triple your stake with each bet, the naive EV maximization strategy gives you a total expect value of , which is also the maximum achievable overall EV. Does this entail that the EV at infinite bets is ? No, because with probability one, you’ll lose one of the bets and end up with zero money.
I don’t find this argument for Kelly super convincing.

You can’t actually bet an infinite number of times, and any finite bound on the number of bets, even if it’s , immediately collapses back to the above situation where naive EVmaximization also maximizes the overall expected value. So this argument doesn’t actually support using Kelly over naive EV maximization in real life.

There are tons of strategies other than Kelly which achieve the goal of infinite EV in the limit. Looking at EV in the limit doesn’t give you a way of choosing between these. You can compare them over finite horizons and notice that Kelly gives you better EV than others here (maximal geometric growth rate).… but then we’re back to the fact that over finite time horizons, naive EV does even better than any of those.

I don’t wanna clutter the comments too much, so I’ll add this here: I assume there was supposed to be links to the various community discussions of Why We Sleep (hackernews, r/ssc, etc), but these are just plain text for me.
(John made a post, I’ll just post this here so others can find it: https://www.lesswrong.com/posts/Dx9LoqsEh3gHNJMDk/fixingthegoodregulatortheorem)
This seems prima facie unlikely. If you’re not worried about the risk of side effects from the “real” vaccine, why not just take it, too (since the efficacy of the homemade vaccine is far from certain)?. On the other hand, if you’re the sort of person who worries about the side effects of a vaccine that’s been through clinical trials, you’re probably not the type to brew something up in your kitchen based on a recipe that you got off the internet and snort it.
I mean, “is a large part of the state space” is basically what “high entropy” means!
For case 3, I think the right way to rule out this counterexample is the probabilistic criterion discussed by John—the vast majority of initial states for your computer don’t include a zeroday exploit and a script to automatically deploy it. The only way to make this likely is to include you programming your computer in the picture, and of course you do have a world model (without which you could not have programmed your computer)