coming up with good ideas is very difficult as well
(and it requires good judgment, also)
I’ve only skimmed the post so the present comment could be missing the mark (sorry if so), but I think you might find it worthwhile/interesting/fun to think (afresh, in this context) about how come humans often don’t wirehead and probably wouldn’t wirehead even with much much longer lives (in particular, much longer childhoods and research careers), and whether the kind of AI that would do hard math/philosophy/tech-development/science will also be like that.[1][2]
- ↩︎
I’m not going to engage further on this here, but if you’d like to have a chat about this, feel free to dm me.
- ↩︎
I feel like clarifying that I’d inside-view say P( the future is profoundly non-human (in a bad sense) | AI (which is not pretty much a synthetic human) smarter than humans is created this century ) >0.98 despite this.
- ↩︎
i agree that most people doing “technical analysis” are doing nonsense and any particular well-known simple method does not actually work. but also clearly a very good predictor could make a lot of money just looking at the past price time series anyway
it feels to me like you are talking of two non-equivalent types of things as if they were the same. like, imo, the following are very common in competent entities: resisting attempts on one’s life, trying to become smarter, wanting to have resources (in particular, in our present context, being interested in eating the Sun), etc.. but then whether some sort of vnm-coherence arises seems like a very different question. and indeed even though i think these drives are legit, i think it’s plausible that such coherence just doesn’t arise or that thinking of the question of what valuing is like such that a tendency toward “vnm-coherence” or “goal stability” could even make sense as an option is pretty bad/confused[1].
(of course these two positions i’ve briefly stated on these two questions deserve a bunch of elaboration and justification that i have not provided here, but hopefully it is clear even without that that there are two pretty different questions here that are (at least a priori) not equivalent)
- ↩︎
briefly and vaguely, i think this could involve mistakenly imagining a growing mind meeting a fixed world, when really we will have a growing mind meeting a growing world — indeed, a world which is approximately equal to the mind itself. slightly more concretely, i think things could be more like: eg humanity has many profound projects now, and we would have many profound but currently basically unimaginable projects later, with like the effective space of options just continuing to become larger, plausibly with no meaningful sense in which there is a uniform direction in which we’re going throughout or whatever
- ↩︎
a chat with Towards_Keeperhood on what it takes for sentences/phrases/words to be meaningful
you could define “mother(x,y)” as “x gave birth to y”, and then “gave birth” as some more precise cluster of observations, which eventually need to be able to be identified from visual inputs
Kaarel:
if i should read this as talking about a translation of “x is the mother of y”, then imo this is a bad idea.
in particular, i think there is the following issue with this: saying which observations “x gave birth to y” corresponds to itself intuitively requires appealing to a bunch of other understanding. it’s like: sure, your understanding can be used to create visual anticipations, but it’s not true that any single sentence alone could be translated into visual anticipations — to get a typical visual anticipation, you need to rely on some larger segment of your understanding. a standard example here is “the speed of light in vacuum is 3*10^8 m/s” creating visual anticipations in some experimental setups, but being able to derive those visual anticipations depends on a lot of further facts about how to create a vacuum and properties of mirrors and interferometers and so on (and this is just for one particular setup — if we really mean to make a universally quantified statement, then getting the observation sentences can easily end up requiring basically all of our understanding). and it seems silly to think that all this crazy stuff was already there in what you meant when you said “the speed of light in vacuum is 3*10^8 m/s”. one concrete reason why you don’t want this sentence to just mean some crazy AND over observation sentences or whatever is because you could be wrong about how some interferometer works and then you’d want it to correspond to different observation sentences
this is roughly https://en.wikipedia.org/wiki/Confirmation_holism as a counter to https://en.wikipedia.org/wiki/Verificationism
that said, i think there is also something wrong with some very strong version of holism: it’s not really like our understanding is this unitary thing that only outputs visual anticipations using all the parts together, either — the real correspondence is somewhat more granular than that
TK:
On reflection, I think my “mother” example was pretty sloppy and perhaps confusing. I agree that often quite a lot of our knowledge is needed to ground a statement in anticipations. And yeah actually it doesn’t always ground out in that, e.g. for parsing the meaning of counterfactuals. (See “Mixed Reference: The great reductionist project”.)
K:
i wouldn’t say a sentence is grounded in anticipations with a lot of our knowledge, because that makes it sound like in the above example, “the speed of light is 3*10^8 m/s” is somehow privileged compared to our understanding of mirrors and interferometers even though it’s just all used together to create anticipations; i’d instead maybe just say that a bunch of our knowledge together can create a visual anticipation
TK:
thx. i wanted to reply sth like “a true statement can either be tautological (e.g. math theorems) or empirical, and for it to be an empirical truth there needs to be some entanglement between your belief and reality, and entanglement happens through sensory anticipations. so i feel fine with saying that the sentence ‘the speed of light is 3*10^8 m/s’ still needs to be grounded in sensory anticipations”. but i notice that the way i would use “grounded” here is different from the way I did in my previous comment, so perhaps there are two different concepts that need to be disentangled.
K:
here’s one thing in this vicinity that i’m sympathetic to: we should have as a criterion on our words, concepts, sentences, thoughts, etc. that they play some role in determining our actions; if some mental element is somehow completely disconnected from our lives, then i’d be suspicious of it. (and things can be connected to action via creating visual anticipations, but also without doing that.)
that said, i think it can totally be good to be doing some thinking with no clear prior sense about how it could be connected to action (or prediction) — eg doing some crazy higher math can be good, imagining some crazy fictional worlds can be good, games various crazy artists and artistic communities are playing can be good, even crazy stuff religious groups are up to can be good. also, i think (thought-)actions in these crazy domains can themselves be actions one can reasonably be interested in supporting/determining, so this version of entanglement with action is really a very weak criterion
generally it is useful to be able to “run various crazy programs”, but given this, it seems obvious that not all variables in all useful programs are going to satisfy any such criterion of meaningfulness? like, they can in general just be some arbitrary crazy things (like, imagine some memory bit in my laptop or whatever) playing some arbitrary crazy role in some context, and this is fine
and similarly for language: we can have some words or sentences playing some useful role without satisfying any strict meaningfulness criterion (beyond maybe just having some relation to actions or anticipations which can be of basically arbitrary form)
a different point: in human thinking, the way “2+2=4” is related to visual anticipations is very similar to the way “the speed of light is 3*10^8 m/s” is related to visual anticipations
TK:
Thanks!
I agree that e.g. imagining fictional worlds like HPMoR can be useful.
I think I want to expand my notion of “tautological statements” to include statements like “In the HPMoR universe, X happens”. You can also pick any empirical truth “X” and turn it into a tautological one by saying “In our universe, X”. Though I agree it seems a bit weird.
Basically, mathematics tells you what’s true in all possible worlds, so from mathematics alone you never know in which world you may be in. So if you want to say something that’s true about your world specifically (but not across all possible worlds), you need some observations to pin down what world you’re in.
I think this distinction is what Eliezer means in his highly advanced epistemology sequence when he uses “logical pinpointing” and “physical pinpointing”.
You can also have a combination of the two. (I’d say as soon as some physical pinpointing is involved I’d call it an empirical fact.)
Commented about that. (I actually changed my model slightly): https://www.lesswrong.com/posts/bTsiPnFndZeqTnWpu/mixed-reference-the-great-reductionist-project?commentId=HuE78qSkZJ9MxBC8p
K:
the imo most important thing in my messages above is the argument against [any criterion of meaningfulness which is like what you’re trying to state] being reasonable
in brief, because it’s just useful to be allowed to have arbitrary “variables” in “one’s mental circuits”
just like there’s no such meaningfulness criterion on a bit in your laptop’s memory
if you want to see from the outside the way the bit is “connected to the world”, one thing you could do is to say that the bit is 0 in worlds which are such-and-such and 1 in worlds which are such-and-such, or, if you have a sense of what the laptop is supposed to be doing, you could say in which worlds the bit “should be 0” and in which worlds the bit “should be 1”, but it’s not like anything like this crazy god’s eye view picture is (or even could explicitly be) present inside the laptop
our sentences and terms don’t have to have meanings “grounded in visual anticipations”, just like the bit in the laptop doesn’t
except perhaps in the very weak sense that it should be possible for a sentence to be involved in determining actions (or anticipations) in some potentially arbitrarily remote way
the following is mostly a side point: one problem with seeing from the inside what your bits (words, sentences) are doing (especially in the context of pushing the frontier of science, math, philosophy, tech, or generally doing anything you don’t know how to do yet, but actually also just basically all the time) is that you need to be open to using your bits in new ways; the context in which you are using your bits usually isn’t clear to you
btw, this is a sort of minor point but i’m stating it because i’m hoping it might contribute to pushing you out of a broader imo incorrect view: even when one is stating formal mathematical statements, one should be allowed to state sentences with no regard for whether they are tautologies/contradictions (that is, provable/disprovable) or not — ie, one should be allowed to state undecidable sentences, right? eg you should be allowed to state a proof that has the structure “if P, then blabla, so Q; but if not-P, then other-blabla, but then also Q; therefore, Q”, without having to pay any attention to whether P itself is tautological/contradictory or undecidable
so, if what you want to do with your criterion of meaningfulness involves banning saying sentences which are not “meaningful”, then even in formal math, you should consider non-tautological/contradictory sentences meaningful. (if you don’t want to ban the “meaningless” sentences, then idk what we’re even supposed to be doing with this notion of meaningfulness.)
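(for concreteness, that proof shape as a tiny Lean sketch, written classically since it appeals to excluded middle:)

```lean
-- The proof shape described above: "if P, then ... Q; but if not-P, then ... Q; therefore Q".
-- It uses excluded middle (Classical.em), i.e. it is a classical, non-constructive step.
example (P Q : Prop) (h1 : P → Q) (h2 : ¬P → Q) : Q :=
  (Classical.em P).elim h1 h2
```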
TK:
Thx. I definitely agree one should be able to state all mathematical statements (including undecidable ones), and that for proofs you shouldn’t need to pay attention to whether a statement is undecidable or not. (I’m having sorta constructivist tendencies though, where “if P, then blabla, so Q; but if not-P, then other-blabla, but then also Q; therefore, Q” wouldn’t be a valid proof because we don’t assume the law of excluded middle.)
Ok yeah thx I think the way I previously used “meaningfully” was pretty confused. I guess I don’t really want to rule out any sentences people use.
I think sth is not meaningful if there’s no connection between a belief and your main belief pool. So “a puffy is a flippo” is perhaps not meaningful to you because those concepts don’t relate to anything else you know? (But that’s a different kind of “meaningful” from the one relevant to the errors people mostly make.)
K:
yea. tho then we could involve more sentences about puffies and flippos and start playing some game involving saying/thinking those sentences and then that could be fun/useful/whatever
TK:
maybe. idk.
I think it’s plausible that a system which is smarter than humans/humanity (and distinct and separate from humans/humanity) should just never be created, and I’m inside-view almost certain it’d be profoundly bad if such a system were created any time soon. But I think I’ll disagree with like basically anyone on a lot of important stuff around this matter, so it just seems really difficult for anyone to be such that I’d feel like really endorsing them on this matter?[1] That said, my guess is that PauseAI is net positive, tho I haven’t thought about this that much :)
Thank you for the comment!
First, I’d like to clear up a few things:
I do think that making an “approximate synthetic 2025 human newborn/fetus (mind)” that can be run on a server having 100x usual human thinking speed is almost certainly a finite problem, and one might get there by figuring out what structures are there in a fetus/newborn precisely enough, and it plausibly makes sense to focus particularly on structures which are more relevant to learning. If one were to pull this off, one might then further be able to have these synthetic fetuses grow up quickly into fairly normal humans and have them do stuff which ends the present period of (imo) acute x-risk. (And the development of thought continues after that, I think; I’ll say more that relates to this later.) While I do say in my post that making mind uploads is a finite problem, it might have been good to state also (or more precisely) that this type of thing is finite.
I certainly think that one can make a finite system such that one can reasonably think that it will start a process that does very much — like, eats the Sun, etc.. Indeed, I think it’s likely that by default humanity would unfortunately start a process that gets the Sun eaten this century. I think it is plausible there will be some people who will be reasonable in predicting pretty strongly that that particular process will get the Sun eaten. I think various claims about humans understanding some stuff about that process are less clear, though there is surely some hypothetical entity that could pretty deeply understand the development of that process up to the point where it eats the Sun.
Some things in my notes were written mostly with an [agent foundations]y interlocutor in mind, and I’m realizing now that some of these things could also be read as if I had some different interlocutor in mind, and that some points probably seem more incongruous if read this way.
I’ll now proceed to potential disagreements.
But there’s something else, which is a very finite legible learning algorithm that can automatically find all those things—the object-level stuff and the thinking strategies at all levels. The genome builds such an algorithm into the human brain. And it seems to work! I don’t think there’s any math that is forever beyond humans, or if it is, it would be for humdrum reasons like “not enough neurons to hold that much complexity in your head at once”.
Some ways I disagree or think this is/involves a bad framing:
If we focus on math and try to ask some concrete question, instead of asking stuff like “can the system eventually prove anything?”, I think it is much more appropriate to ask stuff like “how quickly can the system prove stuff?”. Like, brute-force searching all strings for being a proof of a particular statement can eventually prove any provable statement, but we obviously wouldn’t want to say that this brute-force searcher is “generally intelligent”. Very relatedly, I think that “is there any math which is technically beyond a human?” is not a good question to be asking here.
The blind idiot god that pretty much cannot even invent wheels (ie evolution) obviously did not put anything approaching the Ultimate Formula for getting far in math (or for doing anything complicated, really) inside humans (even after conditioning on specification complexity and computational resources or whatever), and especially not in an “unfolded form”[1], right? Any rich endeavor is done by present humans in a profoundly stupid way, right?[2] Humanity sorta manages to do math, but this seems like a very weak reason to think that [humans have]/[humanity has] anything remotely approaching an “ultimate learning algorithm” for doing math?[3]
The structures in a newborn [that make it so that in the right context the newborn grows into a person who (say) pushes the frontier of human understanding forward] and [which participate in them pushing the frontier of human understanding forward] are probably already really complicated, right? Like, there’s already a great variety of “ideas” involved in the “learning-relevant structures” of a fetus?
I think that the framing that there is a given fixed “learning algorithm” in a newborn, such that if one knew it, one would be most of the way there to understanding human learning, is unfortunate. (Well, this comes with the caveat that it depends on what one wants from this “understanding of human learning” — e.g., it is probably fine to think this if one only wants to use this understanding to make a synthetic newborn.) In brief, I’d say “gaining thinking-components is a rich thing, much like gaining technologies more generally; our ability to gain thinking-components is developing, just like our ability to gain technologies”, and then I’d point one to Note 3 and Note 4 for more on this.
I want to say more in response to this view/framing that some sort of “human learning algorithm” is already there in a newborn, even in the context of just the learning that a single individual human is doing. Like, a human is also importantly gaining components/methods/ideas for learning, right? For example, language is centrally involved in human learning, and language isn’t there in a fetus (though there are things in a newborn which create a capacity for gaining language, yes). I feel like you might want to say “who cares — there is a preserved learning algorithm in the brain of a fetus/newborn anyway”. And while I agree that there are very important things in the brain which are centrally involved in learning and which are fairly unchanged during development, I don’t understand what [the special significance of these over various things gained later] is which makes it reasonable to say that a human has a given fixed “learning algorithm”. An analogy: Someone could try to explain structure-gaining by telling me “take a random init of a universe with such and such laws (and look along a random branch of the wavefunction[4]) — in there, you will probably eventually see a lot of structures being created” — let’s assume that this is set up such that one in fact probably gets atoms and galaxies and solar systems and life and primitive entities doing math and reflecting (imo etc.). But this is obviously a highly unsatisfying “explanation” of structure-gaining! I wanted to know why/how protons and atoms and molecules form and why/how galaxies and stars and black holes form, etc.. I wanted to know about evolution, and about how primitive entities inventing/discovering mathematical concepts could work, and imo many other things! Really, this didn’t do very much beyond just telling me “just consider all possible universes — somewhere in there, structures occur”! Like, yes, I’ve been given a context in which structure-gaining happens, but this does very little to help me make sense of structure-gaining. I’d guess that knowing the “primordial human learning algorithm” which is there in a fetus is significantly more like knowing the laws of physics than your comment makes it out to be. If it’s not like that, I would like to understand why it’s not like that — I’d like to understand why a fetus’s learning-structures really deserve to be considered the “human learning algorithm”, as opposed to being seen as just providing a context in which wild structure-gaining can occur and playing some important role in this wild structure-gaining (for now).
to conclude: It currently seems unlikely to me that knowing a newborn’s “primordial learning algorithm” would get me close to understanding human learning — in particular, it seems unlikely that it would get me close to understanding how humanity gains scientific/mathematical/philosophical understanding. Also, it seems really unlikely that knowing this “primordial learning algorithm” would get me close to understanding learning/technology-making/mathematical-understanding-gaining in general.[5]
- ↩︎
like, such that it is already there in a fetus/newborn and doesn’t have to be gained/built
- ↩︎
I think present humans have much more for doing math than what is “directly given” by evolution to present fetuses, but still.
- ↩︎
One attempt to counter this: “but humans could reprogram into basically anything, including whatever better system for doing math there is!”. But conditional on this working out, the appeal of the claim that fetuses already have a load-bearing fixed “learning algorithm” is also defeated, so this counterargument wouldn’t actually work in the present context even if this claim were true.
- ↩︎
let’s assume this makes sense
- ↩︎
That said, I could see an argument for a good chunk of the learning that most current humans are doing being pretty close to gaining thinking-structures which other people already have, from other people that already have them, and there is definitely something finite in this vicinity — like, some kind of pure copying should be finite (though the things humans are doing in this vicinity are of course more complicated than pure copying, there are complications with making sense of “pure copying” in this context, and also humans suck immensely (compared to what’s possible) even at “pure copying”).
Thank you for your comment!
What you’re saying seems more galaxy-brained than what I was saying in my notes, and I’m probably not understanding it well. Maybe I’ll try to just briefly (re)state some of my claims that seem most relevant to what you’re saying here (with not much justification for my claims provided in my present comment, but there’s some in the post), and then if it looks to you like I’m missing your point, feel very free to tell me that and I can then put some additional effort into understanding you.
So, first, math is this richly infinite thing that will never be mostly done.
If one is a certain kind of guy doing alignment, one might hope that one could understand how e.g. mathematical thinking works (or could work), and then make like an explicit math AI one can understand (one would probably really want this for science or for doing stuff in general[1], but a fortiori one would need to be able to do this for math).[2]
But oops, this is very cursed, because thinking is an infinitely rich thing, like math!
I think a core idea here is that thinking is a technological thing. Like, one aim of notes 1–6 (and especially 3 and 4) is to “reprogram” the reader into thinking this way about thinking. That is, the point is to reprogram the reader away from sth like “Oh, how does thinking, the definite thing, work? Yea, this is an interesting puzzle that we haven’t quite cracked yet. You probably have to, like, combine logical deduction with some probability stuff or something, and then like also the right decision theory (which still requires some work but we’re getting there), and then maybe a few other components that we’re missing, but bro we will totally get there with a few ideas about how to add search heuristics, or once we’ve figured out a few more details about how abstraction works, or something.”
Like, a core intuition is to think of thinking like one would think of, like, the totality of humanity’s activities, or about human technology. There’s a great deal going on! It’s a developing sort of thing! It’s the sort of thing where you need/want to have genuinely new inventions! There is a rich variety of useful thinking-structures, just like there is a rich variety of useful technological devices/components, just like there is a rich variety of mathematical things!
Given this, thinking starts to look a lot like math — in particular, the endeavor to understand thinking will probably always be mostly unfinished. It’s the sort of thing that calls for an infinite library of textbooks to be written.
In alignment, we’re faced with an infinitely rich domain — of ways to think, or technologies/components/ideas for thinking, or something. This infinitely rich domain again calls for textbooks to keep being written as one proceeds.
Also, the thing/thinker/thought writing these textbooks will itself need to be rich and developing as well, just like the math AI will need to be rich and developing.
Generally, you can go meta more times, but on each step, you’ll just be asking “how do I think about this infinitely rich domain?”, answering which will again be an infinite endeavor.
You could also try to make sense of climbing to higher infinite ordinal levels, I guess?
(Also, there’s something further to be said about how [[doing math] and [thinking about how one should do math]] are not that separate.)
I’m at like inside-view p=0.93 that the above presents the right vibe to have about thinking (like, maybe genuinely about its potential development forever, but if it’s like technically only the right vibe wrt the next years of thinking (at a 2024 rate) or something, then I’m still going to count that as thinking having this infinitary vibe for our purposes).[3]
However, the question about whether one can in principle make a math AI that is in some sense explicit/understandable anyway (that in fact proves impressive theorems with a non-galactic amount of compute) is less clear. Making progress on this question might require us to clarify what we want to mean by “explicit/understandable”. We could get criteria on this notion from thinking through what we want from it in the context of making an explicit/understandable AI that makes mind uploads (and “does nothing else”). I say some more stuff about this question in 4.4.
- ↩︎
if one is an imo complete lunatic :), one is hopeful about getting this so that one can make an AI sovereign with “the right utility function” that “makes there be a good future spacetime block”; if one is an imo less complete lunatic :), one is hopeful about getting this so that one can make mind uploads and have the mind uploads take over the world or something
- ↩︎
to clarify: I actually tend to like researchers with this property much more than I like basically any other “researchers doing AI alignment” (even though researchers with this property are imo engaged in a contemporary form of alchemy), and I can feel the pull of this kind of direction pretty strongly myself (also, even if the direction is confused, it still seems like an excellent thing to work on to understand stuff better). I’m criticizing researchers with this property not because I consider them particularly confused/wrong compared to others, but in part because I instead consider them sufficiently reasonable/right to be worth engaging with (and because I wanted to think through these questions for myself)!
- ↩︎
I’m saying this because you ask me about my certainty in something vaguely like this — but I’m aware I might be answering the wrong question here. Feel free to try to clarify the question if so.
An Advent of Thought
not really an answer but i wanted to communicate that the vibe of this question feels off to me because: surely one’s criteria on what to be up to are/[should be] rich and developing. that is, i think things are more like: currently i have some projects i’m working on and other things i’m up to, and then later i’d maybe decide to work on some new projects and be up to some new things, and i’d expect to encounter many choices on the way (in particular, having to do with whom to become) that i’d want to think about in part as they come up. should i study A or B? should i start job X? should i 2x my neuron count using such and such a future method? these questions call for a bunch of thought (of the kind given to them in usual circumstances, say), and i would usually not want to be making these decisions according to any criterion i could articulate ahead of time (though it could be helpful to tentatively state some general principles like “i should be learning” and “i shouldn’t do psychedelics”, but these obviously aren’t supposed to add up to some ultimate self-contained criterion on a good life)
make humans (who are) better at thinking (imo maybe like continuing this way forever, not until humans can “solve AI alignment”)
think well. do math, philosophy, etc.. learn stuff. become better at thinking
live a good life
A few quick observations (each with like confidence; I won’t provide detailed arguments atm, but feel free to LW-msg me for more details):
Any finite number of iterates just gives you the solomonoff distribution up to at most a const multiplicative difference (with the const depending on how many iterates you do). My other points will be about the limit as we iterate many times.
The quines will have mass at least their prior, upweighted by some const because of programs which do not produce an infinite output string. They will generally have more mass than that, and some will gain mass by a larger multiplicative factor than others, but idk how to say something nice about this further.
Yes, you can have quine-cycles. Relevant tho not exactly this: https://github.com/mame/quine-relay
As you do more and more iterates, there’s not convergence to a stationary distribution, at least in total variation distance. One reason is that you can write a quine which adds a string to itself (and then adds the same string again next time, and so on)[1], creating “a way for a finite chunk of probability to escape to infinity”. So yes, some mass diverges.
Quine-cycles imply (or at least very strongly suggest) probabilities also do not converge pointwise.
What about pointwise convergence when we also average over the number of iterates? It seems plausible you get convergence then, but not sure (and not sure if this would be an interesting claim). It would be true if we could somehow think of the problem as living on a directed graph with countably many vertices, but idk how to do that atm.
There are many different stationary distributions — e.g. you could choose any distribution on the quines.
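(To fix notation for the observations above: here is one way to write down the iteration, a hedged reconstruction on my part since I’m not restating the original setup; I’m assuming a monotone universal machine U and that runs producing only finite output are discarded.)

```latex
% A hedged reconstruction of the iteration discussed above (assumed setup: monotone
% universal machine U; runs with only finite output are discarded / renormalized away).
\[
\mu_0 = \text{the uniform (coin-flip) measure on infinite input strings}, \qquad
\mu_{k+1}(A) \;=\; \Pr_{x \sim \mu_k}\!\left[\, U(x) \in A \;\middle|\; U(x) \text{ is infinite} \,\right].
\]
% In this notation: each finite iterate $\mu_k$ agrees with the Solomonoff/universal measure
% up to a multiplicative constant depending on $k$; a quine is a fixed point $q$ with
% $U(q) = q$, and a quine-cycle is a finite orbit of the map $x \mapsto U(x)$.
```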
- ↩︎
a construction from o3-mini-high: https://colab.research.google.com/drive/1kIGCiDzWT3guCskgmjX5oNoYxsImQre-?usp=sharing
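(Relatedly, here is a tiny self-extending program of the kind described in the bullet this footnote is attached to; this is my own toy example, not the construction in the linked notebook.)

```python
# A toy self-extending program: running it prints a program of the same form with a longer
# `pad`; running that output prints one with an even longer `pad`, and so on, so iterating
# "run the output as a program" never cycles. (My own small example, not the linked one.)
pad = ""
code = 'pad = %r\ncode = %r\nprint(code %% (pad + "x", code))'
print(code % (pad + "x", code))
```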
I think AlphaProof is pretty far from being just RL from scratch:
they use a pretrained language model; I think the model is trained on human math in particular ( https://archive.is/Cwngq#selection-1257.0-1272.0:~:text=Dr. Hubert’s team,frequency was reduced. )
do we have good reason to think they didn’t specifically train it on human lean proofs? it seems plausible to me that they did but idk
the curriculum of human problems teaches it human tricks
lean sorta “knows” a bunch of human tricks
We could argue about whether AlphaProof “is mostly human imitation or mostly RL”, but I feel like it’s pretty clear that it’s more analogous to AlphaGo than to AlphaZero.
(a relevant thread: https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce?commentId=ZKuABGnKf7v35F5gp )
I didn’t express this clearly, but yea I meant no pretraining on human text at all, and also nothing computer-generated which “uses human mathematical ideas” (beyond what is in base ZFC), but I’d probably allow something like the synthetic data generation used for AlphaGeometry (Fig. 3) except in base ZFC and giving away very little human math inside the deduction engine. I agree this would be very crazy to see. The version with pretraining on non-mathy text is also interesting and would still be totally crazy to see. I agree it would probably imply your “come up with interesting math concepts”. But I wouldn’t be surprised if like of the people on LW who think A[G/S]I happens in like years thought that my thing could totally happen in 2025 if the labs were aiming for it (though they might not expect the labs to aim for it), with your things plausibly happening later. E.g. maybe such a person would think “AlphaProof is already mostly RL/search and one could replicate its performance soon without human data, and anyway, AlphaGeometry already pretty much did this for geometry (and AlphaZero did it for chess)” and “some RL+search+self-play thing could get to solving major open problems in math in 2 years, and plausibly at that point human data isn’t so critical, and IMO problems are easier than major open problems, so plausibly some such thing gets to IMO problems in 1 year”. But also idk maybe this doesn’t hang together enough for such people to exist. I wonder if one can use this kind of idea to get a different operationalization with parties interested in taking each side though. Like, maybe whether such a system would prove Cantor’s theorem (stated in base ZFC) (imo this would still be pretty crazy to see)? Or whether such a system would get to IMO combos relying moderately less on human data?
¿ thoughts on the following:
solving >95% of IMO problems while never seeing any human proofs, problems, or math libraries (before being given IMO problems in base ZFC at test time). like alphaproof except not starting from a pretrained language model and without having a curriculum of human problems and in base ZFC with no given libraries (instead of being in lean), and getting to IMO combos
some afaik-open problems relating to bridging parametrized bayes with sth like solomonoff induction
I think that for each NN architecture+prior+task/loss, conditioning the initialization prior on train data (or doing some other bayesian thing) is typically basically a completely different learning algorithm than (S)GD-learning, because local learning is a very different thing, which is one reason I doubt the story in the slides as an explanation of generalization in deep learning[1].[2] But setting this aside (though I will touch on it again briefly in the last point I make below), I agree it would be cool to have a story connecting the parametrized bayesian thing to something like Solomonoff induction. Here’s an outline of an attempt to give a more precise story extending the one in Lucius’s slides, with a few afaik-open problems:
Let’s focus on boolean functions (because that’s easy to think about — but feel free to make a different choice). Let’s take a learner to be shown certain input-output pairs (that’s “training it”), and having to predict outputs on new inputs (that’s “test time”). Let’s say we’re interested in understanding something about which learning setups “generalize well” to these new inputs.
What should we mean by “generalizing well” in this context? This isn’t so clear to me — we could e.g. ask that it does well on problems “like this” which come up in practice, but to solve such problems, one would want to look at what situation gave us the problem and so on, which doesn’t seem like the kind of data we want to include in the problem setup here; we could imagine simply removing such data and asking for something that would work well in practice, but this still doesn’t seem like such a clean criterion.
But anyway, the following seems like a reasonable Solomonoff-like thing:
There’s some complexity (i.e., size/[description length], probably) prior on boolean circuits. There can be multiple reasonable choices of [types of circuits admitted] and/or [description language] giving probably genuinely different priors here, but make some choice (it seems fine to make whatever reasonable choice which will fit best with the later parts of the story we’re attempting to build).
Think of all the outputs (i.e. train and test) as being generated by taking a circuit from this prior and running the inputs through it.
To predict outputs on new inputs, just do the bayesian thing (ie condition the induced prior on functions on all the outputs you’ve seen).
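(Here is a minimal brute-force sketch of this recipe, with arbitrary toy choices on my part: NAND-gate circuits over 3-bit inputs and an unnormalized prior proportional to 2^(-number of gates); it is only meant to make the object concrete, not to be a serious implementation.)

```python
# Toy "simplicity-prior-circuit-solomonoff": put a prior ~ 2^(-size) on small boolean
# circuits, keep the circuits consistent with the train pairs, and read off the posterior
# over the test output. All concrete choices here (NAND gates, 3 inputs, etc.) are arbitrary.
import itertools
from collections import defaultdict

N_INPUTS = 3
MAX_GATES = 3  # circuits are sequences of NAND gates; the last gate is the output

def run_circuit(gates, x):
    """gates: list of (i, j) pairs indexing into [inputs..., earlier gate outputs]."""
    vals = list(x)
    for i, j in gates:
        vals.append(1 - (vals[i] & vals[j]))  # NAND
    return vals[-1]

def circuits_with_prior(max_gates):
    """Yield (circuit, unnormalized prior weight) with weight = 2^(-number of gates)."""
    for n_gates in range(1, max_gates + 1):
        gate_choices = [
            itertools.product(range(N_INPUTS + k), repeat=2) for k in range(n_gates)
        ]
        for wiring in itertools.product(*gate_choices):
            yield list(wiring), 2.0 ** (-n_gates)

train = [((0, 0, 0), 0), ((0, 1, 1), 1), ((1, 0, 1), 1)]  # seen input-output pairs
test_x = (1, 1, 0)

posterior = defaultdict(float)  # posterior mass on each possible test output
for gates, weight in circuits_with_prior(MAX_GATES):
    if all(run_circuit(gates, x) == y for x, y in train):  # consistent with train data
        posterior[run_circuit(gates, test_x)] += weight

total = sum(posterior.values())
print({out: mass / total for out, mass in posterior.items()})
```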
My suggestion is that to explain why another learning setup (for boolean functions) has good generalization properties, we could be sort of happy with building a bridge between it and the above simplicity-prior-circuit-solomonoff thing. (This could let us bypass having to further specify what it is to generalize well.)[3]
One key step in the present attempt at building a bridge from NN-bayes to simplicity-prior-circuit-solomonoff is to get from simplicity-prior-circuit-solomonoff to a setup with a uniform prior over circuits — the story would like to say that instead of picking circuits from a simplicity prior, you can pick circuits uniformly at random from among all circuits of up to a certain size. The first main afaik-open problem I want to suggest is to actually work out this step: to provide a precise setup where the uniform prior on boolean circuits up to a certain size is like the simplicity prior on boolean circuits (and to work out the correspondence). (It could also be interesting and [sufficient for building a bridge] to argue that the uniform prior on boolean circuits has good generalization properties in some other way.) I haven’t thought about this that much, but my initial sense is that this could totally be false unless one is careful about getting the right setup (for example: given inputs-outputs from a particular boolean function with a small circuit, maybe it would work up to a certain upper bound on the size of the circuits on which we have a uniform prior, and then stop working; and/or maybe it depends more precisely on our [types of circuits admitted] and/or [description language]). (I know there is this story with programs, but idk how to get such a correspondence for circuits from that, and the correspondence for circuits seems like what we actually need/want.)
The second afaik-open problem I’m suggesting is to figure out in much more detail how to get from e.g. the MLP with a certain prior to boolean circuits with a uniform prior.
One reason I’m stressing these afaik-open problems (particularly the second one) is that I’m pretty sure many parametrized bayesian setups do not in fact give good generalization behavior — one probably needs some further things (about the architecture+prior, given the task) to go right to get good generalization (in fact, I’d guess that it’s “rare” to get good generalization without these further unclear hyperparams taking on the right values), and one’s attempt at building a bridge should probably make contact with these further things (so as to not be “explaining” a falsehood).
One interesting example is given by MLPs in the NN gaussian process limit (i.e. a certain kind of initialization + taking the width to infinity) learning boolean functions (edit: I’ve realized I should clarify that I’m (somewhat roughly speaking) assuming the convention, not the convention), which I think ends up being equivalent to kernel ridge regression with the fourier basis on boolean functions as the kernel features (with certain weights depending on the size of the XOR), which I think doesn’t have great generalization properties — in particular, it’s quite unlike simplicity-prior-circuit-solomonoff, and it’s probably fair to think of it as doing sth more like a polyfit in some sense. I think this also happens for the NTK, btw. (But I should say I’m going off some only loosely figured out calculations (joint with Dmitry Vaintrob and o1-preview) here, so there’s a real chance I’m wrong about this example and you shouldn’t completely trust me on it currently.) But I’d guess that deep learning can do somewhat better than this. (speculation: Maybe a major role in getting bad generalization here is played by the NNGP and NTK not “learning intermediate variables”, preventing any analogy with boolean circuits with some depth going through, whereas deep learning can learn intermediate variables to some extent.) So if we want to have a correct solomonoff story which explains better generalization behavior than that of this probably fairly stupid kernel thing, then we would probably want the story to make some distinction which prevents it from also applying in this NNGP limit. (Anyway, even if I’m wrong about the NNGP case, I’d guess that most setups provide examples of fairly poor generalization, so one probably really needn’t appeal to NNGP calculations to make this point.)
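(For reference, the standard objects being invoked in the previous point: parity/Fourier features on the boolean cube and the kernel ridge regression predictor; the specific level weights w_{|S|} that come out of the NNGP/NTK limit are the part I only loosely calculated and am not reproducing here.)

```latex
% Standard background for the claim above: parity ("Fourier") features on the boolean cube,
% a kernel whose feature weights depend only on |S| (the "size of the XOR"), and the
% kernel ridge regression predictor. The specific weights w_{|S|} from the NNGP/NTK limit
% are not reproduced here.
\[
\chi_S(x) \;=\; \prod_{i \in S} x_i \quad \text{for } x \in \{-1,1\}^n,\ S \subseteq \{1,\dots,n\},
\qquad
k(x, x') \;=\; \sum_{S \subseteq \{1,\dots,n\}} w_{|S|}\, \chi_S(x)\, \chi_S(x'),
\]
\[
\hat f(x) \;=\; k(x, X)\,\bigl(k(X, X) + \lambda I\bigr)^{-1} y
\quad \text{for train data } (X, y) \text{ and ridge parameter } \lambda .
\]
```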
Separately from the above bridge attempt, it is not at all obvious to me that parametrized bayes in fact has such good generalization behavior at all (i.e., “at least as good as deep learning”, whatever that means, let’s say)[4]; here’s some messages on this topic I sent to [the group chat in which the posted discussion happened] later:
“i’d be interested in hearing your reasons to think that NN-parametrized bayesian inference with a prior given by canonical initialization randomization (or some other reasonable prior) generalizes well (for eg canonical ML tasks or boolean functions), if you think it does — this isn’t so clear to me at all
some practical SGD-NNs generalize decently, but that’s imo a sufficiently different learning process to give little evidence about the bayesian case (but i’m open to further discussion of this). i have some vague sense that the bayesian thing should be better than SGD, but idk if i actually have good reason to believe this?
i assume that there are some other practical ML things inspired by bayes which generalize decently but it seems plausible that those are still pretty local so pretty far from actual bayes and maybe even closer to SGD than to bayes, tho idk what i should precisely mean by that. but eg it seems plausible from 3 min of thinking that some MCMC (eg SGLD) setup with a non-galactic amount of time on a NN of practical size would basically walk from init to a local likelihood max and not escape it in time, which sounds a lot more like SGD than like bayes (but idk maybe some step size scheduling makes the mixing time non-galactic in some interesting case somehow, or if it doesn’t actually do that maybe it can give a fine approximation of the posterior in some other practical sense anyway? seems tough). i haven’t thought about variational inference much tho — maybe there’s something practical which is more like bayes here and we could get some relevant evidence from that
maybe there’s some obvious answer and i’m being stupid here, idk :)
one could also directly appeal to the uniformly random program analogy but the current version of that imo doesn’t remotely constitute sufficiently good reason to think that bayesian NNs generalize well on its own”
(edit: this comment suggests https://arxiv.org/pdf/2002.02405 as evidence that bayes-NNs generalize worse than SGD-NNs. but idk — I haven’t looked at the paper yet — ie no endorsement of it one way or the other from me atm)
- ↩︎
to the extent that deep learning in fact exhibits good generalization, which is probably a very small extent compared to sth like Solomonoff induction, and this has to do with some stuff I talked about in my messages in the post above; but I digress
- ↩︎
I also think that different architecture+prior+task/loss choices probably give many substantially-differently-behaved learning setups, deserving somewhat separate explanations of generalization, for both bayes and SGD.
- ↩︎
edit: Instead of doing this thing with circuits, you could get an alternative “principled generalization baseline/ceiling” from doing the same thing with programs instead (i.e., have a complexity prior on turing machines and condition it on seen input-output pairs), which I think ends up being equivalent (up to a probably-in-some-sense-small term) to using the kolmogorov complexities of these functions (thought of “extensionally” as strings, ie just listing outputs in some canonical order (different choices of canonical order should again give the same complexities (up to a probably-in-some-sense-small term))). While this is probably a more standard choice historically, it seems worse for our purposes given that (1) it would probably be strictly harder to build a bridge from NNs to it (and there probably just isn’t any NNs <-> programs bridge which is as precise as the NNs <-> circuits bridge we might hope to build, given that NNs are already circuity things and it’s easy to have a small program for a function without having a small circuit for it (as the small program could run for a long time)), and (2) it’s imo plausible that some variant of the circuit prior is “philosophically/physically more correct” than the program prior, though this is less clear than the first point.
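(The two program-side quantities mentioned here, in standard notation; the coding theorem is the usual sense in which they agree up to a small term.)

```latex
% Standard definitions for the footnote above: the prior mass of a function f, thought of
% extensionally as its output string s_f, under a complexity prior on (prefix) programs,
% and its Kolmogorov complexity; the coding theorem says these agree up to an O(1) term.
\[
m(s_f) \;=\; \sum_{p \,:\, U(p) = s_f} 2^{-|p|},
\qquad
K(s_f) \;=\; \min\{\, |p| : U(p) = s_f \,\},
\qquad
-\log_2 m(s_f) \;=\; K(s_f) + O(1).
\]
```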
- ↩︎
to be clear: I’m not claiming it doesn’t have good generalization behavior — instead, I lack good evidence/reason to think it does or doesn’t and feel like I don’t know
Deep Learning is cheap Solomonoff induction?
you say “Human ingenuity is irrelevant. Lots of people believe they know the one last piece of the puzzle to get AGI, but I increasingly expect the missing pieces to be too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments.” and you link https://tsvibt.blogspot.com/2024/04/koan-divining-alien-datastructures-from.html for “too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments”
i feel like that post and that statement are in contradiction/tension or at best orthogonal
there’s imo probably not any (even-nearly-implementable) ceiling for basically any rich (thinking-)skill at all[1] — no cognitive system will ever be well-thought-of as getting close to a ceiling at such a skill — it’s always possible to do any rich skill very much better (I mean these things for finite minds in general, but also when restricting the scope to current humans)
(that said, (1) of course, it is common for people to become better at particular skills up to some time and to become worse later, but i think this has nothing to do with having reached some principled ceiling; (2) also, we could perhaps eg try to talk about ‘the artifact that takes at most n bits to specify (in some specification-language) which figures out m units of math the quickest (for some sufficiently large m compared to n)’, but even if we could make sense of that, it wouldn’t be right to think of it as being at some math skill ceiling to begin with, because it will probably very quickly change very much about its thinking (i.e. reprogram itself, imo plausibly indefinitely many times, including indefinitely many times in important ways, until the heat death of the universe or whatever); (3) i admit that there can be some purposes for which there is an appropriate way to measure goodness at some rich skill with a score in [0, 1], and for such a purpose potential goodness at even a rich skill is of course appropriate to consider bounded and optimal performance might be rightly said to be approachable, but this somehow feels not-that-relevant in the present context)
- ↩︎
i’ll try to get away with not being very clear about what i mean by a ‘rich (thinking-)skill’ except that it has to do with having a rich domain (the domain either effectively presenting any sufficiently rich set of mathematical questions as problems or relating richly to humans, or in particular just to yourself, usually suffices) and i would include all the examples you give
- ↩︎
for ideas which are “big enough”, this is just false, right? for example, so far, no LLM has generated a proof of an interesting conjecture in math