I’m a final-year PhD student at the University of Amsterdam working on AI Safety and Alignment, specifically on safety risks of Reinforcement Learning from Human Feedback (RLHF). Previously, I also worked on abstract multivariate information theory and equivariant deep learning. https://langleon.github.io/
Leon Lang
Good idea; I’ve now added the following to the opening paragraphs of the section doing the comparisons:
Importantly, due to Theorem 4, this means that the Solomonoff prior and the a priori prior lead, up to a constant, to the same predictions on sequences. The advantages of the priors that we analyze are thus not statements about their induced predictive distributions.
I’ve now written a post incorporating these thoughts.
I answered in the parallel thread, which is probably getting down to the crux now. To add a few more points:
- The prior matters for the Solomonoff bound, see Theorem 5 and the sketch below. (To be clear, the true value of the prediction error is the same irrespective of the prior, but the bound we can prove differs.)
- I think different priors have different aesthetics. Choosing a prior because it gives you a nice result (i.e., the Solomonoff prior) feels different from choosing it because it’s a priori correct (like the a priori prior in this post). To me, aesthetics matter.
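On the first point, for concreteness: one standard form of such a bound (this is Hutter-style notation; Theorem 5 in the post may be stated differently) says that for a mixture $\xi = \sum_i w_i \mu_i$ and a true environment $\mu_k$ from the class,

$$\sum_{t=1}^{\infty} \mathbb{E}_{\mu_k}\!\left[\sum_{x_t} \big(\xi(x_t \mid x_{<t}) - \mu_k(x_t \mid x_{<t})\big)^2\right] \;\le\; \ln w_k^{-1},$$

so the guarantee tightens as the prior weight $w_k$ on the true environment grows, even though the mixture $\xi$, and hence the actual prediction error, is unchanged.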
Okay, I think I overstated the extent to which the difference in priors matters in the previous comments and crossed out “practical”.
Basically, I was right that the prior that gives 100% to $\mu_3$ cannot update: it gives all its weight to $\mu_3$ no matter how much data comes in. However, $\mu_3$ itself can update with more data and shift between $\mu_1$ and $\mu_2$.
I can see that this feels perhaps very syntactic, but in my mind the two priors still feel different. One of them is saying “The world first samples a bit indicating whether the world will continue with world 0 or world 1”, and the other one is saying “I am uncertain about whether we live in world 0 or world 1”.
Yes. There are lots of different settings one could consider, e.g.:
- Finite strings
- Infinite strings
- Functions
- LSCSMs
For all of these cases, one can compare different notions of complexity (plain K-complexity, prefix complexity, monotone complexity, if applicable) with algorithmic probability. My sense is that the correspondence is only exact for universal prefix machines and finite strings, but I didn’t consider all settings.
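For concreteness, the exact correspondence in the prefix/finite-string case is the coding theorem: with $K(x)$ the prefix complexity and $\mathbf{m}(x)$ the universal discrete semimeasure (algorithmic probability),

$$K(x) = -\log_2 \mathbf{m}(x) + O(1).$$

My impression is that the analogous identity between monotone complexity and the continuous a priori measure does not hold up to an additive constant, which is part of what I mean by the correspondence only being exact in that first case.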
It’s also useful to emphasize why, even if the mixtures are the same, having different priors can make a ~~practical~~ difference. E.g., imagine that in the example above we had one prior giving 100% weight to $\mu_3$, and another prior giving 50% weight to each of $\mu_1$ and $\mu_2$. They give the same mixture, but the first prior can’t update, and the second prior can!
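As a minimal sketch of this point (a toy setup with two i.i.d. Bernoulli environments standing in for $\mu_1$ and $\mu_2$; nothing here is from the post):

```python
# Toy stand-ins for mu_1 and mu_2: two i.i.d. Bernoulli environments.
def mu1_bit(b): return 0.9 if b == 1 else 0.1   # biased towards 1
def mu2_bit(b): return 0.1 if b == 1 else 0.9   # biased towards 0

def seq_prob(bit_prob, bits):
    """Probability of a bit sequence under an i.i.d. environment."""
    p = 1.0
    for b in bits:
        p *= bit_prob(b)
    return p

def seq_prob_mu3(bits):
    """mu_3 = 1/2 mu_1 + 1/2 mu_2, mixed at the level of whole sequences."""
    return 0.5 * seq_prob(mu1_bit, bits) + 0.5 * seq_prob(mu2_bit, bits)

data = [1] * 10  # observations strongly favouring mu_1

# Prior A puts 100% on mu_3; prior B puts 50% on mu_1 and 50% on mu_2.
# Their predictive distributions over sequences coincide exactly:
evidence_A = seq_prob_mu3(data)
evidence_B = 0.5 * seq_prob(mu1_bit, data) + 0.5 * seq_prob(mu2_bit, data)
print(evidence_A == evidence_B)  # True

# But only prior B can update on the data:
posterior_A = {"mu3": 1.0}  # a single hypothesis cannot lose weight
posterior_B = {
    "mu1": 0.5 * seq_prob(mu1_bit, data) / evidence_B,
    "mu2": 0.5 * seq_prob(mu2_bit, data) / evidence_B,
}
print(posterior_A)  # {'mu3': 1.0}
print(posterior_B)  # essentially all weight on mu_1
```

The marginal likelihoods agree exactly, but only the second posterior concentrates on $\mu_1$ as the data comes in.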
Well, their induced mixture distributions are the same up to a constant, but the priors on hypotheses are different. I’m not sure if you consider the difference “relevant”; perhaps you only care about the induced mixture distribution?
To make a simple example: Assume there were only three Turing machines $T_1$, $T_2$, and $T_3$. Assume that $T_3(0q) = T_1(q)$ and $T_3(1q) = T_2(q)$, i.e., $T_3$ uses its first input bit to decide whether to run $T_1$ or $T_2$ on the rest of the input. Let $\mu_1$, $\mu_2$, and $\mu_3$ be the LSCSMs induced by $T_1$, $T_2$, and $T_3$. Notice that $\mu_3$ is a mixture of $\mu_1$ and $\mu_2$: $\mu_3 = \frac{1}{2}\mu_1 + \frac{1}{2}\mu_2$.
Let $\xi$ be the mixture distribution given as $\xi = \frac{1}{3}\mu_1 + \frac{1}{3}\mu_2 + \frac{1}{3}\mu_3$. Then clearly, $\xi$ is also represented as $\xi = \frac{1}{2}\mu_1 + \frac{1}{2}\mu_2$. My viewpoint is that the prior distribution giving weight $\frac{1}{3}$ to each of the three hypotheses is different from the one giving weight $\frac{1}{2}$ to each of $\mu_1$ and $\mu_2$, even if their mixture distributions are exactly the same.
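Substituting $\mu_3 = \frac{1}{2}\mu_1 + \frac{1}{2}\mu_2$ makes the “clearly” explicit:

$$\xi = \tfrac{1}{3}\mu_1 + \tfrac{1}{3}\mu_2 + \tfrac{1}{3}\left(\tfrac{1}{2}\mu_1 + \tfrac{1}{2}\mu_2\right) = \tfrac{1}{2}\mu_1 + \tfrac{1}{2}\mu_2.$$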
And this is exactly the situation we’re in with the true mixture distribution from the post. Some of the LSCSMs in the mixture are of the form $\mu_U$ for a separate universal monotone Turing machine $U$, which means that such a $\mu_U$ is itself a mixture of all LSCSMs. Any such mixture among the LSCSMs allows us to redistribute the prior weight from this LSCSM to all others, without affecting the overall mixture in any way.
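In generic notation (my own, not necessarily the post’s): if the prior puts weight $p_U$ on such a $\mu_U$ and $\mu_U = \sum_i v_i \mu_i$ is itself a mixture of all LSCSMs, then

$$p_U\,\mu_U + \sum_i w_i\,\mu_i \;=\; \sum_i \big(w_i + p_U v_i\big)\,\mu_i,$$

so the weight on $\mu_U$ can be pushed down onto the individual LSCSMs, or pulled back up, without changing the overall mixture.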
This is also related to what makes a prior based on Kolmogorov complexity ultimately so arbitrary: we could have chosen just about anything and it would still essentially sum to the same a priori distribution. However, a posteriori the Kolmogorov complexity prior does have some mathematical advantages, as outlined in the post.
I’m confused. Isn’t one of the standard justifications for the Solomonoff prior that you can get it without talking about K-complexity, just by assuming a uniform prior over programs of length $n$ on a universal monotone Turing machine and letting $n$ tend to infinity?
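(The construction I have in mind, sketched in generic notation: on a monotone machine, a program $p$ that the machine reads in full determines the behaviour of all $2^{\,n-\ell(p)}$ of its extensions to length $n$, so under the uniform distribution on $\{0,1\}^n$ it receives total mass

$$\frac{2^{\,n-\ell(p)}}{2^{\,n}} = 2^{-\ell(p)},$$

independently of $n$, which is how the $2^{-\ell(p)}$ weights, and with them a distribution over output sequences, appear in the limit.)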
What you describe is not the Solomonoff prior on hypotheses, but the Solomonoff a priori distribution on sequences/histories! This is the distribution I analyze in my post. It can then be written as a mixture of LSCSMs, with the weights given either by the Solomonoff prior (involving Kolmogorov complexity) or by the a priori prior in my work. Those priors are not the same.
Yeah that one specifically feels so useful and natural that I have some hope it might reach the wider world.
The coding theorem is a different claim once you go deeper into the nuances. I may go into this point in a future post.
I redacted this comment since it turns out my post is actually not precisely about the ALT-complexity. There’s nuance here I may go into in a future post.
This is an edge case, but just flagging that it’s a bit unclear to me how to apply this to my own post in a useful way. As I’ve disclosed in the post itself:
OpenAI’s o3 found the idea for the dovetailing procedure. The proof of the efficient algorithmic Kraft coding in the appendix is mine. The entire post is written by myself, except the last paragraph of the following section, which was first drafted by GPT-5.
Does this count as Level 3 or 4? o3 provided a substantial idea, but the resulting proof was entirely written down by myself. I’m also unsure whether the full drafting of precisely one paragraph (which summarizes the rest of the post) by GPT-5 counts as editing or the writing of substantial parts.
I don’t know what “all-too-plausibly” means. Depending on the probabilities this implies, I may agree or disagree.
Fwiw, my hair has grown longer and people often point that out, but no one has ever followed up with “looks good”.
I think the compute they spend on inference will also just get scaled up over time.
I think people usually don’t even try to figure something like that out, or aren’t even aware of the option. So if you publicly announce that a user has deactivated their account X times, then this is information that almost no one would otherwise ever receive.
I also have the sense that it’s better to not do that, even though I have a hard time explaining in words why that is.
A NeurIPS paper on scaling laws from 1993, shared by someone on Twitter.
Is there a way to filter on LessWrong for all posts from the Alignment Forum?
I often like to just see what’s on the Alignment Forum, but I dislike that I don’t see most LessWrong comments when viewing those posts on the Alignment Forum.
You saying you don’t have this experience sounds bizarre to me. Here is an example of this behavior happening to me recently:
It then invented another DOI.
This is very common behavior in my experience.