AI notkilleveryoneism researcher, focused on interpretability.
Personal account, opinions are my own.
I have signed no contracts or agreements whose existence I cannot mention.
Yeah, the observation that the universe seems maybe well-predicted by a program running on some UTM seems like a subset of the observation that the universe seems amenable to mathematical description and compression. So the former observation isn’t really an explanation for the latter, just a kind of restatement. We’d need an argument for why a prior over random programs running on a UTM should be preferred over a prior over random strings. Why does the universe have structure? The Universal Prior isn’t an answer to that question. It’s just an attempt to write down a sensible prior that takes the observation that the universe is structured and apparently predictable into account.
See footnote. Since this permutation freedom always exists no matter what the learned algorithm is, it can’t tell us anything about the learned algorithm.
… Wait, are you saying we’re not propagating updates into to change the mass it puts on inputs vs. ?
My viewpoint is that the prior distribution giving weight to each of the three hypotheses is different from the one giving weight to each of and , even if their mixture distributions are exactly the same.
That’s pretty unintuitive to me. What does it matter whether we happen to write out our belief state one way or the other? So long as the predictions come out the same, what we do and don’t choose to call our ‘hypotheses’ doesn’t seem particularly relevant for anything?
We made our choice when we settled on as the prior. Everything past that point just seems like different choices of notation to me? If our induction procedure turned out to be wrong or suboptimal, it’d be because was a bad prior to pick, not because we happened to write down in a weird way, right?
If they have the same prior on sequences/histories, then in what relevant sense are they not the same prior on hypotheses? If they both sum to , how can their predictions come to differ?
I’m confused. Isn’t one of the standard justifications for the Solomonoff prior that you can get it without talking about K-complexity, just by assuming a uniform prior over programs of length on a universal monotone Turing machine and letting tend to infinity?[1] How is that different from your ? It’s got to be different right, since you say that is not equivalent to the Solomonoff prior.
See e.g. An Introduction to Universal Artificial Intelligence, pages 145 and 146.
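For reference, the construction the question above refers to can be written out roughly as follows. The symbols are assumed labels, not taken from the comment: $U$ is a universal monotone Turing machine, $\ell$ is the program length, and $U(p) = x*$ means that the output of $U$ on input $p$ starts with $x$.

$$
M_\ell(x) = 2^{-\ell}\,\bigl|\{\,p \in \{0,1\}^{\ell} : U(p) = x*\,\}\bigr|,
\qquad
M(x) = \lim_{\ell \to \infty} M_\ell(x) = \sum_{\substack{p \text{ minimal} \\ U(p) = x*}} 2^{-|p|}.
$$

Here $M_\ell$ is the probability that a uniformly random program of length $\ell$ produces an output starting with $x$. No K-complexity term appears explicitly; the $2^{-|p|}$ weighting falls out of counting how many length-$\ell$ bitstrings extend each minimal program.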
Obviously SLT comes to mind, and some people have tried to claim that SLT suggests that neural network training is actually more like Solomonoff prior than the speed prior (e.g. bushnaq) although I think that work is pretty shaky and may well not hold up.
That post is superseded by this one. It was just a sketch I wrote up mostly to clarify my own thinking; the newer post is the finished product.
It doesn’t exactly say that neural networks have Solomonoff-style priors. It depends on the NN architecture. E.g., if your architecture is polynomials, or MLPs that only get one forward pass, I do not expect them to have a prior anything like that of a compute-bounded Universal Turing Machine.
And NN training adds in additional complications. All the results I talk about are for Bayesian learning, not things like gradient descent. I agree that this changes the picture and questions about the learnability of solutions become important. You no longer just care how much volume the solution takes up in the prior, you care how much volume each incremental building block of the solution takes up within the practically accessible search space of the update algorithm at that point in training.
I think just minimising the norm of the weights is worth a try. There’s a picture of neural network computation under which this mostly matches their native ontology. It doesn’t match their native ontology under my current picture, which is why I personally didn’t try doing this. But the empirical results here seem maybe[1] better than I predicted they were going to be last February.
I’d also add that we just have way more compute and way better standard tools for high-dimensional nonlinear optimisation than we used to. It’s somewhat plausible to me that some AI techniques people never got to work at all in the old days could now be made to kind of work a little bit with sufficient effort and sheer brute force, maybe enough to get something on the level of an AlphaGo or GPT-2. Which is all we’d really need to unlock the most crucial advances in interp at the moment.
I haven’t finished digesting the paper yet, so I’m not sure.
Problem with this: I think training tasks in real life are usually not, in fact, compatible with very many parameter settings. Unless the training task is very easy compared to the size of the model, basically all spare capacity in the model parameters will be used up eventually, because there’s never enough of it. The net can always use more, to make the loss go down a tiny bit further, predict internet text and sensory data just a tiny bit better, score a tiny bit higher on the RL reward function. If nothing else, spare capacity can always be used to memorise some more training data points. may be maximal given the constraints, but the constraints will get tighter and tighter as training goes on and the amount of coherent structure in the net grows, until approximately every free bit is used up.[1]
But we can still ask whether there are subsets of the training data on which the model outputs can be realised by many different parameter settings, and try to identify internal structure in the net that way, looking for parts of the parameters that are often free. If a circuit stores the fact that the Eiffel tower is in Paris, the parameter settings in that circuit will be free to vary on most inputs the net might receive, because most inputs don’t actually require the net to know that the Eiffel tower is in Paris to compute its output.
A mind may have many skills and know many facts, but only a small subset of these skills and facts will be necessary for the mind to operate at any particular moment in its computation. This induces patterns in which parts of the mind’s physical implementation are or aren’t free to vary in any given chunk of computational time, which we can then use to find the mind’s skills and stored facts inside its physical instantiation.
So, instead of doing stat mech to the loss landscape averaged over the training data, we can do stat mech to the loss landscapes, plural, at every training datapoint.
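As a toy illustration of parameters being free on some inputs but not others, here is a minimal sketch. The model, the weights, the dataset, and the helper name `free_parameters` are all hypothetical choices for illustration, and the check is a crude single-parameter perturbation test rather than any actual stat mech. Each weight here plays the role of a "stored fact" that only matters when its input feature is active.

```python
def relu(z):
    return max(z, 0.0)

def model(x, w):
    """Toy model: weight w[k] only affects the output when x[k] > 0."""
    return sum(wk * relu(xk) for wk, xk in zip(w, x))

def free_parameters(x, w, eps=1e-3, tol=1e-12):
    """Indices of weights that can be nudged by eps without changing
    the model output on the single input x."""
    base = model(x, w)
    free = []
    for k in range(len(w)):
        bumped = list(w)
        bumped[k] += eps
        if abs(model(x, bumped) - base) <= tol:
            free.append(k)
    return free

# Hypothetical weights and inputs, chosen by hand.
weights = [0.7, -1.2, 2.0]
dataset = [
    [1.0, -1.0, 0.5],
    [2.0, -0.5, -1.0],
    [0.3, -2.0, -0.1],
    [-1.0, 1.0, -3.0],
]

# For each weight, count the fraction of datapoints on which it is free.
counts = [0] * len(weights)
for x in dataset:
    for k in free_parameters(x, weights):
        counts[k] += 1
free_fraction = [c / len(dataset) for c in counts]
```

On this hand-picked dataset, the first weight is free on a quarter of the inputs and the other two on three quarters of them. No weight is free on every input, so averaging over the whole dataset would hide exactly the per-datapoint pattern described above.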
Some degrees of freedom will be untouched because they’re baked into the architecture, like the scale freedom of ReLU functions. But those are a small minority and also not useful for identifying the structure of the learned algorithms. Precisely because they are guaranteed to stay free no matter what algorithms are learned, they cannot contain any information about them.
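The ReLU scale freedom mentioned here can be checked directly. Below is a minimal sketch, assuming a one-hidden-layer ReLU network without biases; the function names and the random weights are hypothetical. Scaling one hidden neuron's input weights by a positive factor and its output weight by the inverse factor leaves the network function unchanged, whatever the weights happen to encode.

```python
import random

def relu(z):
    return max(z, 0.0)

def mlp(x, w_in, w_out):
    """One-hidden-layer ReLU net without biases.
    w_in[j] holds the input weights of hidden neuron j,
    w_out[j] is that neuron's scalar output weight."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return sum(h * v for h, v in zip(hidden, w_out))

def rescale_neuron(w_in, w_out, j, a):
    """Scale neuron j's input weights by a > 0 and its output weight by 1/a.
    Since relu(a * z) == a * relu(z) for a > 0, the output is unchanged."""
    if a <= 0:
        raise ValueError("scale factor must be positive")
    new_in = [list(row) for row in w_in]
    new_out = list(w_out)
    new_in[j] = [a * w for w in new_in[j]]
    new_out[j] = new_out[j] / a
    return new_in, new_out

random.seed(0)
w_in = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
w_out = [random.gauss(0, 1) for _ in range(4)]
inputs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]

new_in, new_out = rescale_neuron(w_in, w_out, j=1, a=7.5)

# Largest output difference between original and rescaled network.
max_diff = max(
    abs(mlp(x, w_in, w_out) - mlp(x, new_in, new_out)) for x in inputs
)
```

The difference is zero up to floating-point rounding even though the weights were drawn at random, which is the sense in which this freedom carries no information about what was learned.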
I think on the object level, one of the ways I’d see this line of argument falling flat is this part
Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to).
I am not at all comfortable relying on nobody deploying just because there are obvious legible problems. With the right incentives and selection pressures, I think people can be amazing at not noticing or understanding obvious understandable problems. Actual illegibility does not seem required.
In my experience, the main issue with this kind of thing is finding really central examples of symmetries in the input that are emulatable. There are a couple of easy ones, like low rank[1] structure, but I never really managed to get a good argument for why generic symmetries in the data would often be emulatable[2] in real life.[3]
You might want to chat with Owen Lewis about this. He’s been thinking about connections between input symmetries and mechanistic structure for a while, and was interested in figuring out some kind of general correspondence between input symmetries and parameter symmetries.
Good name for this concept by the way, thanks.
For a while I was hoping that almost any kind of input symmetry would tend to correspond to low-rank structure in the hidden representations of , if has the sort of architecture used by modern neural networks. Then, almost any kind of symmetry would be reducible to the low-rank structure case[2], and hence almost any symmetry would be emulatable.
But I never managed to show this, and I no longer think it is true.
There are a couple of necessary conditions for this of course. E.g. the architecture needs to actually use weight matrices, like neural networks do.
The WaPo article appears to refer to passenger fatalities per billion passenger miles, not total fatalities. For comparison, trains in the European Union in 2021 apparently had ca. 0.03 passenger fatalities per billion passenger miles, but almost 0.3 total fatalities per million train miles.
Right now it reads like one example of the pledged funding being met, one example of it only being ca. 3⁄4 met but there’s also two years left until the original deadline, and one example of the funding never getting pledged in the first place (since congress didn’t pass it).
I agree this is a pitifully small investment. But it doesn’t seem like big bills and programs got created and then walked back. More like they just never came to be in the first place. 4.5 billion euros is a paltry sum.
I think this may be an important distinction to make, because it suggests there was perhaps never much political push to prepare for the next pandemic even at the time. Did people actually ‘memory hole’ and forget, or did they just never care in the first place?
I for one don’t recall much discussion about preparing for the next pandemic outside rationalist/EA-adjacent circles even while the Covid-19 pandemic was still in full swing.
The Pandemic Fund got pledged $3 bio.
...
the Pandemic Fund has received $3.1 bio, with an unmet funding gap of $1 bio. as of the time of writing.
I’m confused. This makes it sound like they did get the pledged funding?
For what it’s worth, my mother read If Anyone Builds It, Everyone Dies and seems to have been convinced by it. She’s probably not very representative though. She had prior exposure to AI x-risk arguments through me, is autistic, has a math PhD, and is a Gödel, Escher, Bach fan.
The proposal at the end looks somewhat promising to me on a first skim. Are there known counterpoints for it?
I agree that this seems maybe useful for some things, but not for the “Which UTM?” question in the context of debates about Solomonoff induction specifically, and I think that’s the “Which UTM?” question we are actually kind of philosophically confused about. I don’t think we are philosophically confused about which UTM to use in the context of us already knowing some physics and wanting to incorporate that knowledge into the UTM pick, we’re confused about how to pick if we don’t have any information at all yet.
Attempted abstraction and generalization: If we don’t know what the ideal UTM is, we can start with some arbitrary UTM , and use it to predict the world for a while. After (we think) we’ve gotten most of our prediction mistakes out of the way, we can then look at our current posterior, and ask which other UTM might have updated to that posterior faster, using fewer bits of observation about (our universe/the string we’re predicting). You could think of this as a way to define what the ‘correct’ UTM is. But I don’t find that definition very satisfying, because the validity of this procedure for finding a good depends on how correct the posterior we’ve converged on with our previous, arbitrary, is. ‘The best UTM is the one that figures out the right answer the fastest’ is true, but not very useful.
Is the thermodynamics angle gaining us any more than that for defining the ‘correct’ choice of UTM?
We used some general reasoning procedures to figure out some laws of physics and stuff about our universe. Now we’re basically asking what other general reasoning procedures might figure out stuff about our universe as fast or faster, conditional on our current understanding of our universe being correct.
Why does it make Bayesian model comparison harder? Wouldn’t you get explicit predicted probabilities for the data from any two models you train this way? I guess you do need to sample from the Gaussian in a few times for each and pass the result through the flow models, but that shouldn’t be too expensive.
I guess figuring out whether we’re “in a bubble” just hasn’t seemed very important to me, relative to how hard it seems to determine? What effects on the strategic calculus do you think it has?
E.g. my current best guess is that I personally should just do what I can to help build the science of interpretability and learning as fast as I can, so we can get to a point where we can start doing proper alignment research and reason more legibly about why alignment might be hard and what could go wrong. Whether we’re in a bubble or not mostly matters for that only insofar as it’s one factor influencing how much time we have left to do that research.
But I’m already going about as fast as I can anyway, so having a better estimate of timelines isn’t very action-relevant for me. And “bubble vs. no bubble” doesn’t even seem like a leading-order term in timeline uncertainty anyway.