AI notkilleveryoneism researcher at Apollo, focused on interpretability.
Lucius Bushnaq
I think this certainly describes a type of gears level work scientists engage in, but not the only type, nor necessarily the most common one in a given field. There’s also model building, for example.
Even once you’ve figured out which dozen variables you need to control to get a sled to move at the same speed every time, you still can’t predict what that speed would be if you set these dozen variables to different values. You’ve got to figure out Newton’s laws of motion and friction before you can do that.
Finding out which variables are relevant to a phenomenon in the first place is usually a required initial step for building a predictive model, but it’s not the only step, nor necessarily the hardest one.
Another type of widespread scientific work I can think of is facilitating efficient calculation. Even if you have a deterministic model that you’re pretty sure could theoretically predict a class of phenomena perfectly, that doesn’t mean you have the computing power necessary to actually use it.
Lattice Quantum Chromodynamics should theoretically be able to predict all of nuclear physics, but employing it in practice requires coming up with all sorts of ingenious tricks and effective theories to reduce the computing power required for a given calculation. It’s enough to have kept a whole scientific field busy for over fifty years, and we’re still not close to actually being able to freely simulate every interaction of nucleons at the quark level from scratch.
I don’t know enough about neurology to make a statement on whether this is something human children learn, or whether it comes evolutionarily preprogrammed, so to speak. But in a universe where physics wasn’t at least approximately local, I would expect there’d indeed be little point in holding the notion that points in space and time have given “distances” from one another.
To clarify, my point was that at least in my experience, this isn’t always the hard step. I can easily see that being the case in a “top-down” field, like a lot of engineering, medicine, parts of material science, biology and similar things. There, my impression is that once you’ve figured out what a phenomenon is all about, it often really is as simple as fitting some polynomial of your dozen variables to the data.
But in some areas, like fundamental physics, which I’m involved in, building your model isn’t that easy or straightforward. For example, we’ve been looking for a theory of quantum gravity for ages. We know roughly what sort of variables it should involve. We know what data we want it to explain. But still, actually formulating that theory has proven hellishly difficult. We’ve been on it for over fifty years now and we’re still not anywhere close to real success.
The measure for peak broadness used near the end confuses me in many ways. It seems to imply that a large Hessian determinant means a broad peak. But wouldn’t you expect the opposite, if anything? E.g. in one dimension, this would seem to imply that a larger second derivative would mean a broader peak. That just seems exactly false.
It seems like there’s either something missing in this post, or in my head.
Yes, I really don’t see how this would work right now. If I try doing Taylor series, which is what I’d start with for something like this, I very much get the opposite result.
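To spell out the rough Taylor-series picture I have in mind, expanding around the optimum $\theta^*$ with Hessian $H$:

$$
L(\theta) \;\approx\; L(\theta^*) + \tfrac{1}{2}\,(\theta - \theta^*)^\top H\,(\theta - \theta^*)
$$

The set of parameters within tolerance $\epsilon$ of the optimum, $\{\theta : (\theta-\theta^*)^\top H (\theta-\theta^*) \le 2\epsilon\}$, is approximately an ellipsoid whose volume scales as $1/\sqrt{\det H}$. So a larger Hessian determinant gives a smaller basin, i.e. a narrower peak, not a broader one.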
I’m actually (hopefully) joining AI Safety Camp to work on your topics next month, so maybe we can talk about this more then?
Not really sure what you mean by the first part. E.g. “the modularity in the environment ⇒ modularity in the system” explanation definitely doesn’t cast it as a second order effect.
Yes, I guess we can add that one to the pile, thanks. Honestly, I feel like it’s basically confirmed that connection costs play a significant part. But I don’t think they explain all there is to know about selection for modularity. The adaptation/generality connection just seems too intuitive and well backed to not be important.
Yes, a chat could definitely be valuable. I’ll pm you.
I agree that connection costs definitely look like a real, modularity promoting effect. Leaving aside all the empirical evidence, I have some trouble imagining how it could plausibly not be. If you put a ceiling on how many connections there can be, the network has got to stick to the most necessary ones. And since some features of the world/input data are just more “interlinked” than others, it’s hard to see how the network wouldn’t be forced to reflect that in some capacity.
I just don’t think it’s the only modularity promoting effect.
If you have something that’s smart enough to figure out the training environment, the dynamics of gradient descent, and its own parameters, then yes, I expect it could do a pretty good job at preserving its goals while being modified. But that’s explicitly not what we have here. An agent that isn’t smart enough to trick your training process so that it doesn’t get modified to have human values probably also isn’t smart enough to then tell you how to preserve these values during further training.
Or at least, the chance of the intelligence thresholds working out like that does not sound to me like something you want to base a security strategy on.
Oops. Thanks, will fix that.
In the human learning case, what the human is picking up on here is that there is a distinct thing called temperature, which can differ, and that matters a lot. There is now a temperature module/abstraction where there wasn’t one before. That’s the learning step MVG is hinting at, I think.
Regarding the microorganism, the example situation you give is not directly covered by MVG as described here, but see the section “Specialisation drives the evolution of modularity” in the literature review, basically: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000719
If you have genes that specialise to express or not express depending on initial conditions, you get a dynamic nigh identical to this one. Two loss functions you need to “do well” on, with a lot of shared tasks, except for a single submodule that needs changing depending on external circumstance. This gets you two gene activity patterns, with a lot of shared gene activity states, like the shared parameter values between the designs N_1 and N_2 here. The work of “fine tuning” the model to L_1, L_2 is then essentially “already done”, and accessed by setting the initial conditions right, instead of needing to be redone by “evolution” after each change, as in the simulation in this article. But it very much seemed like the same dynamic to me.
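To illustrate the dynamic I have in mind, here’s a toy sketch (my own construction, not taken from the paper): two environments share most of their loss landscape and differ in a single sub-task, and a genome that switches one block of parameters based on an initial condition does well in both without any re-optimisation.

```python
import numpy as np

def loss(theta, env):
    """Toy loss: most sub-tasks are shared between environments 1 and 2;
    only the last block has an environment-dependent target."""
    shared = np.sum((theta[:8] - 1.0) ** 2)          # shared sub-tasks
    target = 1.0 if env == 1 else -1.0               # the one sub-task that flips
    special = np.sum((theta[8:] - target) ** 2)
    return shared + special

# "Modular genome": a fixed shared block, plus a small block that is switched
# by the initial condition rather than re-optimised after each environment change.
theta_shared = np.ones(8)
theta_switch = {1: np.ones(2), 2: -np.ones(2)}

for env in (1, 2):
    theta = np.concatenate([theta_shared, theta_switch[env]])
    print(f"environment {env}: loss = {loss(theta, env):.3f}")   # both hit 0.0
```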
Regarding specialisation vs. modularity: I view the two as intimately connected, though not synonymous.
What makes a blood pumping cell a blood pumping cell and not a digestive cell? The way you connect the atoms, ultimately. So how come, when you want both blood pumping and digestion, it apparently works better to connect the atoms such that there’s a cluster that does blood pumping, and a basically separate cluster that does digesting, instead of a big digesting+blood pumping cluster?
You seem to take it as a given that it is so, but that’s exactly the kind of thing we’re setting out to explain!
“Because physics is local” is one reason, clearly. But if it were the only reason, designs without locality on our computers would never be modular. But they are, sometimes!
Re: logistical reasons: Yes, that’s local connection cost due to locality again, basically.
We have not quantitatively scored modularity in current ML compared to modularity in the human brain (taking each neuron as a node) yet. It would indeed be interesting to see how that comes out. We have the Q score of some CNNs, thanks to CHAI. Do you know of any paper trying to calculate Q for the brain?
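For concreteness, here’s roughly what I mean by the Q score: a minimal sketch computing Newman’s modularity for a toy graph with networkx (I’m assuming that’s essentially the quantity behind the CHAI numbers).

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import modularity

# Toy graph: two triangles joined by a single bridge edge.
adj = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
G = nx.from_numpy_array(adj)

communities = [{0, 1, 2}, {3, 4, 5}]        # candidate split into two modules
Q = modularity(G, communities)               # Newman's Q for this partition
print(round(Q, 3))                           # ~0.357: well above 0, i.e. fairly modular
```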
To clarify, the main difficulty I see here is that this isn’t actually like training n networks of size N/n, because you’re still using the original loss function.
Your optimiser doesn’t get to see how well each module is performing individually, only their aggregate performance. So if module three is doing great, but module five is doing abysmally, and the answer depends on both being right, your loss is really bad. So the optimiser is going to happily modify three away from the optimum it doesn’t know it’s in.
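A tiny numerical illustration of what I mean (my own toy example): the answer is the product of two “modules” a and b, module a starts at its intended value of 1, and the optimiser still pushes it away because it only sees the aggregate loss.

```python
# Aggregate loss (a*b - 1)^2: module a starts "correct" (a = 1), module b does not.
a, b, lr = 1.0, 0.1, 0.1

for step in range(3):
    err = a * b - 1.0
    grad_a, grad_b = 2 * err * b, 2 * err * a    # gradients of the aggregate loss only
    a, b = a - lr * grad_a, b - lr * grad_b
    print(f"step {step}: a = {a:.3f}, b = {b:.3f}")

# a drifts away from 1.0 even though it started out doing great, because the
# aggregate loss can't credit module a separately from module b.
```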
Nevertheless, I think there could be something to the basic intuition of fine tuning just getting more and more difficult for the optimiser as you increase the parameter count, and with it the number of interaction terms. Until the only way to find anything good anymore is to just set a bunch of those interactions to zero.
This would predict that in 2005-style NNs with tiny parameter counts, you would have no modularity. In real biology, with far more interacting parts, you would have modularity. And in modern deep learning nets with billions of parameters, you would also have modularity. This matches what we observe. Really neatly and simply too.
It’s also dead easy to test. Just make a CNN or something and see how modularity scales with parameter count. This is now definitely on our to do list.
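In case it helps, here’s a skeleton of the measurement side of that experiment (my own sketch; the random weights below are just placeholders standing in for networks actually trained at each size).

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

def weights_to_graph(weight_mats):
    """Turn a list of chained layer weight matrices into one weighted graph over all neurons."""
    G = nx.Graph()
    offset = 0
    for W in weight_mats:
        n_in, n_out = W.shape
        for i in range(n_in):
            for j in range(n_out):
                G.add_edge(offset + i, offset + n_in + j, weight=abs(W[i, j]))
        offset += n_in
    return G

def q_score(weight_mats):
    G = weights_to_graph(weight_mats)
    parts = greedy_modularity_communities(G, weight="weight")
    return modularity(G, parts, weight="weight")

rng = np.random.default_rng(0)
for width in (4, 8, 16):
    # Placeholder: in the real experiment these would be the weights of a
    # network trained at this width, not random matrices.
    mats = [rng.normal(size=(width, width)) for _ in range(3)]
    print(f"width {width}: Q = {q_score(mats):.3f}")
```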
Thanks a lot again, Simon!
I am asking both! I suspect the reasons are likely to be very similar, in the sense that you can find a set of general modularity theorems that will predict you both phenomena.
Why does specialisation work better? It clearly doesn’t always. A lot of early NN designs love to mash everything together into an interconnected mess, and the result performs better on the loss function than modular designs that have parts specialised for each subtask.
Are connection costs due to locality a bigger deal for inter-cell dynamics than intra-cell dynamics? I’d guess yes. I am not a biologist, but it sure seems like interacting with things in the same cell as you should be relatively easier, though still non-trivial.
Are connection costs in the inter-cell regime so harsh that they completely dominate all other modularity selection effects in that regime, so we don’t need to care about them? I’m not so sure. I suspect not.
Beats me. I had been assuming that you were thinking of gross anatomy (“the cerebellum is over here, and the cortex is over there, and they look different and they do different things etc.”), by analogy with the liver and heart etc.
I’m thinking about those too. The end goal here is literally a comprehensive model for when modularity happens and how much, for basically anything at any scale that was made by an optimisation process like genetic algorithms, gradient descent, ADAM, or whatever.
But why is that so? Why are there no parameter combinations here that let you do well simultaneously on all of these tasks, unless you split your system into parts? That is what we are asking.
Could it be that such optima just do not exist? Maybe. It’s certainly not how it seems to work out in small neural networks, but perhaps for some classes of tasks, they really don’t, or are infrequent enough to not be findable by the optimiser. That’s the direct selection for modularity hypothesis, basically.
I don’t currently favour that one though. The tendency of our optimisers to connect everything to everything else, even when it seems to us like it should be actively counterproductive to do so, and still end up with a good loss, suggests to me that our intuition that you can’t do well while trying to do everything at once is mistaken. At least as long as “doing well” is defined as scoring well on the loss function. If you add in things like robustness, you might have a very different story. Thus, MVG.
It e.g. wouldn’t use potassium to send signals, I’d imagine. If a design like this exists, I’d expect it to involve totally different parts and steps that do not conflict like this. Something like a novel (to us) kind of ion channel, maybe, or something even stranger.
Does it seem to you that the constraints put on cell design are such that the ways of sending signals and digesting things we currently know of are the only ones that seem physically possible?
This is not a rhetorical question. My knowledge of cell biology is severely lacking, so I don’t have deep intuitions telling me which things seem uniquely nailed down by the laws of physics. I just had a look at the action potential Wikipedia page, and didn’t immediately see why using potassium ions was the only thing evolution could’ve possibly done to make a signalling thing. Or why using hydrochloric acid would be the only way to do digestion.
A very good point!
I agree that fix 1. seems bad, and doesn’t capture what we care about.
At first glance, fix 2. seems more promising to me, but I’ll need to think about it.
Thank you very much for pointing this out.
What I said, at the end, was that I’d better be getting paid for this, and they all laughed and said of course I was, lots of money, at least as much as my parents were getting, because children are sapient beings too.
This seems like a rather hypocritical thing to say, unless dath ilan had some clever idea for how to implement this compensation that I’m failing to see right now.
If I was a subject in this experiment, there would be no amount of money you could pay me to retroactively agree that this was a fair deal. There’s just nothing money can buy that would be worth the years of deception and the hours of mortal terror.
If it were Earth it’d be different, because Earth has absolutely dire problems that can be solved by money, and given enough millions, that’d take precedence over my own mental wellbeing. But absent such moral obligations, it’s just not worth it for me.
So do parents surreptitiously ask their children what sum of money they’d demand as compensation for participating in a wide variety of hypothetical experiments, some real, some fake, years before they move to a town like this? Seems rather impractical and questionable, considering how young the children would be when they made their choice.
That seems to make it worse, not better?
I would certainly accept such treatment for 2 million dollars, for example.
On Earth, 200 million and I might consider it, though it sure wouldn’t be my cheerful price. On dath ilan, not for any sum. Even fiat access to all economic output wouldn’t be worth it.
And I think they made a lot of money, presumably the amount of money this rather-competent society predicted would be their “cheerful price”.
If you try to solve this with prediction, and have any kind of feedback mechanism in place where the project gets docked money in proportion to how much predicted cheerful prices diverged from occasionally measured actual cheerful prices, I expect your market to tell you that this project is prohibitively costly, because you can’t get the chance of including children like me small enough.
In addition, I don’t know about you, but I would have objections to this situation even if a perfect/extremely good prediction mechanism was in place. Correlating events with my actual preferences is one reason I want people to ask for my consent before doing things to me, and perfect prediction takes care of that. But it is not the only reason. I also value being the person with final say inherently.
So if I were to be denied my right to deny consent, and told in the same sentence that of course I’m a sapient being too and my preferences matter, it would taste rather bitter.
I remember my thought process going something like this:
P(aliens in the Milky Way) ~0.75
P(aliens anywhere) ~1.00
P(answer pulled from anus on basis of half-remembered internet facts is remotely correct) ~0.8
So:
P(aliens) × P(anus) ~0.8
P(Milky Way aliens) × P(anus) ~0.6