Neonate AI Safety researcher (Selection Theorems), physics PhD student
I don’t think an “actual distribution” over the activations is a thing? The distribution depends on what inputs you feed it. I don’t see in what sense there’s some underlying “true” continuous distribution we could find here.
The input distribution we measure on should be one that is representative of the behaviour of the network we want to get the information flows for. To me, the most sensible candidate for this seemed to be the training distribution, since that’s the environment the network is constructed to operate in. I am receptive to arguments that out of distribution behaviour should be probed too somehow, but I’m not sure how to go about that in a principled way.
So for now, the idea is to have P_B literally be the discrete distribution you get from directly sampling from the activations of B recorded from ca. one epoch of training data.
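To make that concrete, here is a minimal sketch of what I mean by P_B, with a toy stand-in for layer B (the layer, the weights, and all names here are hypothetical; in practice you'd record the real network's activations over ca. one epoch of training data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained layer B: a fixed linear map + ReLU.
# (Hypothetical weights; in practice, hook a real network's layer.)
W = rng.normal(size=(4, 8))

def layer_B(x):
    return np.maximum(0.0, x @ W)

# "One epoch" of training inputs.
X_train = rng.normal(size=(1000, 4))

# P_B: the discrete empirical distribution over recorded activations.
activations = layer_B(X_train)  # shape (1000, 8)

def sample_from_P_B(n):
    """Sample activation vectors uniformly from the recorded set."""
    idx = rng.integers(0, len(activations), size=n)
    return activations[idx]

samples = sample_from_P_B(32)
```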
I am very happy you’re looking into this. I’ve never seen a well motivated procedure for choosing k, and we’re going to need one.
Let’s try and justify why this notion of choice might be appropriate. Firstly, let’s state the obvious: choice here is clearly related to information and entropy. If someone wishes to communicate a partition that doesn’t match one of the natural module numbers, they need to specify how many parts to divide our graph into, and then transmit additional information about the specifics of that partition. In the worst case (there are no connections between nodes), specifying the exact partition involves specifying an amount of information proportional to the logarithm of the Bell number.
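To put a number on that worst case: the count of partitions of n nodes is the Bell number B_n, so naming an arbitrary partition costs about log2(B_n) bits. A quick sketch using the Bell triangle:

```python
import math

def bell_numbers(n):
    """Return [B_0, ..., B_n] computed via the Bell triangle."""
    row = [1]
    bells = [1]
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for v in row:
            new_row.append(new_row[-1] + v)
        row = new_row
        bells.append(row[0])
    return bells

B = bell_numbers(15)
bits = math.log2(B[15])  # ~30 bits just to name one partition of 15 nodes
```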
This alone doesn’t seem satisfactory to me. I think you could make up lots of other prescriptions for picking partitions that take little information to specify. E.g.: “The ones with minimal n-cuts for their k that come closest to having equal numbers of nodes in each subgraph.”
“Cuts with a single minimum n-cut partition” could be what we’re looking for, but I don’t yet see anything showing that it has to be this, and couldn’t be anything else.
We’ll probably have a post with our own thoughts on measuring modularity out soon, though it’ll be more focused on how to translate a neural network into something from which you can get a meaningful n-cut-like measure that captures what we care about at all.
If you’re interested in exchanging notes, drop me or TheMcDouglas a pm.
Seems like a start. But I think one primary issue for imagining these basins is how high dimensional they are.
Note also that we’re not just looking for visualisations of the loss landscape here. Due to the correspondence between information loss and broadness outlined in Vivek’s linked post, we want to look at the nullspace of the space spanned by the gradients of the network output for individual data points.
EDIT: Gradient of network output, not gradient of the loss function, sorry. The gradient of the loss function is zero at perfect training loss.
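Concretely, the object I mean is the nullspace of the matrix of stacked per-datapoint output gradients. A toy sketch (a linear model standing in for the network, so the gradient of the output w.r.t. the weights is just the input; everything here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: scalar output f(x; w) = w @ x, so grad_w f(x) = x.
# (Stand-in for a real network's per-example output gradients.)
n_params, n_data = 10, 4
X = rng.normal(size=(n_data, n_params))

G = X  # row i is grad_w f(x_i), stacked over data points
_, S, Vt = np.linalg.svd(G)
rank = int(np.sum(S > 1e-10))
null_dim = n_params - rank  # flat directions: moving w here leaves every output unchanged
null_basis = Vt[rank:]      # orthonormal basis of the nullspace
```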
Extremely valuable I’d guess, but the whole problem is that alignment is still preparadigmatic. We don’t actually know yet what the well-defined nerd snipe questions we should be asking are.
I think that preparadigmatic research and paradigmatic research are two different skill sets, and most Highly Impressive People in mainstream STEM are masters at the latter, not the former.
I do think we’re more paradigmatic than we were a year ago, and that we might transition fully some time soon. I’ve got a list of concrete experiments on modularity in ML systems I’d like to run, for example, and I think any ML-savvy person could probably do those, no skill at thinking about fuzzy far-mode things required.
So I’m not sure a sequence like this could be written today, but maybe in six months?
I’ve tried to raise the topic with smart physics people I know or encounter whenever the opportunity presents itself. So far, the only ones who actually went on to take steps to try and enter alignment already had prior involvement with EA or LW.
For the others, the main reactions I got seemed to be:
Sounds interesting, but this is all too hypothetical for me to really take seriously. It hinges on all these concepts and ideas you propose about how AGI is going to work, and I don’t buy yet that all of them are correct.
Sounds concerning, but I’d rather work on physics
Sounds depressing. I already thought climate change will kill us all, now there’s also this? Let me just work on physics and not think about this any more.
I’m not a mind reader of course, so maybe their real reaction was “Quick, say something conciliatory to make this person shut up about the pet topic they are insane about.”
Is a(x) in the formulas supposed to be pi_0(x)?
These manifolds generally extend out to infinity, so it isn’t really meaningful to talk about literal “basin volume”. We can focus instead on their dimensionality.
Once you take priors over the parameters into account, I would not expect this to continue holding. I’d guess that if you want to get the volume of regions in which the loss is close to the perfect loss, directions that are not flat are going to matter a lot. Whether a given non-flat direction is incredibly steep, or has half the width given by the prior, could make a huge difference.
I still think the information loss framework could make sense, however. I’d guess that there should be a more general relation where the less information there is to distinguish different data points, the broader e.g. the principal directions of the Hessian of the loss function will tend to be.
I’d also be interested in seeing what happens if you look at cases with non-zero/non-perfect loss. That should give you second order terms in the network output, but these again look to me like they’d tend to give you broader principal directions if you have less information exchange in the network. For example, a modular network might have low-dimensional off-diagonals, which you can show with the Schur complement to be equivalent to having sparse off-diagonals, and which I think would give you less extreme eigenvalues.
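A toy illustration of the off-diagonal point (artificial "Hessian" blocks, not from any real network): with identity diagonal blocks, the eigenvalues of the coupled matrix are exactly 1 ± the singular values of the coupling block, so switching on off-diagonal coupling strictly widens the spectrum:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
# "Modular" Hessian: two independent blocks, zero off-diagonal coupling.
A = np.eye(n)
H_modular = np.block([[A, np.zeros((n, n))], [np.zeros((n, n)), A]])

# Same diagonal blocks, but with dense off-diagonal coupling C.
C = 0.5 * rng.normal(size=(n, n))
H_coupled = np.block([[A, C], [C.T, A]])

def spread(H):
    """Range of the eigenvalue spectrum of a symmetric matrix."""
    return np.ptp(np.linalg.eigvalsh(H))
# Here the coupled eigenvalues are 1 +/- the singular values of C,
# so the coupled spectrum is strictly wider than the modular one.
```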
I know we’ve discussed these points before, but I thought I’d repeat them here where people can see them.
Sounds good to me! Anyone up for making this an EA startup?
Having more Neumann level geniuses around seems like an extremely high impact intervention for most things, not even just singularity related ones.
As for tractability, I can’t say anything about how hard this would be to get past regulators, or how much engineering work is missing for making human cloning market ready, but finding participants seems pretty doable? I’m not sure yet whether I want children, but if I decide I do, I’d totally parent a Neumann clone. If this would require moving to some country where cloning isn’t banned, I might do that as well. I bet lots of other EAs would too.
I am fairly confident that I would incredibly strongly dislike being lied to like this even if it were “for my own benefit”. The source of my disgust for nonconsensual lies does not seem to me to stem from a history of such lies hurting me. Rather, they just feel inherently hurtful. That’s on top of the distress I would feel from years of keeping my mouth shut and my head down regarding the fake discrimination while secretly crying about it sometimes.
Also, there’s still the hours of mortal terror that this scenario entails.
I think I can model my own preferences better than you can, thank you very much. Regardless of whether I’d “get over it” or not, this experience would bother me more than anything extraordinary I can think of that I could plausibly buy in dath ilan’s economy would please me.
I would certainly accept such treatment for 2 million dollars, for example.
On earth, 200 million and I might consider it, though it sure wouldn’t be my cheerful price. On dath ilan, not for any sum. Even fiat access to all economic output wouldn’t be worth it.
And I think they made a lot of money, presumably the amount of money this rather-competent society predicted would be their “cheerful price”.
If you try to solve this with prediction, and have any kind of feedback mechanism in place where the project gets docked money in proportion to how much predicted cheerful prices diverged from occasionally measured actual cheerful prices, I expect your market to tell you that this project is prohibitively costly, because you can’t get the chance of including children like me small enough.
In addition, I don’t know about you, but I would have objections to this situation even if a perfect/extremely good prediction mechanism were in place. Correlating events with my actual preferences is one reason I want people to ask for my consent before doing things to me, and perfect prediction takes care of that. But it is not the only reason. I also value being the person with final say inherently.
So if I were to be denied my right to deny consent, and told in the same sentence that of course I’m a sapient being too and my preferences matter, it would taste rather bitter.
That seems to make it worse, not better?
I said, at the end, that I’d better be getting paid for this, and they all laughed and said of course I was, lots of money, at least as much as my parents were getting, because children are sapient beings too.
This seems like a rather hypocritical thing to say, unless dath ilan had some clever idea for how to implement this compensation that I’m failing to see right now.
If I was a subject in this experiment, there would be no amount of money you could pay me to retroactively agree that this was a fair deal. There’s just nothing money can buy that would be worth the years of deception and the hours of mortal terror.
If it was earth it’d be different, because earth has absolutely dire problems that can be solved by money, and given enough millions, that’d take precedence over my own mental wellbeing. But absent such moral obligations, it’s just not worth it for me.
So do parents surreptitiously ask their children what sum of money they’d demand as compensation for participating in a wide variety of hypothetical experiments, some real, some fake, years before they move to a town like this? Seems rather impractical and questionable, considering how young the children would be when they made their choice.
A very good point!
I agree that fix 1. seems bad, and doesn’t capture what we care about.
At first glance, fix 2. seems more promising to me, but I’ll need to think about it.
Thank you very much for pointing this out.
It e.g. wouldn’t use potassium to send signals, I’d imagine. If a design like this exists, I’d expect it to involve totally different parts and steps that do not conflict like this. Something like a novel (to us) kind of ion channel, maybe, or something even stranger.
Does it seem to you that the constraints put on cell design are such that the ways of sending signals and digesting things we currently know of are the only ones that seem physically possible?
This is not a rhetorical question. My knowledge of cell biology is severely lacking, so I don’t have deep intuitions telling me which things seem uniquely nailed down by the laws of physics. I just had a look at the action potential wikipedia page, and didn’t immediately see why using potassium ions was the only thing evolution could’ve possibly done to make a signalling thing. Or why using hydrochloric acid would be the only way to do digestion.
But why is that so? Why are there no parameter combinations here that let you do well simultaneously on all of these tasks, unless you split your system into parts? That is what we are asking.
Could it be that such optima just do not exist? Maybe. It’s certainly not how it seems to work out in small neural networks, but perhaps for some classes of tasks they really don’t, or are infrequent enough not to be findable by the optimiser. That’s the direct selection for modularity hypothesis, basically.
I don’t currently favour that one though. The tendency of our optimisers to connect everything to everything else, even when it seems to us like doing so should be actively counterproductive, and still end up with a good loss, suggests to me that our intuition that you can’t do well while trying to do everything at once is mistaken. At least as long as “doing well” is defined as scoring well on the loss function. If you add in things like robustness, you might have a very different story. Thus, MVG.
I am asking both! I suspect the reasons are likely to be very similar, in the sense that you could find a set of general modularity theorems that predict both phenomena.
Why does specialisation work better? It clearly doesn’t always. A lot of early NN designs love to mash everything together into an interconnected mess, and the result performs better on the loss function than modular designs that have parts specialised for each subtask.
Are connection costs due to locality a bigger deal for inter-cell dynamics than intra-cell dynamics? I’d guess yes. I am not a biologist, but it sure seems like interacting with things in the same cell as you should be relatively easier, though still non-trivial.
Are connection costs in the inter-cell regime so harsh that they completely dominate all other modularity selection effects in that regime, so we don’t need to care about them? I’m not so sure. I suspect not.
Beats me. I had been assuming that you were thinking of gross anatomy (“the cerebellum is over here, and the cortex is over there, and they look different and they do different things etc.”), by analogy with the liver and heart etc.
I’m thinking about those too. The end goal here is literally a comprehensive model for when modularity happens and how much, for basically anything at any scale that was made by an optimisation process like genetic algorithms, gradient descent, ADAM, or whatever.
To clarify, the main difficulty I see here is that this isn’t actually like training n networks of size N/n, because you’re still using the original loss function.
Your optimiser doesn’t get to see how well each module is performing individually, only their aggregate performance. So if module three is doing great, but module five is doing abysmally, and the answer depends on both being right, your loss is really bad. So the optimiser is going to happily modify three away from the optimum it doesn’t know it’s in.
Nevertheless, I think there could be something to the basic intuition of fine tuning just getting more and more difficult for the optimiser as you increase the parameter count, and with it the number of interaction terms. Until the only way to find anything good anymore is to just set a bunch of those interactions to zero.
This would predict that in 2005-style NNs with tiny parameter counts, you would have no modularity. In real biology, with far more interacting parts, you would have modularity. And in modern deep learning nets with billions of parameters, you would also have modularity. This matches what we observe. Really neatly and simply too.
It’s also dead easy to test. Just make a CNN or something and see how modularity scales with parameter count. This is now definitely on our to-do list.
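A rough sketch of the kind of measurement I have in mind, using a spectral bipartition and a 2-way normalized cut on the neuron graph of toy "networks" (the weights here are random, so this only demonstrates the measurement pipeline, not the scaling hypothesis; a real version would use trained networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def ncut(W_adj, mask):
    """2-way normalized cut of a weighted adjacency matrix for a boolean partition."""
    cut = W_adj[mask][:, ~mask].sum()
    return cut / W_adj[mask].sum() + cut / W_adj[~mask].sum()

def spectral_bipartition(W_adj):
    """Split nodes by the sign of the Fiedler vector of the graph Laplacian."""
    L = np.diag(W_adj.sum(axis=1)) - W_adj
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1] >= 0

def weight_graph(layers):
    """Undirected neuron graph with |weight| edge strengths between adjacent layers."""
    sizes = [layers[0].shape[0]] + [w.shape[1] for w in layers]
    n = sum(sizes)
    A = np.zeros((n, n))
    off = 0
    for w in layers:
        a, b = w.shape
        A[off:off + a, off + a:off + a + b] = np.abs(w)
        off += a
    return A + A.T

# Compare n-cut scores across two "parameter counts" (random toy weights).
scores = {}
for width in (8, 64):
    layers = [rng.normal(size=(width, width)) for _ in range(3)]
    A = weight_graph(layers)
    scores[width] = ncut(A, spectral_bipartition(A))
```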
Thanks a lot again, Simon!