# Spencer Becker-Kahn

Karma: 45
• This was pretty interesting and I like the general direction that the analysis goes in. I feel it ought to be pointed out that what is referred to here as the key result is a standard fact in differential geometry called (something like) the submersion theorem, which in turn is essentially an application of the implicit function theorem.

I think that your setup is essentially that there is an $n$-dimensional parameter space, let's call it $\Theta$ say, and then for each element $x_i$ of the training set, we can consider the function $f_i$ which takes in a set of parameters (i.e. a model) and outputs whatever the model does on training data point $x_i$. We are thinking of both $\Theta$ and the output space as smooth (or at least sufficiently differentiable) spaces (I take it).

A contour plane is a level set of one of the $f_i$, i.e. a set of the form

$$\{\theta \in \Theta : f_i(\theta) = c\}$$

for some $i$ and some constant $c$. A behavior manifold is a set of the form

$$\{\theta \in \Theta : f_i(\theta) = c_i \text{ for all } i = 1, \dots, m\}$$

for some constants $c_1, \dots, c_m$.

A more concise way of viewing this is to define a single function $F = (f_1, \dots, f_m) : \Theta \to \mathbb{R}^m$, and then a behavior manifold is simply a level set of this function. The map $F$ is a submersion at $\theta$ if the Jacobian matrix $dF_\theta$ is a surjective linear map. The Jacobian matrix is, I think, the same object you define in the post (because the Jacobian is formed with each row equal to a gradient vector with respect to one of the output coordinates). It doesn't matter much, because what matters to check the surjectivity is the rank. Then the standard result implies that given $c \in \mathbb{R}^m$, if $F$ is a submersion in a neighbourhood of a point $\theta_0 \in F^{-1}(c)$, then $F^{-1}(c)$ is a smooth $(n-m)$-dimensional submanifold in a neighbourhood of $\theta_0$.

Essentially, in a neighbourhood of a point at which the Jacobian of $F$ has full rank, the level set through that point is an $(n-m)$-dimensional smooth submanifold.
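To make the rank condition concrete, here is a minimal numerical sketch (the toy `tanh` behaviour map, the particular parameter vector, and the finite-difference Jacobian are all my own illustrative choices, not anything from the post): we form the $m \times n$ Jacobian of the behaviour map at a point and check its rank.

```python
import numpy as np

# Toy behaviour map: for each training input x_i, the model's behaviour is
# f_i(theta) = tanh(theta . x_i).  (Purely illustrative choice of model.)
def model_outputs(theta, xs):
    return np.tanh(xs @ theta)

def jacobian_fd(theta, xs, eps=1e-6):
    """Finite-difference Jacobian of F = (f_1, ..., f_m) at theta, shape (m, n).
    Row i approximates the gradient of f_i with respect to the parameters."""
    base = model_outputs(theta, xs)
    J = np.zeros((base.size, theta.size))
    for j in range(theta.size):
        step = np.zeros(theta.size)
        step[j] = eps
        J[:, j] = (model_outputs(theta + step, xs) - base) / eps
    return J

theta = np.array([0.1, -0.2, 0.3, 0.0, 0.5])   # n = 5 parameters
xs = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],      # m = 3 training points
               [0.0, 1.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0, 0.0]])

J = jacobian_fd(theta, xs)
rank = np.linalg.matrix_rank(J)
print(rank)  # 3: F is a submersion here, so the level set through theta
             # is a smooth (n - m) = 2-dimensional submanifold
```

With linearly independent rows the Jacobian has full rank $m$, which is exactly the surjectivity condition; dropping below rank $m$ is the degenerate case discussed next.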

Then, yes, you could get onto studying in more detail the degeneracy when the Jacobian does not have full rank. But in my opinion you would need to be careful when you get to Claim 3. I think the connection between loss and behavior is not spelled out in enough detail: behavior can change while loss remains constant, right? And more generally, in exactly which directions do the implications go? Depending on exactly what you are trying to establish, this could actually be a bit of a 'tip of the iceberg' situation. (The study of this sort of thing goes rather deep; Vladimir Arnold et al. wrote in their 1998 book: "The theory of singularities of smooth maps is an apparatus for the study of abrupt, jump-like phenomena—bifurcations, perestroikas (restructurings), catastrophes, metamorphoses—which occur in systems depending on parameters when the parameters vary in a smooth manner".)

Similarly, when you say things like "Low rank indicates information loss", I think some care is needed, because the paragraphs that follow seem to be getting at something more like: if there is a certain kind of information loss in the early layers of the network, then this leads to a low-rank Jacobian. It doesn't seem clear that a low-rank Jacobian is necessarily indicative of information loss?
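The direction that does seem to hold (information loss producing a low-rank Jacobian) can be illustrated with a small hand-rolled sketch; the lossy feature map, the linear head, and the data points below are all hypothetical choices of mine. The first stage only ever sees the sum of the inputs, so two training points with equal sums produce identical Jacobian rows and the Jacobian is forced to be rank-deficient.

```python
import numpy as np

# Hypothetical information-losing first stage: everything downstream only
# ever sees s = sum(x), so inputs with equal sums are indistinguishable.
def features(x):
    s = x.sum()
    return np.array([s, s ** 2, np.tanh(s)])

# Linear head on the lossy features; theta are the only trainable parameters.
def f(theta, x):
    return theta @ features(x)

def param_jacobian(theta, xs):
    # For a linear head, the gradient of f(theta, x) with respect to theta
    # is just features(x) (independent of theta), so the Jacobian rows are
    # the feature vectors of the training points.
    return np.stack([features(x) for x in xs])

theta = np.array([0.3, -1.2, 0.7])     # n = 3 parameters
xs = [np.array([1.0, 2.0]),            # sum = 3
      np.array([0.0, 3.0]),            # sum = 3: indistinguishable from above
      np.array([1.0, 1.0])]            # sum = 2

J = param_jacobian(theta, xs)
rank = np.linalg.matrix_rank(J)
print(rank)  # 2, not 3: the two indistinguishable inputs give identical
             # Jacobian rows, so the information loss forces rank deficiency
```

Note that this only shows "information loss implies low rank"; the converse, which the quoted claim seems to assert, would need a separate argument.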

• Thanks for the comments and pointers!

• Thanks Rohin!

• Hi there,

Given that you’ve described various ‘primarily conceptual’ projects on the Alignment Team, and given the distinction between Scientists and Engineers, one aspect that I’m unsure about is roughly: would you expect a Research Scientist on the Alignment Team to necessarily have a minimum level of practical ML knowledge? Are you able to say any more about that? E.g. would they have to pass a general DeepMind coding test or something like that?

# An observation about Hubinger et al.’s framework for learned optimization

13 May 2022 16:20 UTC
15 points
• Yes I think you understood me correctly. In which case I think we more or less agree in the sense that I also think it may not be productive to use Richard’s heuristic as a criterion for which research directions to actually pursue.

• I broadly agree with Richard’s main point, but I also do agree with this comment in the sense that I am not confident that the example of Turing compared with e.g. Einstein is completely fair/accurate.

One thing I would say in response to your comment, Adam, is that I don’t usually see the message of your linked post as being incompatible with Richard’s main point. I think one usually does have or does need productive mistakes that don’t necessarily or obviously look like they are robust partial progress. But still, often when there actually is a breakthrough, I think it can be important to look for this “intuitively compelling” explanation. So one thing I have in mind is that I think it’s usually good to be skeptical if a claimed breakthrough seems to just ‘fall out’ of a bunch of partial work without there being a compelling explanation after the fact.

• I agree, i.e. I also (fairly weakly) disagree with the value of thinking of ‘distilling’ as a separate thing. Part of me wants to conjecture that it comes from thinking of alignment work predominantly as mathematics or a hard science, in which the standard ‘unit’ is an original theorem or original result which might be poorly written up but can’t really be argued against much. But if we think of the area (I’m thinking predominantly about more conceptual/theoretical alignment) as a ‘softer’, messier, ongoing discourse full of different arguments from different viewpoints and under different assumptions, with counter-arguments, rejoinders, clarifications, retractions etc. that takes place across blogs, papers, talks, theorems, experiments etc. that all somehow slowly work to produce progress, then it starts to be less clear what this special activity called ‘distilling’ really is.

Another relevant point, but one which I won’t bother trying to expand on much here, is that a research community assimilating—and then eventually building on—complex ideas can take a really long time.

[At risk of extending into a rant, I also just think the term is a bit off-putting. Sure, I can get the sense of what it means from the word and the way it is used—it’s not completely opaque or anything—but I’d not heard it used regularly in this way until I started looking at the alignment forum. What’s really so special about alignment that we need to use this word? Do we think we have figured out some new secret activity that is useful for intellectual progress that other fields haven’t figured out? Can we not get by using words like “writing” and “teaching” and “explaining”?]

• It could also work here. But I do feel like pointing out that the bounty format has other drawbacks. Maybe it works better when you want a variety of bitesize contributions, like various different proposals? I probably wouldn’t do work like Abram proposes—quite a long and difficult project, I expect—for the chance of winning a prize, particularly if the winner(s) were decided by someone’s subjective judgement.

• This post caught my eye as my background is in mathematics and I was, in the not-too-distant past, excited about the idea of rigorous mathematical AI alignment work. My mind is still open to such work but, I’ll be honest, I’ve since become a bit less excited than I was. In particular, I definitely “bounced off” the existing write-ups on Infrabayesianism, and now, without already knowing what it’s all about, it’s not clear it’s worth one’s time. So, at the risk of making a basic or even cynical point: the remuneration of the proposed job could be important for getting attention/incentivising people on the fence.