Are you bringing up wireheading to answer yes or no to my question (of whether RL is more prone to gradient hacking)? To me, it sounds like you’re suggesting a no, but I think it supports the idea that RL might be prone to gradient hacking. The AI, like me, avoids wireheading itself, and so will never be modified by gradient descent towards wireheading, because gradient descent doesn’t know anything about wireheading until it’s been tried. So isn’t that an example of gradient hacking itself? This is unlike a supervised learning setup, where gradient descent ‘knows’ about all possible options and will modify any subagents that avoid giving the right answer.

So am I a gradient hacker whenever I just say no to drugs?

# tgb

I’m still thinking about this (unsuccessfully). Maybe my missing piece is that the examples I’m considering here still do not have any of the singularities that this topic focuses on! What are the simplest examples with singularities? Say again we’re fitting

`y = f(x)`

for over some parameters. And specifically let’s consider the points (0,0) and (1,0) as our only training data. Then has minimal loss set . That has a singularity at (0,0,0). I don’t really see why it would generalize better than or , neither of which have singularities in their minimal loss sets. These still are only examples of the type B behavior where they already are effectively just two parameters, so maybe there’s no further improvement for a singularity to give?

Consider instead . Here the minimal loss set has a singularity when at (0,0,0,0). But maybe now if we’re at that point, the model has effectively reduced down to since perturbing either c or d away from zero would still keep the last term zero. So maybe this is a case where has type A behavior in general (since the x^2 term can throw off generalizability compared to a linear model) but approximates type B behavior near the singularity (since the x^2 term becomes negligible even if perturbed)? That seems to be the best picture of this argument that I’ve been able to convince myself of so far! Singularities are (sometimes) points where type A behavior becomes type B behavior.

And a follow-up that I just thought of: is reinforcement learning more prone to gradient hacking? For example, if a sub-agent guesses that a particular previously untried type of action would produce very high reward, the sub-agent might be able to direct the policy away from those actions. The learning process will never correct this behavior if the overall model never gets to learn that those actions are beneficial. Therefore the sub-agent can direct away from some classes of high-reward actions that it doesn’t like without being altered.
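A toy sketch of this worry, under made-up assumptions: a 3-armed bandit (the `REWARDS` values are hypothetical), a hypothetical sub-agent modeled as a fixed penalty `VETO` on the best arm’s logit, and plain REINFORCE as the learner. Because the vetoed arm is essentially never sampled, its logit gets almost no learning signal, whereas a supervised cross-entropy gradient with a label naming that arm pushes on it directly.

```python
import numpy as np

# Hypothetical 3-armed bandit: arm 2 pays best, but the policy never tries it.
REWARDS = np.array([1.0, 0.5, 10.0])
# Hypothetical sub-agent: quietly subtracts a large penalty from arm 2's logit.
VETO = np.array([0.0, 0.0, -20.0])

def policy(logits):
    """Softmax over the logits after the sub-agent's veto is applied."""
    z = logits + VETO
    z = z - z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

def train_reinforce(steps=2000, lr=0.1, seed=0):
    """Plain REINFORCE: only actions that actually get sampled produce signal."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(3)
    for _ in range(steps):
        probs = policy(logits)
        action = rng.choice(3, p=probs)
        grad_log_pi = -probs             # d log pi(action) / d logits
        grad_log_pi[action] += 1.0       # = onehot(action) - probs
        logits += lr * REWARDS[action] * grad_log_pi
    return logits

def supervised_gradient(logits, label=2):
    """Cross-entropy ascent direction when a label names the best arm."""
    grad = -policy(logits)
    grad[label] += 1.0
    return grad

logits = train_reinforce()
print("RL logits:", logits)                                     # arm 2 barely moves
print("supervised push on arm 2:", supervised_gradient(logits)[2])  # close to 1
```

The veto is soft rather than a hard mask so that both gradients stay finite; the point is only that the RL update for an arm scales with how often that arm is tried.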

There’s been discussion of ‘gradient hacking’ lately, such as here. What I’m still unsure about is whether a gradient hacker is just another word for a local minimum. It feels different, but when I try to put a finer definition on it, I can’t. My best alternative is “local minimum, but malicious,” but that seems odd since it depends upon some moral character.

Thanks for trying to walk me through this more, though I’m not sure this clears up my confusion. An even more similar model to the one in the video (a pendulum) would be the model that which has four parameters but of course you don’t really need both a and b. My point is that, as far as the loss function is concerned, the situation for a fourth degree polynomial’s redundancy is identical to the situation for this new model. Yet we clearly have two different types of redundancy going on:

Type A: like the fourth-degree polynomial’s redundancy, which impairs generalizability since it is merely an artifact of the limited training data, and

Type B: like the new model’s redundancy, which does not impair generalizability compared to some non-redundant version of it, since it is a redundancy in *all* outputs.

Moreover, my intuition is that a highly over-parametrized neural net has much more Type A redundancy than Type B. Is this intuition wrong? That seems perhaps to be the *definition* of “over-parametrized”: a model with a lot of Type A redundancy. But maybe I am instead wrong to be looking at the loss function in the first place?
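A small numerical sketch of the two kinds of redundancy, using made-up data and a made-up Type B model f(x) = (a + b)x + c (a stand-in for illustration, not the model discussed above). Two exact minimizers of a cubic fit to three points agree on the training inputs but disagree elsewhere (Type A); two minimizers of the Type B model agree on every input.

```python
import numpy as np

# Hypothetical training data: three points with distinct x values.
x_train = np.array([0.0, 1.0, 2.0])
y_train = np.array([1.0, 3.0, 2.0])
x_new = np.array([-1.0, 0.5, 3.0])   # inputs never seen in training

# Type A: a cubic (4 coefficients) through 3 points has a whole line of exact fits.
V = np.vander(x_train, 4)                           # columns: x^3, x^2, x, 1
cubic_1, *_ = np.linalg.lstsq(V, y_train, rcond=None)
null_dir = np.poly(x_train)                         # x(x-1)(x-2): zero at every training x
cubic_2 = cubic_1 + 5.0 * null_dir                  # another exact fit

# Type B (hypothetical model): only a + b matters, for every input.
def type_b(params, x):
    a, b, c = params
    return (a + b) * x + c

slope, intercept = np.polyfit(x_train, y_train, 1)  # best achievable a + b and c
params_1 = (slope, 0.0, intercept)
params_2 = (slope - 7.0, 7.0, intercept)            # different parameters, same function

print(np.polyval(cubic_1, x_new), np.polyval(cubic_2, x_new))  # differ off the data
print(type_b(params_1, x_new), type_b(params_2, x_new))        # agree everywhere
```

Both pairs sit on their model’s minimal loss set, so the loss alone does not distinguish the two situations; only evaluating off the training inputs does.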

I’m confused by the setup. Let’s consider the simplest case: fitting points in the plane, y as a function of x. If I have three datapoints and I fit a quadratic to them, I have a dimension 0 space of minimizers of the loss function: the unique parabola through those three points (assume no two share an x-coordinate). Since I have three parameters in a quadratic, I assume this means the effective degrees of freedom of the model is 3 according to this post. If I instead fit a cubic, I now have a dimension 1 space of minimizers and 4 parameters, so I think you’re saying the degrees of freedom is still 3. And so the DoF would be 3 for all polynomial models of degree 2 or higher. But I certainly think we expect quadratic models to generalize better than 19th-degree polynomials when fit to just three points.
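Under the reading in this comment, that the post’s effective degrees of freedom are the parameter count minus the dimension of the set of loss minimizers, the count can be checked directly for polynomial fits to three points (the x values below are made up). The minimizer set is an affine space whose dimension equals the null-space dimension of the Vandermonde matrix.

```python
import numpy as np

x_train = np.array([0.0, 1.0, 2.0])  # three distinct inputs (hypothetical)

def minimizer_dimension(degree):
    """Dimension of the set of exact interpolants of the given degree."""
    V = np.vander(x_train, degree + 1)            # one column per coefficient
    n_params = degree + 1
    return n_params - np.linalg.matrix_rank(V)    # null-space dimension

for degree in (2, 3, 4, 19):
    n_params = degree + 1
    dim = minimizer_dimension(degree)
    # columns: degree, parameters, dimension of minimizer set, parameters - dimension
    print(degree, n_params, dim, n_params - dim)
```

The last column is 3 for every degree, which is exactly the puzzle: this count cannot tell a quadratic apart from a 19th-degree polynomial.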

I think the objection to this example is that the relevant function to minimize is not loss on the training data but something else? The loss it would have on ‘real data’? That makes more sense of the post to me, but if that were the case, then I think any minimizer of that function would be equally good at generalizing *by definition*. Another candidate would be the parameter-function map you describe, which seems to be the relevant map whose singularities we are studying, but it’s not well defined to ask for minima (or level sets) of that at all. So I don’t think that’s right either.

Thanks for the clarification! In fact, that opinion wasn’t even one of the ones I had considered you might have.

> I simultaneously would have answered ‘no,’ would expect most people in my social circles to answer no, think it is clear that this being a near-universal is a *very* bad sign, and also that 25.6% is terrifying. It’s something like ‘there is a right amount of the thing this is a proxy for, and that very much is not it.’

At the risk of being too honest, I find passages written like this horribly confusing and never know what you mean when you write like this. (“This” being near universal: what is “this”? “Answering no” like you and your friends, or “answering yes” like most of the survey respondents? Is 25.6% terrifying because you think it is high or low? What thing do you think “this” is a proxy for?)

For me, the survey question itself seems bad because it’s very close to two radically different ideas:

- I base my self-worth on my parents’ judgement of me.

- My parents are kind, intelligent people whose judgement making is generally of very high quality. Since they are also biased towards positive views of me, if they judged me poorly then I would take that as serious evidence that I am not living up to what I aspire of myself.

The first sounds unhealthy. The second sounds healthy, at least assuming that one’s parents are in fact kind, intelligent, and generally positively disposed to their children by default. I’m not confident which of the two a “yes” respondent is agreeing to or a “no” respondent is disagreeing with.

Thanks. I think I’ve been tripped up by this terminology more than once now.

Not sure that I understand your claim here about optimization. An optimizer is presumably given some choice of possible initial states to choose from to achieve its goal (otherwise it cannot interact at all). In that case, the set of accessible states will depend upon the chosen initial state, and so the optimizer can influence long-term behavior and choose whatever best matches its desires.

Why would CZ tweet out that he was starting to sell his FTT? Surely that would only decrease the amount he could recover on his sales?

I agree, I was just responding to your penultimate sentence: “In fact, if you could know without labeling generated data, why would you generate something that you can tell is bad in the first place?”

Personally, I think it’s kind of exciting to be part of what might be the last breath of purely human writing. Also, depressing.

Surely the problem is that someone else is generating it, or, more accurately, that lots of other people are generating it in huge quantities.

I work in a related field and found this a helpful overview that filled in some gaps in my knowledge that I probably should have filled already, and I’m looking forward to the follow-ups. I do think that this would likely be a very hard read for a layman who wasn’t already pretty familiar with genetics, and you might consider making an even more basic version of this. Lots of jargon is dropped without explanation, for example.

Your graph shows the risk at ~40% of a normal day’s risk for that age group. Using their risk ratio of about 10x, you would need about 25x the child pedestrian activity to achieve that risk reduction (10 / 0.4 = 25). That could be the case, but I’m not certain. I’m not even that confident that you’d get the >10x needed to ensure any decrease in risk. Kids tend to go to hot spots for trick-or-treating, so the really busy streets that get >25x and spring to mind easily might be hiding the (relatively) depleted streets elsewhere that account for a larger fraction of typical walking. Hence I think your presentation is optimistic: it’s right to push back on the raw numbers, but I don’t think it’s clear that Halloween is substantially safer than other nights per pedestrian-hour, as you claim.

I also read the denominator problem differently. I took your argument to claim the 5x number as a lower bound for “trick-or-treating streets compared to the same streets on a typical night,” and for that, it’s definitely true. But then you had to gloss over the fact that we’re comparing entire days (including non-trick-or-treating streets), and it’s much less clear that 5x holds for all of Halloween compared to all of another day. Therefore, their analysis *justified* using your 5x number, while I think your analysis was stretching the truth.

While I appreciate the analysis, I also recently saw this article circulating: https://jamanetwork.com/journals/jamapediatrics/article-abstract/2711459

It compares just 6pm to midnight on Halloween versus the corresponding time one week earlier and one week later. They estimate a *10x increase* in deaths in children aged 4-8 (see Figure 1). This doesn’t look like subgroup fishing, since the increase for the 9-12 group is also quite large (6x). Even with your 5x correction factor, Halloween would still be more dangerous than other days for kids.

I still think it could be true that Halloween is less dangerous, since this study hasn’t measured pedestrian activity, and trick-or-treating really might bring a greater than 10x increase in 4-8 year olds out on the street. But this definitely makes it look less good to me than your presentation.
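The arithmetic behind these comparisons, as a hedged sketch: it assumes deaths scale in proportion to child pedestrian exposure, so the per-pedestrian-hour risk ratio is the raw death ratio divided by the exposure multiplier. The 10x, 5x, and ~40% inputs are the figures quoted in these comments; the function names are just for illustration.

```python
def per_exposure_risk_ratio(raw_risk_ratio, exposure_multiplier):
    """Halloween-vs-typical risk per pedestrian-hour, assuming deaths scale with exposure."""
    return raw_risk_ratio / exposure_multiplier

def exposure_needed(raw_risk_ratio, target_ratio=1.0):
    """Exposure multiplier needed for the per-exposure ratio to fall to target_ratio."""
    return raw_risk_ratio / target_ratio

raw = 10.0  # ~10x more deaths among 4-8 year olds, 6pm to midnight
print(per_exposure_risk_ratio(raw, 5.0))  # 2.0: still twice as dangerous per pedestrian-hour
print(exposure_needed(raw))               # 10.0: break-even exposure
print(exposure_needed(raw, 0.4))          # 25.0: exposure needed for risk at ~40% of normal
```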

Gene drives (i.e., genes that force their own propagation) do arise in nature. There are “LINE” genes that apparently make up over 20% of our genome: they encode RNA that encodes a protein that takes its own RNA and copies it back into your DNA at random locations, thereby propagating itself even more than our engineered gene drives do. With it taking up that much of our genome, I could imagine something like that killing off a species, though I’m failing to find a specific example. These are examples of selfish genes, so that might be where to read more.

It only causes female sterility, so the males keep passing it on. It reaches the whole population because the gene encodes a protein that acts on the DNA and ensures its own inheritance, rather than being passed on with fifty-fifty odds. If a modified and an unmodified individual mate, their offspring have one copy of the modified DNA and one copy of the unmodified DNA, so they would normally have only a fifty-fifty chance of passing the modification on. But if the gene breaks the other (unmodified) copy, the organism’s natural DNA repair mechanisms will copy from the other chromosome to repair the damage. That copies the modified gene over! Now the organism has only the modified DNA and will pass it on with 100% chance. So will its offspring, forever, until there are no fertile females left.
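A deterministic toy model of that mechanism, as a sketch under simplifying assumptions that are mine, not from any real study: one locus with wild-type (W) and drive (D) alleles, random mating, an infinite population, DD females sterile, and a hypothetical `homing_efficiency` giving the chance that a WD germline copies the drive over the W chromosome. It tracks the fraction of daughters that are still fertile.

```python
def simulate(homing_efficiency, release_fraction=0.1, generations=30):
    """Return the fraction of fertile daughters in each generation (toy model)."""
    # WD parents transmit D with probability (1 + e) / 2; e = 0 is plain Mendelian.
    transmit_het = (1 + homing_efficiency) / 2
    q_male = release_fraction   # fraction of sperm carrying D (released DD males)
    q_female = 0.0              # fraction of eggs carrying D
    history = []
    for _ in range(generations):
        # Offspring genotype frequencies, identical for sons and daughters.
        dd = q_female * q_male
        wd = q_female * (1 - q_male) + (1 - q_female) * q_male
        ww = (1 - q_female) * (1 - q_male)
        fertile = ww + wd                 # DD daughters cannot reproduce
        history.append(fertile)
        if fertile <= 0:
            break                         # no fertile females left
        # Gamete pools for the next generation (all males breed).
        q_male = min(1.0, dd + wd * transmit_het)
        q_female = wd * transmit_het / fertile
    return history

print(simulate(1.0)[-1])   # full homing: fertile fraction collapses toward zero
print(min(simulate(0.0)))  # Mendelian: sterility cost keeps the drive rare
```

With perfect homing the fertile fraction drops toward zero, whereas with ordinary inheritance the same sterility cost makes the drive fade out, which matches the contrast described above.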

That looks right mathematically but seems absurd. Maybe steady state isn’t the right situation to think about this in? It’s weird that the strategy of “never reproduce” would be just as good as the usual one, since not reproducing means not dying. Or do we need to model the chance that the bamboo dies due to illness/fire/animals before getting a chance to reproduce?

So that example is of L; what is the f for it? Obviously, there are multiple f that could give that (depending on how the loss is computed from f), some with symmetries and some without. That’s why I find the discussion so confusing: we really only care about symmetries of f (which give type B behavior) but are instead talking about symmetries of L (which may indicate either type A or type B) without really distinguishing the two. (Unless my example in the previous post shows that it’s a false dichotomy and type A can simulate type B at a singularity.)

I’m also not sure the example matches the plots you’ve drawn: presumably the parameters of the model are a, b, but the plots show it varying x, y for fixed a = 1, b = 0? Taking it as written, there’s not actually a singularity in its parameters a, b.