Logan Riggs

Karma: 3,665

Temporarily Losing My Ego

Logan Riggs28 Oct 2025 16:41 UTC

21 points

3 comments3 min readLW link

Logan Riggs 23 Oct 2025 21:20 UTC
LW: 6 AF: 4
0
AF
in reply to: Jesse Hoogland’s comment on: Jesse Hoogland’s Shortform
Great work!
Listened to a talk from Philipp on it today and am confused on why we can’t just make a better benchmark than LDS?
Why not just train eg 1k different models, where you left 1 datapoint out? LDS is noisy, so I’m assuming 1k datapoints that exactly capture what you want is better than 1M datapoints that are an approximation $.$ ^[1]
As an estimate, Nano-GPT speedrun takes a little more than 2 min now, so you can train 1001 of these in:
2.33*1k/60 = 38hrs on 8 H100′s which is maybe 4 b200′s which is $24/hr, so ~$1k.
And that’s getting a 124M param LLM trained on 730M tokens up to GPT2 level. Y’all’s quantitative setting for Fig 4 was a 2M parameter Resnet on Cifar-10 on 5k images, which would be much cheaper to do (although the GPT2 one has been very optimized, so you could just do the speedrun one but on less data).
LDS was shown to be very noisy, but a colleague mentioned that this could be because 5k images is a very small amount of data. I guess another way to validate LDS is running the expensive full-training run on a few datapoints.
Confusion on LDS Hyperparameter Sweep Meaning
Y’all show in Fig 4 that there are large error bars across seeds for the different methods. This ends up being a property of LDS’s noisiness, as y’all show in Figures 7-8 (where BIF & EK-FAC are highly correlated). This means that, even using noisy LDS, you don’t need to re-run 5 times if a new method is much better than previous ones (but only if it’s narrowly better).
What I’m confused about is why you retrained on 100 different ways to resample the data at each percentage? Is this just because LDS is noisy, so you’re doing the thing where randomly sampling 100 datapoints 500 times gives you a good approximation of the causal effect of each individual datapoint (or that is what LDS actually is)? Was there high variance in the relative difference between methods across the 100 retrained models?
Other Experiments
Just wild speculation that there are other data attribution methods as opposed to prediction of the output. When a model “groks” something, there will be some datapoints that were more important for that happening that should show up in an ideal data attribution method.
Similar w/ different structures forming in the dataset (which y’all’s other paper shows AFAIK).
[Note: there’s a decent chance I’ve terribly misunderstood y’all’s technique or misread the technical details, so corrections are appreciated]
1. ^
  It initially seemed confusing on how to evaluate this, but I think we need to look at the variance over the distribution of datapoints. If BIF is consistently more accurate than EK-FAC over eg 100 randomly sampled datapoints, then that’s a good sign for BIF; however, if there’s a high level of variance, then we’d need more data to differentiate between the two. I do think higher quality data attribution methods would have higher signal, so you’d need less data. For example, I predict that BIF does better than Trak on ~all datapoints (but this is an empirical question).

Logan Riggs 23 Oct 2025 17:39 UTC
2 points
0
on: Penny’s Hands
I was actually expecting Penny to develop dystonia coincidentally, and the RL would tie-in by needing to be learned in reverse ie optimizing from dystonic to normal. It is a much more pleasant ending than the protagonist’s tone the whole way through.
If I was writing a fanfic of this, I’d keep the story as is (+ or—the last paragraph), but then continue into the present moment which leads to the realization.

Logan Riggs 17 Oct 2025 16:05 UTC
2 points
0
on: Finding Features in Neural Networks with the Empirical NTK
It’s very exciting to have an orthogonal research direction that finds these ground truth features, which might possibly even generalize(!!). Please do report future results, even if negative (though your Malladi et al link is some evidence in the positive).
It’s also very confusing since I’m unsure how this all fits in with everything else? This clearly works in these cases. SAE’s clearly work in some cases as well (and same w/ the parameter decomposition research), but what’s the “Grand Theory of NN Interp” that explains all of these results?
In general, I believe it’s very important that we hedge our bets on research directions for interp. The main reason being one of them actually panning out, but even if not, they already provide unique pieces of evidences for later researchers (maybe us, maybe LLMs, lol) to hopefully figure out that “Grand Theory of NN Interp”.

Logan Riggs 10 Oct 2025 16:47 UTC
3 points
0
in reply to: Charlie Steiner’s comment on: Hospitalization: A Review
Thanks Charlie:)

Logan Riggs 10 Oct 2025 16:46 UTC
6 points
0
in reply to: nim’s comment on: Hospitalization: A Review
My wife and I are pretty sure the paramedic checked w/ a stethascope, and so did the doctor when we arrived. But they didn’t mention anything until the x-ray.
The paramedics might not’ve done the pads due to being a few minutes ride from the hospital (I’m literally on the same block as the hospital), but I did recieve them at the hospital (I’ve still got some glue-residue on me actually).
When nursing staff is working long shifts and spread between a lot of patients,
Ya, mine were working 12 hour shifts, 3 days/nights in a row.

Logan Riggs 10 Oct 2025 16:38 UTC
7 points
0
in reply to: Viliam’s comment on: Hospitalization: A Review
Well I signed up for half-haven, and thought “Well I need to write a post every 2 days”, haha. (I’m also more of an over-sharer than others)
Thanks for hosting half-haven btw!

Logan Riggs 10 Oct 2025 16:36 UTC
6 points
0
in reply to: AlphaAndOmega’s comment on: Hospitalization: A Review
That seems incredibly important, so I’ve added to the main text. Thanks!

Logan Riggs 9 Oct 2025 17:52 UTC
2 points
0
in reply to: williawa’s comment on: Replacing RL w/ Parameter-based Evolutionary Strategies
Would be interesting to empirically check the reward surrounding reward hacking solutions. Should be able to plot the reward against variance and see if that’s different than other spots.

Hospitalization: A Review

Logan Riggs9 Oct 2025 14:36 UTC

352 points

19 comments9 min readLW link

Logan Riggs 8 Oct 2025 11:35 UTC
2 points
0
in reply to: d_el_ez’s comment on: Making Your Pain Worse can Get You What You Want
Thanks for making these connections!
“Learned helplessness” only seems to cover some of these cases though. My exaggerated tiredness didn’t relate to a feeling of lack of control, but I do agree that it is a learned behavior.
...
Cognitive Defusion is the idea that your thoughts & emotions are separate from you. They aren’t immutable facts.
I do think that relates, but I want to communicate:
(1) Your mindset can drastically change how you experience things. (2) A specific subset of those mindsets involve exaggerating your suffering (sincerely), as a learned behavior that got you what you wanted in the past from those in power over you.

The oreo/heaven-or-hell point is to drive home #(1)
So yes, I agree then!
...
For relating to my close friends, it usually involves them telling themselves a very sad story (which is true but only focuses on a small subset of details), which can be asked for them to articulate, which can then be challenged directly (in a loving, tactful way; almost socratic?) while providing comfort.

Yeah, I think what I’m trying now is a CBT solution of “Notice your brain consistently telling yourself the worse-case story”, and might need to lean more into CBT. Thanks!

Logan Riggs 8 Oct 2025 11:08 UTC
4 points
0
in reply to: williawa’s comment on: Replacing RL w/ Parameter-based Evolutionary Strategies
One intuition I can offer is that you end up in wider basins of reward/loss landscape.
If you want to hit a very narrow basin, but your variance is too high, then you might not sample the high reward point.

Although, sampling enough points which do include the reward hacking weights will eventually center you on the reward hacking weights.
Suppose you sample 1k points, and one of them is the reward hacking weight with reward 100 (and the rest 1). Then you will move towards the reward hacking weight the most, which would make it more likely sampled the next time AFAIK. So maybe not??

The second intuition is the paths being substantially different, which can be quantified as well.

Logan Riggs 8 Oct 2025 11:08 UTC
2 points
0
in reply to: williawa’s comment on: Replacing RL w/ Parameter-based Evolutionary Strategies
The paper does have a few empirical experiments showing they arrive at different solutions. Specifically the KL-reward plot. Would you need more settings to be convinced here?

Logan Riggs 8 Oct 2025 2:57 UTC
2 points
0
in reply to: Alex A’s comment on: Telling the Difference Between Memories & Logical Guesses
Could you elaborate on this? I understand you saying they’re fungible as in the “memory” of doing it today is basically the same memory as doing it yesterday.
Is it like chunking? You’ve learned a specific pattern that doesn’t change, which you just bring forth the general concept of “brush teeth”?

Replacing RL w/ Parameter-based Evolutionary Strategies

Logan Riggs8 Oct 2025 1:02 UTC

63 points

5 comments3 min readLW link

Telling the Difference Between Memories & Logical Guesses

Logan Riggs7 Oct 2025 5:46 UTC

28 points

3 comments4 min readLW link

Making Your Pain Worse can Get You What You Want

Logan Riggs5 Oct 2025 0:19 UTC

86 points

5 comments3 min readLW link

How to Feel More Alive

Logan Riggs2 Oct 2025 15:45 UTC

49 points

2 comments4 min readLW link

Logan Riggs 28 Sep 2025 23:40 UTC
2 points
0
in reply to: romeostevensit’s comment on: Solving the problem of needing to give a talk
What’s an example of #2?

Logan Riggs 16 Sep 2025 1:12 UTC
5 points
7
on: A Review of Nina Panickssery’s Review of Scott Alexander’s Review of “If Anyone Builds It, Everyone Dies”
For instance, if the model was at every turn searching over possible actions and choosing one that will maximize this reward function. (Of course, no one knows how to do that in practice yet, but everyone’s on the same page about that.)
I’m interpreting you as saying “If we solved outer alignment & had a perfect reward function, it would be good if the model itself was optimizing for that reward function (ie inner alignment)”
In which case, we are not on the same page (ie inner & outer alignment decompose a hard problem into two more difficult ones).
For the book, it’s interesting they went w/ the evolution argument. I still prefer the shard theory analogy of humans being misaligned w/ the reward system (ie I intentionally avoid taking fentanyl, even though that would be very rewarding/reinforcing), which can still end up in similar sharp-left turns if the model eg takes fentanyl (or other goals).
Evolution is still not believed by everyone in the US (though oddly ranging from 17%-37%), which can be offputting to some, & also you have to understand evolution to an extent. I assume most folks can sort’of see that if you really optimized for evolution, you’d do a lot more than we are to pass on genes; however, optimizing for “evolution” is underconstrained & can have arguments for “well we’re actually still doing quite good by evolution’s sake”.
Now instead let’s focus on optimizing for the human reward system. People believe in very addictive drugs & can see the effects. It’s pretty easy to imagine “a drug addict becomes extremely powerful, and you try to stop them. What goes wrong?”. It’s also quite coherent what optimizing your reward system looks like!

The evolution analogy is still good under the inner-outer alignment frame, since humans would be evolution’s seat & it seems difficult to avoid the same issues. Whereas the human reward system seems easier (eg give the AI fentanyl). This can be worked around by discussing how hard it is to design the perfect reward function which doesn’t end up goodharting.

Logan Riggs

Tem­porar­ily Los­ing My Ego

Confusion on LDS Hyperparameter Sweep Meaning

Other Experiments

Hospi­tal­iza­tion: A Review

Re­plac­ing RL w/​ Pa­ram­e­ter-based Evolu­tion­ary Strategies

Tel­ling the Differ­ence Between Me­mories & Log­i­cal Guesses

Mak­ing Your Pain Worse can Get You What You Want