np_x

Karma: 23

Research note on window shifting training

Kei Nishimura-Gasparian and np_x

17 Mar 2026 15:58 UTC

26 points

1 comment15 min readLW link

np_x 5 Dec 2025 6:09 UTC
3 points
0
in reply to: evhub’s comment on: Alignment remains a hard, unsolved problem
And I am worried that a lot of research sabotage won’t require the model to reason through much of the sabotage parts in its CoT, e.g. because all it needs to do to sandbag the research is flip the sign on some experiments in relatively straightforward ways that don’t require a ton of reasoning.
This definitely depends on the “blue team” protocol at hand right? If we’re doing a bunch of cross-checking of research work / very carefully designed research sabotage honeypots during evaluations, then the model robustly needs to do a bunch of reasoning to ensure it’s not caught.

np_x 26 Aug 2025 22:14 UTC
1 point
0
on: Hidden Reasoning in LLMs: A Taxonomy
- The LLM relies on the CoT for internal reasoning but can access a neural memory bank or some other external tool that’s opaque. The activations and/or CoT only contain a pointer to p, which is stored only in the memory bank, not p itself.
Why does this seem unlikely to you all?

np_x 10 Aug 2025 7:45 UTC
1 point
0
on: AI Task Length Horizons in Offensive Cybersecurity
Nicely done! Loved to see the analysis and write up.

np_x 10 May 2025 5:49 UTC
1 point
0
in reply to: Thomas Kwa’s comment on: Thomas Kwa’s Shortform
This calculus changes when you can work on many things at once (similar to @faul_sname’s comment but this might be that you can work on many projects at once, even if they can’t each be parallelized well).