Super cool work! I’ve been thinking about using this as a measure of misalignment and seeing how it scales with RLVR training steps or even model size. For instance, extracting a behavior concept vector at multiple checkpoints during RLVR training and computing its cosine similarity with the SAE latents extracted from the base model.
What do you think?
I’m pretty sure you can compute log P(response | prompt) by summing the log-probabilities of the response tokens under the model’s next-token predictions for the given prompt. I’m a little confused about why you multiply the log-probs of the “yes” token for the two questions, though — multiplying probabilities corresponds to summing log-probs, so multiplying log-probs doesn’t have an obvious probabilistic interpretation to me.