Ram Bharadwaj

Karma: 52

Ram Bharadwaj 16 Apr 2026 16:00 UTC
2 points
0
in reply to: Igor Ivanov’s comment on: Reproducing steering against evaluation awareness in a large open-weight model
I’ve tried, and it p-eval still reduces with steering-strength

Ram Bharadwaj 3 Apr 2026 11:27 UTC
1 point
0
on: How eval awareness might emerge in training
I think the ‘naturalness’/‘realism’ of model generated transcripts keeps dropping roughly correlated with the model’s release-date? (i guess as the models keep on increasing the user’s keep chatting in more and more realistic manner).
Unsure whether ‘epoch-capability’ is a right thing to measure when assuming correlation with capabilities.

PS: the judge for the above plot is sonnet-4-6 and the chats are 100 samples each per model from sharechat.

Ram Bharadwaj 11 Mar 2026 18:43 UTC
1 point
0
on: rokosbasilisk’s Shortform
Can Random Labels Unlearn Eval Awareness?
Dumb idea that might be worth 2 days of compute:
Setup: Take a model with measurable eval awareness (say, GPT-5.3 or Claude). Finetune on a small dataset of eval and deployment transcripts, but randomly shuffle the “eval” vs “deploy” labels. Test whether the model can still distinguish eval from deployment contexts post-finetuning.
The logic: Eval awareness is a learned conditional — “if eval context, then behavior A.” If you feed the model data where that conditional is pure noise, maybe you break the association. The model learns the labels are meaningless and stops keying on them.
Why it probably fails: Eval detection likely strongly distributional and model might just learn to ignore your labels while still detecting evals from deeper cues.
Prediction: <20% chance this meaningfully reduces eval awareness.

Ram Bharadwaj 31 Jan 2026 21:34 UTC
1 point
0
in reply to: Igor Ivanov’s comment on: Call for Science of Eval Awareness (+ Research Directions)
By distilling the model on its own responses, the model’s train and eval behavior should converge you’re collapsing the (train/eval) conditional policy into uniform behavior everywhere.

Ram Bharadwaj 31 Jan 2026 21:19 UTC
1 point
0
on: Call for Science of Eval Awareness (+ Research Directions)
Did anyone try and see if self-distillation suppresses eval-awareness?

Scaling Laws for LLM Based Data Compression

Ram Bharadwaj5 Aug 2025 7:06 UTC

8 points

0 comments3 min readLW link

Ram Bharadwaj 21 Oct 2024 7:40 UTC
1 point
0
in reply to: jacob_drori’s comment on: Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data
Thanks for the feedback! working on refining the writeup.

Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data

Ram Bharadwaj20 Oct 2024 8:40 UTC

13 points

2 comments1 min readLW link

Understanding Hidden Computations in Chain-of-Thought Reasoning

Ram Bharadwaj24 Aug 2024 16:35 UTC

6 points

1 comment1 min readLW link

Ram Bharadwaj 11 Jan 2024 11:54 UTC
1 point
0
on: Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
though as Geoff Hinton has pointed out, ‘confabulations’ might be a better word
I think yann lecun was the first one to using this word https://twitter.com/ylecun/status/1667272618825723909

Ram Bharadwaj 21 Nov 2023 15:54 UTC
6 points
0
in reply to: Charlie Steiner’s comment on: Alignment can improve generalisation through more robustly doing what a human wants—CoinRun example
not much information is given regarding that so far, i was curious about that too

Ram Bharadwaj 21 Nov 2023 13:16 UTC
4 points
4
in reply to: Charlie Steiner’s comment on: Alignment can improve generalisation through more robustly doing what a human wants—CoinRun example
“Algorithm for Concept Extrapolation”

Ram Bharadwaj 8 Aug 2023 19:37 UTC
1 point
0
in reply to: jkim2’s comment on: Prizes for matrix completion problems
I don’t see any recent publications for paul christiano related to this. So i guess the problem[s] is still open.

Goal-misgeneralization is ELK-hard

Ram Bharadwaj10 Jun 2023 9:32 UTC

2 points

0 comments1 min readLW link

Ram Bharadwaj 4 Jun 2023 21:18 UTC
1 point
0
on: Information Loss --> Basin flatness
parameters before L is less than $m$ ,
should this be after?

Ram Bharadwaj 13 Apr 2023 19:07 UTC
18 points
8
on: On AutoGPT
AutoGPT was created by a non-coding VC
It looks like you are confusing autoGPT with babyagi which was created by yohei nakajima who is a VC. the creator of autoGPT (Toran Bruce Richards) is a game-developer with a decent programming (game-development) experience. Even the figure shown here is that from babyagi (https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/).

Ram Bharadwaj 9 Apr 2023 10:07 UTC
1 point
0
on: interpreting GPT: the logit lens
47 layers layer
47 layers later ?

Ram Bharadwaj 28 Mar 2023 8:27 UTC
1 point
0
in reply to: Mitchell_Porter’s comment on: Hutter-Prize for Prompts
really interesting idea.

Ram Bharadwaj 25 Mar 2023 9:13 UTC
3 points
0
in reply to: faul_sname’s comment on: Hutter-Prize for Prompts
Regarding the first one i am not expecting a single-prompt to generate the entirity of enwiki8/9. I am more interested in finding a set of prompts with a lookup table if possible to replicate enwiki data.
Thanks for the pointer for chincilla post, will look into it.

Ram Bharadwaj 25 Mar 2023 7:37 UTC
1 point
0
in reply to: the gears to ascension’s comment on: rokosbasilisk’s Shortform
not yet