I’ve tried, and p-eval still decreases with steering strength.
Ram Bharadwaj
Tracing Eval-Awareness Emergence Through Training of OLMo 3
I think the ‘naturalness’/‘realism’ of model-generated transcripts keeps dropping, roughly in correlation with the model’s release date? (I guess that as the models keep getting better, users keep chatting with them in a more and more realistic manner.)
I’m unsure whether ‘epoch-capability’ is the right thing to measure when assuming a correlation with capabilities.
PS: the judge for the above plot is sonnet-4-6, and the chats are 100 samples per model from sharechat.
By distilling the model on its own responses, the model’s train and eval behavior should converge: you’re collapsing the (train/eval)-conditional policy into uniform behavior everywhere.
Has anyone tried checking whether self-distillation suppresses eval-awareness?
Scaling Laws for LLM Based Data Compression
Thanks for the feedback! I’m working on refining the write-up.
Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data
Understanding Hidden Computations in Chain-of-Thought Reasoning
though as Geoff Hinton has pointed out, ‘confabulations’ might be a better word
I think Yann LeCun was the first one to use this word: https://twitter.com/ylecun/status/1667272618825723909
Not much information has been given about that so far; I was curious about it too.
“Algorithm for Concept Extrapolation”
I don’t see any recent publications by Paul Christiano related to this, so I guess the problem[s] remain open.
Adversarial Training Against Goal Misgeneralization Is ELK-Hard
parameters before L is less than ,
Should this be “after”?
AutoGPT was created by a non-coding VC
It looks like you are confusing AutoGPT with BabyAGI, which was created by Yohei Nakajima, who is a VC. The creator of AutoGPT (Toran Bruce Richards) is a game developer with solid programming (game-development) experience. Even the figure shown here is the one from BabyAGI (https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/).
47 layers layer
47 layers later?
Really interesting idea.
Regarding the first one, I am not expecting a single prompt to generate the entirety of enwik8/9. I am more interested in finding a set of prompts, possibly with a lookup table, that can replicate the enwik8/9 data.
Thanks for the pointer to the Chinchilla post; I will look into it.
The results in this post use the same judge model (gpt-5-mini) across stages. The quoted note was from an older intermediate report that hadn’t been updated in the repo. I’ll mark it as stale in the repo so it doesn’t create confusion. Thanks for spotting it!