Update 2: Ablations for “Frontier models are capable of in-context scheming”
Bruh. Two years of me not fully keeping up with alignment research and this is how bad it’s gotten???
I’m surprised I could just randomly think of an idea and boom, there’s a paper on it.