I agree with most of what you say here, and I agree that the definition of sycophancy is a bit vague.

I think I agree that if the dumb models always tried to kill us, I’d be more worried about future models (though the evidence isn’t very strong, both because of the gap between dumb and smarter models, and because for smarter models I don’t expect the open-web-text prior, which talks a lot about AI takeover, to matter as much as it does for dumber ones). But in the experiments described in this post (not your comment) there is a ton of spoonfeeding, and I care about the model’s “will” when there is no spoonfeeding. Experiments at spoonfeeding > 0 aren’t very informative.

You made me curious, so I ran a small experiment. Using the sum of absolute cosine similarities as the loss, initializing randomly on the unit sphere, and optimizing until convergence with L-BFGS (with strong Wolfe line search), here are the maximum cosine similarities I get (averages and stds over 5 runs, since there was a bit of variation between runs):

It seems consistent with the exponential trend, but it also looks like you would need dim >> 1000 to get any significant boost in the number of vectors you can fit with cosine similarity < 0.01, so I don’t think this happens in practice.
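A quick back-of-the-envelope check of why dim >> 1000 is needed (my own illustration, not part of the experiment above): even for purely random unit vectors, the typical |cos sim| between a pair scales like 1/sqrt(d), so getting typical similarities below 0.01 already requires d on the order of 10^4 before any optimization helps.

```python
import torch

def mean_abs_cos(d, n=200, seed=0):
    """Mean |cosine similarity| over pairs of random unit vectors in R^d."""
    torch.manual_seed(seed)
    u = torch.randn(n, d)
    u = u / u.norm(dim=1, keepdim=True)       # random points on the sphere
    off = u @ u.T - torch.eye(n)              # off-diagonal cosine sims
    return off.abs().sum().item() / (n * (n - 1))

# mean |cos| shrinks roughly like 1/sqrt(d):
# d = 100 gives ~0.08, d = 10000 gives ~0.008
```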

My optimization might have failed to converge to the global optimum, though; this is not a convex optimization problem (but the fact that there is little variation between runs is reassuring).