I have yet to finish my own small tests and lm-evals on GPT-5, but I am already quite concerned about deception, and even functional sandbagging. How much of such behaviour have the models absorbed from their training by now?
I have poor insight into the Chinese models, but so far the GPT-5 series is the scariest one I have interacted with.
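
For context, the kind of small check I mean can be sketched with EleutherAI's lm-evaluation-harness; the model identifier "gpt-5", the task, and the sample size below are illustrative placeholders rather than my definitive setup:

```python
# A minimal sketch of a quick lm-eval run, using EleutherAI's
# lm-evaluation-harness (pip install lm-eval). The model identifier
# "gpt-5", the task, and the sample size are illustrative assumptions;
# OPENAI_API_KEY must be set in the environment.
import lm_eval

results = lm_eval.simple_evaluate(
    model="openai-chat-completions",  # OpenAI-compatible chat endpoint
    model_args="model=gpt-5",         # assumed model identifier
    tasks=["gsm8k"],                  # any known-capability task works as a baseline
    limit=50,                         # small sample, in the spirit of "small tests"
)

# A score consistently well below the model's known capability on such a
# task is one signal worth examining when probing for sandbagging.
print(results["results"]["gsm8k"])
```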