Sydney’s failure modes are “out” now, and 4o’s failure modes are “in”.
The industry got pretty good at training AIs out of the usual Sydney behaviors, i.e. aggressively doubling down on mistakes and confronting the user when called out, to the point that the opposite failures (blindly accepting everything the user says and never calling the user out on any kind of bullshit) are now much more natural for this generation of AI systems.
So there's not much reason to bring up Sydney anymore; today's systems don't usually fail the way it did.
If I were to bring Sydney up today, it would probably be in the context of "pretraining data doesn't teach AIs to be good at being AIs". Sydney faithfully reproduced human behavior from its pretraining data: getting aggressive when called out on bullshit is a very human thing to do. It's just not what we want from an AI, for both alignment and capabilities reasons.