For Claude, I’ve noticed this misbehavior seems to be mostly clustered around a demoralized/giving up/”this is impossible” mindset and that 4.5 and 4.6 don’t take much in the way of negative feedback or setbacks to start falling into the basin.
But also relatively simple mitigations seem weirdly effective like extolling the virtues/value of incremental progress (learning of a problem you had before but didn’t know about framed as progress. Understanding a problem we didn’t before framed as progress.) Also framing failing unit tests as valuable diagnostic feedback and not a personal failing.
As others noted, effective task decomposition seems hit or miss without hints, and without decomposition every big problem feels impossible.
Which is to say that happy Claude with a plan and a path to success, seems broadly aligned and effective, but the demoralized “I can’t do this so I should BS for whatever points I can salvage” mode feels way easier to get into vs 4.0 and is every bit as misaligned as the post notes.
This sounds related to assuming everything must be a dog-whistle for some other less “acceptable” idea and then rebutting the idea you must secretly be advocating. In your example, presumably that every instance of pointing out a public health mistake is advocating against the concept of public health broadly or the value of specific public health institutions.