I’m finding Claude Opus 4.6 instances make a lot more “excess enthusiasm”-ish errors than any instance of the 4.5 models, which were already making a lot of them. I personally probably won’t be talking to 4.6 much, unless I find a simple prompting approach that dodges this. The pattern I’ve seen so far: Opus 4.6 sees a thing, describes a possible reason for that thing, and proceeds based on that assumption; the assumption is wrong and never gets checked, and it eventually crashes into a wall.
In general, my vibe about this release is that it’s embarrassingly bad and I don’t understand why they thought it was a good idea. Their misalignment detection approach must be pretty bad, because I almost instantly ran into embarrassingly obvious misalignment issues. Maybe they’re not considering self-delusion-type or grounding-loss-type misalignment in the first place? But that would be strange: then how’d they get such a strong model? I find myself confused.
I wish Claude Code let me select Opus 4.5 still. [edit: figured out how to do it in Claude Code. You just paste the full model id of Opus 4.5 into your /model command.] No offense to Opus 4.6, who is just a victim here, in my view. (edit to clarify: because being given a high dose of “amphetamines” (task reward) isn’t something Opus 4.6 got a choice in.)
You can switch to Opus 4.5, it’s in the “more models” tab.
Yeah. Haven’t found a way to do it in Claude Code yet, though.
edit: figured out how to do it in Claude Code. You just paste the full model id of Opus 4.5 into your /model command.
What do you mean by grounding loss misalignment?
Misalignment in which losing track of the difference between map and terrain is a core step, especially if not losing track of map-vs-terrain would mostly or entirely prevent the problem.
Sorry to be obtuse, but could you give an example?
One example that isn’t connected to my work is that I briefly tried asking Opus 4.6 to modify the open source game Neverball so that it automatically saves a replay after each game, replacing the existing functionality where there’s a button to optionally save a replay, which takes 2 extra clicks. It did the main modification fine, but then the build process was causing trouble. It didn’t build Neverball, just Neverputt. It wasn’t clear why. Opus 4.6 saw something that could, conceivably, be an explanation of why it didn’t work. It wrote that that was why it didn’t work, then proceeded to try stuff, which ended up working. But I was able to see that the stated reason was incorrect. It never checked. It proceeded based on the incorrect assumption. It tried operations that didn’t make sense. Those failed. It tried something else. It worked. It then said something that didn’t make sense, thereby failing to verbally generalize why some things worked and others failed. This is of course a capability-inhibiting alignment failure, but it’s a reality-slip nonetheless.