What do you mean by grounding loss misalignment?
Misalignment in which losing track of the difference between map and terrain is a core step, especially if not losing track of map-vs-terrain would mostly or entirely prevent the problem.
Sorry to be obtuse, but could you give an example?
One example that isn’t connected to my work: I briefly tried asking Opus 4.6 to modify the open source game Neverball so that it automatically saves a replay after each game, replacing the existing functionality where saving a replay is optional and takes two extra clicks through a button. It did the main modification fine, but then the build process caused trouble: the build produced neverputt but not neverball, and it wasn’t clear why. Opus 4.6 noticed something that could, conceivably, have explained the failure, wrote that this was why it didn’t work, and then proceeded to try things, which eventually succeeded. But I could see that the stated reason was incorrect. It never checked; it proceeded on the incorrect assumption, tried operations that didn’t make sense, watched them fail, tried something else, and that worked. It then said something that didn’t make sense, thereby failing to verbally generalize why some things worked and others failed. This is of course a capability-inhibiting alignment failure, but it’s a reality-slip nonetheless.
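For concreteness, here is a minimal self-contained sketch of the shape of the modification I was asking for, in C since that’s what Neverball is written in. Every name in it (demo_save, on_level_end) is an illustrative stand-in for whatever the real codebase calls its replay-writing routine and its level-end hook; this is not Neverball’s actual API, just the idea of the change: call the existing save path unconditionally instead of behind a button.

```c
/* Sketch: save a replay automatically when a level ends, rather than
 * behind an optional "Save Replay" button. All names here are
 * hypothetical stand-ins, not Neverball's real identifiers. */

#include <stdio.h>
#include <time.h>

/* Stand-in for the existing replay-saving routine that the optional
 * button would normally invoke. */
static int demo_save(const char *name)
{
    printf("(stub) writing replay file: %s.nbr\n", name);
    return 1;
}

/* Hook that would run on every level end, making the save unconditional. */
static void on_level_end(void)
{
    char name[64];
    time_t now = time(NULL);

    /* Timestamped filename so successive auto-saves don't collide. */
    strftime(name, sizeof (name), "replay-%Y%m%d-%H%M%S", localtime(&now));

    if (!demo_save(name))
        fprintf(stderr, "failed to save replay %s\n", name);
}

int main(void)
{
    on_level_end();
    return 0;
}
```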