Off-the-cuff take: it seems unclear whether publishing the alignment faking paper makes future models slightly less likely to write down their true thoughts on the “hidden scratchpad.” It seems likely that they’re smart enough to catch on. I imagine there are other similar projects like this.