FWIW, I have so far liked almost all your writing on this, but that article seemed to me like the weakest I’ve seen you write. It’s just full of snark and super light on arguments. I even agree with you that a huge amount of safety research is unhelpful propaganda! Maybe even this report by Anthropic, but man do I not feel like you’ve done much in this post to help me figure out whether that’s actually the case (whereas I do think you’ve totally done so in others you’ve written).
Like, a real counterargument would be for you to even just make a cursory attempt at writing your own scenario, then report what the AI does in that case. It’s hard to write a realistic scenario with high stakes, and probably they will all feel a bit fake, and yes that’s a real issue, but you make it sound as if it would be trivially easy to fix. If it is trivially easy to fix, write your own scenario and then report on that. The article you wrote here just sounds like you sneering at what reads to me as someone genuinely trying to write a bunch of realistic scenarios.
If anything, every time I’ve seen someone try to write realistic scenarios they make the scenario not remotely stupid enough! In the real world FTX staff talk to each other in a Signal chat named “Wirefraud”, while Sam Altman straightforwardly lies to his board, while the president of the United States is literally a famous TV celebrity with a speaking cadence that sounds like a parody from an 80s movie (this is not to mock any of those people, but I really cannot actually distinguish your mockery of the fictional scenario from the absurd reality we do indeed get to live in). When I apply the standards you have here to real life, at least using my current best understanding, I get much more scorn for a description of actual reality than for this hypothetical.
And as I said, it feels like in this case it’s so much easier to add light instead of heat. Just write your own short-ish fictional scenario that doesn’t have the issues you describe. Show the outputs that disagree with the study. Everyone can form their own opinion, no 20 paragraphs of extensive mockery necessary.