The “office-political immune response” frame doesn’t fit at all. If anything, this is a document type I’ve never seen anywhere else: most people don’t have it on their radar, and I think it can sound odd to many orgs outside the AI sphere. That’s part of why we wrote it up. So I’m honestly not sure what to make of your point, but it’s fair to cover the case where it would have applied; it just doesn’t apply here.
Tom DAVID
That’s funny, because I added this one manually and wondered whether it might sound like AI. In the end I left it in, but if it triggers this vibe, I’ll remove it. Notion pages come with an emoji by default, and the default one is the lightbulb. Thanks for your comment.
Protecting Cognitive Integrity: Our internal AI use policy (V1)
Claude Mythos Preview: Analysis of Anthropic’s Public Announcement
A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities
Your first two bullet points are very accurate; it would indeed be worth developing these points further.
Finally, regarding your last bullet point, I agree. We currently do not know whether it is possible to develop such safeguards, and even if it were, doing so would require time and further research. I fully agree that this should be made more explicit!
Call for evaluators: Participate in the European AI Office workshop on general-purpose AI models and systemic risks
Workshop Report: Why current benchmarks approaches are not sufficient for safety?
“Instead of building in a shutdown button, build in a shutdown timer.”
-> Isn’t that a form of corrigibility with an added constraint? I’m not sure what would prevent the system from convincing humans that respecting the timer is a bad idea, for example. Is it because we would formally verify that it never engages in deception? It’s not clear to me, but maybe I’ve misunderstood.
I agree, I’ll take it into account, thanks for the comment!