Protecting Cognitive Integrity: Our internal AI use policy (V1)
We (at GPAI Policy Lab) want to share our V1 policy as an invitation for pushback. Some of what motivates it is our extrapolations of AI capabilities, internal conversations about their effects on cognition, and some empirical evidence. I think the expected cost of being somewhat over-cautious here is lower than the cost of being under-cautious, and the topic deserves considerably more attention than it’s currently getting. I’d love to see more orgs publish their own policies on this, both to compare experiences and to develop shared best practices.
I’d particularly welcome:
- Counterarguments from people who think this kind of policy is overblown, counterproductive, or targets the wrong mechanisms.
- Lessons from other AI safety or AI policy orgs that have tried something similar: what worked, what didn’t, and what you’d change for V2.
- Specific critiques of the restrictions themselves: too narrow, too broad, wrong category, wrong threshold.
- Or even alternative framings. Do you think “cognitive integrity” is the right handle? Is this a special case of a more general problem we should be thinking about differently?
If you’ve written anything on this, or are working on something similar inside your own organization, I’d be very interested in hearing from you.
Why I’m writing this
I want to share a policy we’ve put in place internally at the GPAI Policy Lab, because I think the underlying problem is one that more AI safety and AI policy organizations should be thinking (and doing) something about now, rather than later.
The concern is that daily use of capable AI systems can gradually compromise the cognitive integrity we most need to do our work. I take this very seriously, and I think more people should. A significant part of what we do requires good reasoning, good judgment, and object-level understanding of the issues.
To preserve that, and to keep making progress in a world with AI agents, I don’t think individual willpower is enough. My view is that we need norms, concretely specified and shared, rather than asking each person to manage this on their own. The failure modes are subtle enough, and could be fast enough, that people often don’t notice them happening, which is part of what makes this potentially dangerous.
So we wrote a V1 and we’re applying it internally. I’m sharing it publicly for two reasons:
One is that having an internal policy, even an imperfect one, is probably a useful lever for awareness, both inside your own org and more broadly. Writing something concrete forces specificity, and specificity is what lets you actually notice when you’re violating it. If this problem is real and growing, I’d like the conversation to be happening inside more orgs sooner rather than later.
The other is that I want feedback. A broader conversation on these questions should probably exist and doesn’t really, as far as I can tell. I’d like to know what other AI safety and AI policy organizations have tried, what has worked, what hasn’t, and what we’re not seeing. Our V1 is only one data point but I’d like there to be many more.
What follows is a translation of our internal document, originally written in French. I’ve kept the original structure: hard restrictions, then warning signals, then what to do about them.
The policy
We use AI every day, on many tasks. Daily interaction with AI systems can affect our cognitive and judgment capacities. We already know these effects can worsen as model capabilities grow, and we think it’s necessary to take preventive measures to avoid them or at least limit them.
The document sets out hard restrictions, and the signals we watch for collectively.
1. Hard restrictions
No delegation of value judgments to AI, whether evaluating a situation, an action, or a moral dilemma. Forming a considered judgment, built on reasoning and properly supported, is a core capacity we want to preserve and deliberately keep exercising. It’s not a capacity we want to delegate.
Good uses:
Asking AI how different schools of thought would approach a given dilemma, to enrich your own thinking.
Asking AI to list arguments for and against a position, likely objections, or angles you hadn’t considered.
Uses to avoid:
Asking AI “should we publish this note?”, “is this position morally defensible?”, “am I right in this disagreement?”
Using AI’s response as an arbiter in a team disagreement (“I checked with Claude, it agrees with me”).
No emotional management, relationship management, or metaphysical conversation via AI in a professional context. These are the conversations where the documented patterns of cognitive drift (among others, so-called AI psychosis) tend to crystallize and where they have been most frequently observed. They also short-circuit exchanges that should be happening inside the team.
Good uses:
“I’ve thought about what I want to say to X and why; I’m asking AI to help me rephrase for clarity or diplomacy.”
“I’ve drafted a difficult message; I’m asking AI to spot ambiguous passages or ones that could land badly.”
Uses to avoid:
Asking AI “am I right to be annoyed with X?”
Asking AI “how should I react to what Y said?” and delegating the judgment.
Venting professional frustration to AI to offload it. At best the problem is pushed aside without being resolved; at worst, a conversation that should have happened inside the team doesn’t happen.
Sending a message whose final wording came from AI without checking that it actually matches what you think and how you want to be perceived.
No prolonged, solitary AI sessions on high-stakes topics. If you find yourself in conversation with AI for more than 90 minutes on the same topic—a dilemma, a consequential decision, an analysis whose conclusions depend on initial framing—without having talked to a human, stop. The documented patterns of cognitive drift start with prolonged, isolated sessions. The phenomenon will likely intensify as models become more capable, so it’s better to put the norm in place early.
Good uses:
Working two hours with AI on reformatting, debugging, or layout. The rule doesn’t apply to mechanical tasks.
Uses to avoid:
Stringing together several hours of reflection without talking to anyone.
No AI interaction as the first action for reasoning and analysis tasks. These faculties are critical for what we do at GPAI Policy Lab. We must maintain and keep developing our capacity to think and reason without relying primarily on framings we didn’t produce.
Good uses:
“I started by reasoning on my own, I thought about my plan and the arguments; now I want to stress-test and rephrase with AI.”
Using AI for mechanical translation, formatting, research.
Uses to avoid:
Using AI to start a reasoning process and build an argument from scratch.
2. Individual and collective warning signals
Erosion of cognitive integrity can be subtle and progressive, which makes the phenomenon harder to detect and prevent. This is why watchfulness has to be collective.
Individual warning signals (watch for in yourself):
Difficulty formulating a position on a topic without having consulted AI first.
Defaulting to AI as your first interlocutor.
Needing an immediate response; discomfort with uncertainty or with sustained autonomous reflection.
The feeling that AI always confirms your intuitions, combined with noticing that no recent interaction has led you to revise a position or recognize an objection you hadn’t already identified.
Collective warning signals (watch for in the team):
AI invoked as an argument from authority in meetings (“I checked with Claude”, “the AI confirms that…”).
Deliverable quality is constant or improving, but the ability to defend the work orally is declining.
3. Protocol
When one or more warning signals appear:
If you see the signal in yourself: stop using AI on the task in question and pick it back up autonomously, then talk to a colleague or manager about it.
If you see it in a colleague: give them the feedback directly, in an informal setting.
If the pattern persists or affects more than one person: raise it at a team meeting.
These seem like valuable and sensible guidelines, and I support making them formal and available for public discussion. They may even be helpful as a template for other orgs grappling with this issue.
It’s good that people are posting these.
My policy, in the context of programming and writing, is that every word and every line of code must be written by me. I use AI only to learn details of specific stuff, as a kind of “smart book”, never to generate or edit the stuff I make. (For example, if I’m solving a CSS layout problem, I don’t describe the problem to AI and ask for the solution. Instead I ask it for details of how CSS flex works, cross-check with MDN docs, and solve the problem myself.) And no AI editing passes at the end either: the final generator of everything I make is me.
you see the emoji in the heading and the bolded bullet points, right? is the protocol formatted in the style of AI slop on purpose as a meta-warning?
That’s funny, because I added this one manually and wondered if it might sound like AI. I finally left it in, but if it triggers this vibe, I’ll remove it. Having an emoji is the default when it comes from a Notion page, and the default one is the lightbulb. Thanks for your comment.
yeah, linkedin slop predates AI slop (..it had to be trained on something, right? 🤷) and Notion is pure slop for me too (I suspect not by coincidence, but even if the causal chain were independent, it triggers my allergy against AI-slop-shaped artifacts nevertheless)
I can confirm it looks better without the emoji 🎉[1]
..I still think it would be better without the heading altogether; the 3 points stand on their own without double-bold. But that’s based on my desire to keep formal documents as short as possible, ruthlessly deleting every word that doesn’t add value (given that length is anti-correlated with the number of people who will read the full document), not slop.
sorry for the ironic joke, but I have hope people can judge the appropriateness of a tool for an informal comment vs. a formal document
I agree, I’ll take it into account, thanks for the comment!
I think there are some situations where it can be good to use AI for something like this. Once I noticed that a friend and I seemed to keep talking past each other and both were getting frustrated. With my friend’s permission, I gave Claude a copy of the conversation and asked it to diagnose what was happening and what the miscommunication was, and that was helpful.
Of course, sometimes a human can also play this role and help two people see where they’re talking past each other, but it requires being able to see both people’s frames and where they’re missing each other.
I think this makes a couple of assumptions. One, that frustration vented to an AI means that there will be no more discussion inside the team. Two, that everyone involved has enough social and communicative skill to make conversations inside the team go well without external assistance.
The second point may very well be true—I don’t know your team! But people are often able to express their frustrations more constructively to the person they’re frustrated with once they’ve first gotten to vent them to an uninvolved third party. And the AI may also make you more receptive to the other person’s point of view, if it offers a more charitable interpretation of their stance than you could have come up with in your frustrated state.
Again, it’s possible that talking to an uninvolved team member accomplishes the same. But I could imagine situations where e.g. the entire team has opinions on a decision and nobody is truly uninvolved, in which case it would be better for morale and cohesion to first talk it out with an AI.
Would this also apply to something where you have e.g. written a draft, asked the AI to criticize it, found that the AI was correct to criticize it, and are trying to rewrite the draft to address the criticism? I could easily see a loop of “draft, have AI criticize, redraft, have AI criticize, repeat until both you and AI are satisfied with the argument” being both beneficial (if the argument has obvious flaws, might as well fix them before using a human’s time on the draft) and easily taking more than 90 minutes at a time.
This is slightly ambiguous to me. I assume it’s meant to at least include “ask the AI how to think about a topic”, but I’m not sure if it also covers something like “dump a bunch of your initial half-formed thoughts to an AI to see if it suggests potential related topics and to use it in a rubber duck-like fashion, while asking it things like ‘I have heard that X is true, which would be relevant for my argument if so—can you find me recent relevant papers evaluating whether it is true’.”
To me, the former seems bad (unless you genuinely are totally unfamiliar with the domain in question, in which case it doesn’t seem obviously worse than asking a human for similar pointers), while the latter seems potentially fine.
Re: writing, outside of formulaic contexts like legal writing and high-stakes professional communication, I think—and have heard others say—that AI-assisted drafting tends to neuter the piece. Everything becomes hedged, and therefore boring, and the surface area for disagreement and engagement shrinks. I don’t know if people actually like reading or trying to respond to arguments that are built like a fortress? Maybe you have to be sloppy to avoid making slop.
I also find that AI-assisted drafting distracts me from the original point I wanted to make. Like, looking at the finished product, it’s gone somewhere I didn’t mean it to? But a lack of drift resistance could just be a skill issue on my part.
(These can apply even when every edit is yours and you’re not just uncritically accepting most of the AI’s revisions.)
Depends on how exactly you draft it, I think. It’s not uncommon for me to do AI-assisted drafting where I end up basically ignoring 95% of everything that the AI said and just picking up a few sentences worth of particularly good ideas that it happened to suggest.
E.g. this article was drafted together with Claude and I don’t think it ended up overly hedged.
I think using AI as an arbiter is fine if both parties agree. If one party doesn’t agree and therefore looks unreasonable, that’s a feature. Check whether this document is the office-political immune response to that prospect. Recall prediction markets.
If you are resigned to or intend that immune response, say something; I might have an idea that makes everyone better off.
The “office-political immune response” frame doesn’t fit here at all. If anything, this is a document type I’ve never seen anywhere else; most people don’t have this on their radar, and I think it can sound weird to many orgs outside the AI sphere. That’s part of why we wrote it up. So I’m honestly not sure what to make of your point, but fair enough for covering the case where it would have applied; it just doesn’t here.