It is quite high:
The current thinking is that although around 1.5 to 3.5% of people will meet diagnostic criteria for a psychotic disorder, a significantly larger, variable number will experience at least one psychotic symptom in their lifetime.
The cited study is a survey of 7076 people in the Netherlands, which reports:
1.5% sample prevalence of “any DSM-III-R diagnosis of psychotic disorder”
4.2% sample prevalence of “any rating of hallucinations and/or delusions”
17.5% sample prevalence of “any rating of psychotic or psychosislike symptoms”
That’s definitely a concern. But even if the AI fully understands that it isn’t in its best interest to explicitly write out its misaligned plans, it’s been trained to think in whatever way is most effective at gaining reward. Hopefully the most effective way is to write its plans out clearly. And even if it sometimes succeeds at keeping its thoughts hidden, it’s likely to slip up.
By analogy, it’s likely difficult for you to think through a deceptive plan without saying suspicious things in your internal “chain of thought.”