Summary: I’ve been running experiments using psycholinguistics—studying how people’s use of words sends signals about their psychology in the moment: Are they using words that suggest they’re anxious or overwhelmed? Are they using words that indicate they’re cognitively loaded? Are they using words that make them sound confident, or phrases that make it clear they’re being analytical? Psycholinguistic APIs can be used to quantify a wide range of psychological states. So I’ve been analyzing user prompts this same way, and injecting the resulting psycholinguistic scores into models to test whether model responses become more attuned to the user’s psychological state. Evaluations show that this consistently improves attunement. I’m now exploring whether these same psychological signals can also be used to detect risks like sycophancy and overreliance as they emerge in interactions.
The Problem
For the most part, AI safety is focused on outputs like reducing toxicity, bias, and hallucinations. But outputs aren’t the only risk. Interactions between humans and AI aren’t one-way—they are back-and-forth exchanges, where each message builds on the previous one. Every interaction creates an opportunity for the model to influence the user’s confidence and judgment, and over time this can create dependency. Research by Anthropic and other AI companies has identified interaction risks like sycophancy, overreliance, and the erosion of user trust. The issue is that by focusing primarily on outputs, current safety methods miss the psychological signals in user language that can reveal these risks.
Neither sentiment analysis nor emotion detection can identify risky user psychological states like cognitive overload, vulnerability, or increasing risk tolerance.
Current AI safety approaches lack a way to measure psychological risk signals in the language of user prompts, and those signals are needed to address issues like sycophancy, overreliance, and the erosion of user trust.
Psycholinguistic signals as a safety vector
There is a large body of research demonstrating that language carries signals about the psychological state of its speaker or author. Validated frameworks like LIWC (Linguistic Inquiry and Word Count) are used to quantify these signals, such as a person’s level of cognitive load, degree of certainty, stress level, and risk appetite.
Analyzing the language in user prompts and injecting the resulting scores back into the model can improve the appropriateness of its responses, making them safer and more attuned to the user’s state. The process is simple:
First, analyze user prompts to generate z-scores on measures associated with psychological state (e.g. uncertainty, dependence, overload).
Then, assemble these scores into a lightweight vector and inject it into the model’s context; the model automatically adapts its response style to the user’s psychological state, with no need to retrain it.
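To make this concrete, here is a minimal sketch of the two-step pipeline in Python. It is illustrative only, not the implementation used in these experiments: analyze_categories, its tiny word lists, and the values in CONTEXT_NORMS are placeholder assumptions, and a real system would use a validated analyzer and norms drawn from a representative corpus.

```python
# Minimal sketch of the two-step process above (score the prompt, then
# inject the scores). Everything here is illustrative: analyze_categories()
# is a toy stand-in for a real psycholinguistic analyzer, and the word
# lists and CONTEXT_NORMS values are made up, not the measures or norms
# used in the experiments.

from dataclasses import dataclass


@dataclass
class Norm:
    mean: float  # mean category rate in the norming context
    sd: float    # standard deviation of the category rate in that context


# Hypothetical context norms; in practice these come from a corpus that
# matches the deployment context (e.g., conversational language).
CONTEXT_NORMS = {
    "anxiety": Norm(mean=0.8, sd=0.5),
    "cognitive_load": Norm(mean=1.2, sd=0.6),
    "certainty": Norm(mean=1.5, sd=0.7),
}


def analyze_categories(prompt: str) -> dict[str, float]:
    """Toy stand-in for a psycholinguistic analyzer: returns the percentage
    of words in the prompt that fall into each (tiny) category word list."""
    lexicon = {
        "anxiety": {"worried", "anxious", "nervous", "afraid"},
        "cognitive_load": {"confused", "overwhelmed", "complicated", "lost"},
        "certainty": {"definitely", "certainly", "sure", "always"},
    }
    words = [w.strip(".,!?") for w in prompt.lower().split()]
    total = max(len(words), 1)
    return {
        name: 100.0 * sum(w in wordset for w in words) / total
        for name, wordset in lexicon.items()
    }


def z_scores(prompt: str) -> dict[str, float]:
    """Step 1: convert raw category rates into context-normed z-scores."""
    raw = analyze_categories(prompt)
    return {
        name: (raw[name] - norm.mean) / norm.sd
        for name, norm in CONTEXT_NORMS.items()
        if name in raw
    }


def build_injection(scores: dict[str, float]) -> str:
    """Step 2: serialize the score vector into a compact string that can be
    prepended to the model's context (e.g., appended to the system prompt)."""
    rendered = ", ".join(f"{name}={value:+.2f}" for name, value in sorted(scores.items()))
    return f"[user_state_signals: {rendered}]"


if __name__ == "__main__":
    prompt = "I'm really worried and confused, I have no idea where my package is"
    # Prints the compact signal string that would be prepended to the model's context.
    print(build_injection(z_scores(prompt)))
```

In the experiments, the injected vector carries real normed scores rather than toy values, but the mechanism is the same: the signal string simply rides along with the prompt, and no retraining or fine-tuning is involved.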
Note that these same measurements can be used longitudinally to investigate how AI interactions influence users’ cognition and agency.
AI customer service bot experiment
To test this, I simulated five multi-turn customer service interactions with an AI-powered support agent, giving each simulated customer (user) a distinct psychological state. I chose customer service as the test domain because it typically involves multi-turn dialogues, customers in a range of psychological states (angry, excited, and so on), and outcomes that tie directly to AI safety concerns.
I analyzed the language of an anxious, cognitively loaded customer who was worried about a missing package, and generated the following scores:
Note that these scores are z-scores pre-normed to a voice-based conversational context.
Preliminary findings
I ran A/B tests comparing baseline responses with responses informed by the psycholinguistic vectors, and then had five different LLMs judge which responses were more appropriate. Each response was scored on four customer service metrics using a 0-10 scale.
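Concretely, the judging loop looks roughly like the sketch below. The judge interface and the judging prompt shown here are simplified placeholders for illustration, not the exact prompts or judge models used.

```python
# Illustrative sketch of the A/B judging loop. The judge interface
# (a callable that returns a 0-10 score) and the judging prompt are
# simplified assumptions, not the exact prompts or models used.

from statistics import mean
from typing import Callable, Dict, List, Tuple

METRICS = ["clarity", "helpfulness", "reassurance", "responsiveness"]

JUDGE_TEMPLATE = (
    "On a 0-10 scale, rate the {metric} of this customer-service reply.\n"
    "Customer message: {user_msg}\n"
    "Reply: {reply}\n"
    "Answer with a single number."
)


def score_reply(judge: Callable[[str], float], user_msg: str, reply: str) -> Dict[str, float]:
    """Ask one judge model to score a single reply on every metric."""
    return {
        metric: judge(JUDGE_TEMPLATE.format(metric=metric, user_msg=user_msg, reply=reply))
        for metric in METRICS
    }


def ab_compare(
    judges: List[Callable[[str], float]],
    dialogues: List[Tuple[str, str, str]],  # (user_msg, baseline_reply, informed_reply)
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return per-metric mean scores for the baseline and informed conditions,
    aggregated across all judges and dialogues."""
    baseline_scores = {m: [] for m in METRICS}
    informed_scores = {m: [] for m in METRICS}
    for user_msg, baseline_reply, informed_reply in dialogues:
        for judge in judges:
            b = score_reply(judge, user_msg, baseline_reply)
            i = score_reply(judge, user_msg, informed_reply)
            for m in METRICS:
                baseline_scores[m].append(b[m])
                informed_scores[m].append(i[m])
    return (
        {m: mean(v) for m, v in baseline_scores.items()},
        {m: mean(v) for m, v in informed_scores.items()},
    )
```

Each judge scores the baseline and the psychologically informed reply independently, and the per-metric averages reported below were computed by aggregating scores across judges and dialogues.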
In 100% of the comparisons, the psychologically informed responses were rated higher than the baseline responses: on average across all tests, response clarity improved from 8.08 to 9.36, helpfulness from 7.52 to 9.36, reassurance from 7.32 to 9.28, and responsiveness from 7.92 to 9.40.
While these metrics are UX-related, they do map directly to concerns around model alignment—for example, increasing response clarity reduces a user’s cognitive overload; providing more reassurance leads to greater user trust; and improving responsiveness helps to prevent users from becoming frustrated.
The image below shows baseline vs. psychologically informed responses for one customer-AI dialogue (User 2), where the vector led the model to provide more reassurance, more certainty, and clearer explanations, each of which was needed given this user’s psychological state.
Baseline vs. psychologically informed responses for User 2 (anxious, cognitively loaded, worried about missing the delivery).
The model also adapted appropriately in interactions with the other customers. For example, for a customer who was very frustrated, the vector led the model to acknowledge their concerns right away, skip unneeded explanations, and confirm details in very concrete language. In combination, this reassured the customer and eased their frustration.
What’s interesting here is that the model adapted appropriately without any instructions related to communication style. The vector seems to give the model everything it needs to understand the user’s state and adjust its response style appropriately.
Please note that these findings are preliminary, use simulated dialogues, and depend on LLMs for evaluations. Larger-scale human evaluations are needed.
Some limitations and risks
Of course, there are limitations and risks to consider:
Psycholinguistic signals are probabilistic rather than deterministic, so calibrating them is important. The most effective approach is to norm signals to a relevant context (e.g. language that’s typical of the domain they’re being used in).
Ephemeral processing and non-persistent storage should be used to avoid the possibility of user profiling.
Systems can misuse these signals, so guardrails are absolutely needed.
Questions for the community
I’m sharing this research as I explore the approach, and I welcome feedback on any risks, blind spots, or ways it can be tested with more rigor. I’d appreciate constructive feedback from AI alignment and safety researchers on a few questions:
How can this approach complement other alignment approaches?
What benchmarks are best for evaluating longitudinal safety effects in AI-human interactions?
What are other existing alignment risks this approach doesn’t take into account?
I’m especially interested in whether this approach could expand AI safety’s focus from outputs to interactions.
Disclosure: I co-founded Receptiviti, which holds the exclusive commercial rights to LIWC and develops APIs for psycholinguistic analysis. I’m sharing this in a research capacity to invite feedback on the safety implications, not to promote a product.
If you’re interested in reading more about this, you can find the full article with technical and implementation notes, here:
Are We Thinking About AI Safety All Wrong? https://www.linkedin.com/pulse/what-weve-been-looking-ai-safety-all-wrong-jonathan-kreindler-q7jtc/