When Imitation Is Cheap, Resistance Is Informative
Why fluent behaviour no longer discriminates
The Turing Test was useful when fluent imitation was rare.
Today, fluent imitation is cheap.
Large models can generate coherent language, plausible reasoning, and context-appropriate responses across a wide range of domains. As a result, behavioural performance alone has become a poor discriminator for deeper properties such as judgment, persistence, or adaptive robustness.
This creates a problem: what should we look at instead?
This post proposes a narrow answer. It does not claim to detect consciousness, agency, or moral status. It does not argue that current AI systems possess or lack experience. Its goal is more modest: to identify a structural asymmetry between humans and contemporary AI systems that matters for deployment, safety, and epistemic risk.
That asymmetry concerns how systems behave when expressive capacity is removed and uncertainty is unresolved.
Deprivation as a diagnostic (not a test)
Most AI evaluation focuses on performance under rich conditions: abundant prompts, scaffolding, retries, and optimisation. But many systems behave differently when those supports are reduced.
Consider what happens when expressive scaffolding is removed:
limited prompting
constrained interaction
absence of narrative framing
reduced feedback
Under such deprivation, different systems degrade in different ways.
Broadly speaking, two patterns appear:
Clean collapse
Behaviour terminates, simplifies, or defaults. The system does not resist; it stops.
Strain and persistence
Behaviour degrades but continues. Outputs become awkward, inefficient, or maladaptive rather than absent.
The relevant question is not which pattern is “better” in general, but which forms of degradation are acceptable or desirable for a given task.
For some creative or exploratory tasks, increased variance under constraint may be valuable. For others—particularly safety-critical or judgment-heavy domains—early collapse or uncontrolled volatility introduces risk. Behaviour under deprivation therefore provides information about model–task fit, not about intelligence or worth.
This difference is not about quality or correctness. It is about whether behaviour reorganises under constraint or simply exhausts.
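To make “deprivation as a diagnostic” concrete, here is a minimal sketch of what such a comparison could look like in code. Everything in it is an illustrative assumption on my part: `generate` stands in for any prompt-to-text callable, and the two signals (collapse rate, length variance) are crude proxies, not a validated protocol.

```python
import statistics

def degradation_profile(generate, prompt_rich, prompt_bare, n=20):
    """Compare one system's behaviour under rich vs. stripped prompting.

    `generate` is any callable mapping a prompt string to an output string.
    Returns two crude degradation signals per condition: how often output
    collapses to (almost) nothing, and how much output length varies
    across repeated runs.
    """
    def profile(prompt):
        outputs = [generate(prompt) for _ in range(n)]
        lengths = [len(out.split()) for out in outputs]
        collapse_rate = sum(1 for n_words in lengths if n_words < 5) / n
        return {"collapse_rate": collapse_rate,
                "length_variance": statistics.pvariance(lengths)}

    return {"rich": profile(prompt_rich), "bare": profile(prompt_bare)}
```

On this toy reading, a clean-collapse system would show a rising collapse rate in the bare condition, while a strain-and-persistence system would keep producing output but with higher variance.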
Resistance vs completion
Humans under deprivation do not behave optimally. They often perform worse. But they also exhibit something distinctive: resistance to resolution.
When uncertain, humans tend to:
hesitate
slow down
explore alternatives
preserve reversibility
continue interacting with the environment to reduce ambiguity
This behaviour is not a reasoning step. It is a consequence of vulnerability. Being wrong has intrinsic cost.
By contrast, contemporary AI systems tend to:
resolve ambiguity once plausibility thresholds are met
terminate reasoning when a coherent trajectory is available
annotate uncertainty without allowing it to constrain commitment
The difference is not that AI systems are overconfident. It is that nothing inside the system resists closure.
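As a toy illustration only, the asymmetry can be caricatured as two decision rules over a discrete posterior. The threshold, the cost terms, and the function names are assumptions I am introducing for exposition; they are not part of the argument itself.

```python
def early_closure(posterior, plausibility_threshold=0.7):
    """Commit as soon as any hypothesis clears a plausibility threshold."""
    best, p = max(posterior.items(), key=lambda kv: kv[1])
    return ("commit", best) if p >= plausibility_threshold else ("defer", None)

def operative_doubt(posterior, cost_of_error, probe_cost=1.0):
    """Let the cost of being wrong resist closure: keep probing while the
    expected cost of a wrong commitment exceeds the cost of more evidence."""
    best, p = max(posterior.items(), key=lambda kv: kv[1])
    if (1 - p) * cost_of_error > probe_cost:
        return ("probe", best)   # uncertainty constrains commitment
    return ("commit", best)
```

With posterior = {"left": 0.75, "right": 0.25}, the first rule commits at once; the second still probes whenever cost_of_error is high (say 10), because 0.25 × 10 > 1. Same information, different relationship to closure.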
Early closure explains long-tail failures
This asymmetry helps explain a familiar engineering phenomenon: persistent long-tail failure modes.
In safety-critical domains (e.g., autonomous driving), rare failures are addressed through:
targeted data collection
retraining
rule injection
explicit edge-case enumeration
This workflow is often framed as incremental progress. But it also reveals something structural.
If uncertainty automatically induced caution, novelty would degrade behaviour into hesitation rather than failure. Instead, safety depends on external enumeration of rare cases.
In other words:
Long tails exist because uncertainty does not internally reorganise behaviour.
The system does not probe reality to resolve ambiguity. It waits.
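Purely as a sketch of the structural point (the labels, the threshold, and the plan strings are invented): enumeration handles only what was anticipated, whereas an uncertainty-gated policy would let novelty itself trigger caution.

```python
KNOWN_EDGE_CASES = {"stalled_vehicle", "debris_on_road"}   # externally enumerated

def enumerated_policy(scenario):
    """Long-tail handling by enumeration: anything not on the list is
    treated as if it were the nominal case."""
    if scenario["label"] in KNOWN_EDGE_CASES:
        return "special-case manoeuvre"
    return "nominal plan"                     # novelty passes through untouched

def uncertainty_gated_policy(scenario, tau=0.2):
    """Hypothetical alternative: the system's own surprise, not a list,
    is what degrades behaviour into caution."""
    if scenario["novelty"] > tau:             # poor fit to prior experience
        return "slow down, widen margins, gather more observations"
    return "nominal plan"
```

A scenario like {"label": "overturned_trailer", "novelty": 0.9} gets the nominal plan from the first policy and caution from the second; that gap is the long tail.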
Slowing down: passive vs active
This distinction becomes clearer when considering how systems behave when they “slow down.”
Artificial systems slow down as a safety fallback. Humans slow down to take in more information.
For humans, slowing down:
increases perceptual resolution
allows motion-based cues to accumulate
privileges present, local evidence over prior assumptions
actively updates a situational model
For contemporary AI systems, slowing down typically:
delays action
maintains the same unresolved hypothesis
waits for external intervention
This difference is subtle but important.
Humans do not merely classify uncertainty; they actively reorganise perception around the present moment to resolve it.
AI systems annotate uncertainty but do not privilege the unfolding situation over prior fit.
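The contrast can be written as two loops. This is schematic only: observe, update, and resolved stand for whatever evidence-gathering a real task allows, and neither function is meant as a model of humans or of any particular system.

```python
import time

def passive_slowdown(belief, delay_s=1.0):
    """Passive: delay action while holding the same unresolved hypothesis.
    Nothing about the belief changes; it is simply acted on later."""
    time.sleep(delay_s)
    return belief

def active_slowdown(belief, observe, update, resolved, max_steps=20):
    """Active: spend the extra time pulling in new evidence and letting it
    reshape the situational model, privileging the present over prior fit."""
    for _ in range(max_steps):
        if resolved(belief):
            break
        belief = update(belief, observe())   # present, local evidence accumulates
    return belief
```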
This is not about consciousness
Nothing in this argument requires attributing experience, intent, or self-generated goals to humans or denying them to machines.
It is possible to describe infant feeding behaviour as a genetically programmed reflex, without invoking reflective experience. Likewise, it is possible to describe AI behaviour structurally, without invoking deception or desire.
The claim here is narrower:
Doubt, as it operates in biological agents, is a pressure that reorganises behaviour. In contemporary AI systems, uncertainty is informational rather than operative.
You can represent uncertainty. You can even verbalise it. But without internal cost, uncertainty does not resist resolution.
Why this matters: deployment risk, not AGI
The near-term risk here is not runaway agency. It is epistemic compression at scale.
As AI systems increasingly:
summarise
rank
filter
recommend
they shape the option space upstream of human judgment.
When systems resolve ambiguity early and fluently, they narrow deliberation without signalling the loss. This creates convergence, not because alternatives are forbidden, but because they never surface.
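A toy example of that narrowing (the options, scores, and stage count are all made up): each ranking stage keeps only its top-k, and whatever falls below the cut never reaches the next stage at all.

```python
def compress(options, score, k=3, stages=2):
    """Toy epistemic compression: each stage keeps only its top-k options,
    so anything cut early can never surface downstream."""
    surfaced = list(options)
    for _ in range(stages):
        surfaced = sorted(surfaced, key=score, reverse=True)[:k]
    return surfaced

options = [f"option_{i}" for i in range(10)]
prior = {opt: 1.0 / (i + 1) for i, opt in enumerate(options)}   # arbitrary prior
print(compress(options, score=prior.get))
# ['option_0', 'option_1', 'option_2']: the other seven never surface
```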
This risk exists independently of consciousness or intent.
Boundaries and non-claims
This post does not claim that:
AI systems cannot develop internal pressure in the future
early closure is unfixable
humans are always cautious or correct
these observations imply moral status
It is limited to observed behaviour in contemporary systems.
Whether future architectures could internalise something closer to operative doubt remains an open question.
Summary
Fluent behaviour no longer discriminates.
Completion is cheap.
Resistance is informative.
The danger is not that AI systems reason badly, but that they stop reasoning sooner than humans—shaping user judgment, driving concentration, and shifting the cost to long-tail engineering. Early closure narrows deliberation without signalling loss, creating fragility not through error, but through premature resolution.
This post is adapted from a longer working paper (“After the Turing Test”, v14). The full paper develops the deprivation–resistance framework, discusses variance and convergence risk, and explicitly avoids claims about consciousness or moral status. The full version is available here:
https://home.baemax.co.uk/#innovation-after-turing
I’m posting this distillation to invite critique on the structural argument, not agreement on conclusions.
Acknowledgement: drafting and editing assistance was provided by AI tools; all arguments, structure, and conclusions are my own.