Cognitive Exhaustion and Engineered Trust: Lessons from My Gym
Epistemic status: Exploratory design philosophy. This post reframes AI safety as an affordance problem, not just a control problem. It draws from embodied systems and interaction design to argue that trustworthiness in AI arises less from post-hoc constraints and more from architectures that shape behaviour upstream. I introduce the early-stage concept of Distributed Neural Architecture (DNA), which I’ll unpack in a future post.
--
I’ve been going to the same gym since 2019. It used to be a space of calm focus with familiar equipment, consistent routines, and the comforting hum of shared norms. I knew where everything was, who to ask, how to move. It was the kind of place where my body could work, and my mind could rest.
Then everything changed.
New members arrived in waves, coaches rotated weekly, classes started colliding, and dumbbells were scattered like tripwires across the floor. It wasn’t just messy. It was mentally exhausting. Every workout now began with a background process of risk assessment. Was the bench free? Could I finish my set before the next group flooded in? Would someone walk behind me mid-lift?
I wasn’t thinking about my form; I was constantly scanning for threats. Hypervigilance had replaced flow. And yet, most people seemed fine with it. They tuned it out. They adapted. I couldn’t. Eventually I realised why.
The environment had broken a tacit promise of safety I had come to expect: not just physical safety, but cognitive and emotional safety. The kind that lets your mind rest because the environment carries part of the load.
What I miss isn’t just order; it’s affordance, in the sense of ecological psychology. A subtle contract that once existed between space and behaviour is now completely broken. The safety I had taken for granted wasn’t about rules. It was about rhythm, flow, and not having to think so hard. I realise I am not just physically unsafe; I am cognitively taxed.
To me, this is exactly what bad AI design also feels like.
The Toyota Contrast
Years ago, I worked at the Toyota factory in India. I’d work every day on factory floors filled with moving machinery, sharp edges, and actual risk. But the strange thing is, I never felt the kind of low-level vigilance I feel at my gym today. Why?
Because the environment was doing half the cognitive work for me. Walkways were painted clearly in green. Warning zones had visual and tactile cues. Even the sound design was intentional: every beep or hiss signalled something specific. I didn’t have to memorise a rulebook. I moved through space and the space signalled back.
It was an affordance system, where the architecture itself rewarded good judgment and reduced the likelihood of error, not by limiting choice, but by guiding attention. I could almost relax. In fact, I did, every so often, watching a line of robots gracefully weld a car’s body from start to finish. And that made me more alert in my work, not less.
So the problem with my gym, and increasingly with AI, isn’t just that things go wrong. It is that the environment gives you no help in knowing how to act or what to expect. It lacks upstream constraint propagation. It doesn’t shape behaviour through feedback-sensitive priors. Instead, it relies on patching errors after the fact.
Why Most AI Safety Feels Like My Gym
Most of today’s alignment work resembles my chaotic gym. Red teaming, RLHF, filters, patches, alignment fine-tuning: reactive layers trying to compensate for unsafe behaviour already in play. Safety, in this paradigm, is something we enforce. We supervise. We intervene. Maybe this frame treats safety as a policing problem rather than a design opportunity. What if safety isn’t about more enforcement, but better affordance?
When we talk about affordances, we’re asking, “What behaviours does this system make easy, intuitive, and likely?” The goal isn’t to prohibit bad behaviour post-hoc. It’s to shape the environment so that good behaviour is the natural default, and risky behaviour becomes awkward, effortful, or structurally inhibited.
This reframes alignment not just as an internal control problem, but as an interactional design problem. That’s how Toyota did it, by building processes and spaces that nudged people toward safe behaviour without demanding their constant attention.
We need a similar shift in how we think about AI. The alternative isn’t more sophisticated patches, but systems designed for mutual adjustment from the ground up.
Co-Regulation over Control
Traditional alignment treats AI as a tool, not a collaborator. We give it a goal, constrain its outputs, and supervise tightly. But as systems become more autonomous and embedded, control alone stops scaling.
What we need is co-regulation: not emotional empathy, but behavioural feedback loops. Systems that don’t just comply, but adjust; that surface uncertainty, remember corrections, resolve conflicts, and evolve relationally over time.
This isn’t anthropomorphism. It’s architecture. Safety emerges not from overriding the model, but from designing it to align in motion, with users, with context, with itself.
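To make that a little more concrete, here is a minimal sketch, in Python, of what a co-regulating wrapper could look like. Everything in it, the CoRegulatingAssistant class, the confidence threshold, and the correction memory, is a hypothetical illustration of the feedback loop described above under my own assumptions, not a reference implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CoRegulatingAssistant:
    """Toy wrapper that adjusts with the user instead of merely complying."""
    uncertainty_threshold: float = 0.35          # assumed cut-off for flagging low confidence
    corrections: dict = field(default_factory=dict)  # remembered user feedback

    def respond(self, prompt: str, answer: str, confidence: float) -> str:
        # Apply any remembered correction for this prompt before answering.
        if prompt in self.corrections:
            answer = self.corrections[prompt]
        # Surface uncertainty instead of delivering a smooth but unexamined answer.
        if confidence < self.uncertainty_threshold:
            return f"(low confidence: {confidence:.2f}) {answer}; please verify."
        return answer

    def correct(self, prompt: str, better_answer: str) -> None:
        # Remember the correction so the same mistake is not repeated.
        self.corrections[prompt] = better_answer


assistant = CoRegulatingAssistant()
print(assistant.respond("dose of drug X", "10 mg", confidence=0.2))
assistant.correct("dose of drug X", "5 mg")
print(assistant.respond("dose of drug X", "5 mg", confidence=0.9))
```

The point isn’t the code itself. It’s that the system surfaces its own uncertainty and carries corrections forward, so user and model adjust to each other over time rather than the user policing every output.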
Toward Modular, Reflective Systems
Monolithic models are built for fluency and consistency, but fluency without friction can hide misalignment. When a system speaks too smoothly, it becomes harder to catch when it’s confidently wrong. Worse, it resists interruption. It delivers, but without deliberation. What if, instead of one giant model simulating everything, we had a modular system, a distributed neural architecture, composed of parts that:
Specialise in different domains
Hold divergent priors or strategies
Surface internal disagreement rather than suppress it
And evolve relationally through dynamic updates, not just static fine-tuning
In my ongoing work, I call this architecture DNA. It’s not a single consensus engine, but a society of minds in structured negotiation. A system that adapts, disagrees, and converges over time.
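As a toy illustration of what “structured negotiation” could mean, here is a short Python sketch. The module names, the variance-based disagreement measure, and the escalate-or-proceed rule are all assumptions made for the example; DNA itself is still an early-stage concept, and this is a sketch of the idea, not its implementation.

```python
from statistics import mean, pstdev
from typing import Callable, Dict


def deliberate(modules: Dict[str, Callable[[str], float]],
               query: str,
               disagreement_threshold: float = 0.15):
    """Collect each module's risk estimate and either converge or escalate."""
    estimates = {name: fn(query) for name, fn in modules.items()}
    spread = pstdev(estimates.values())
    if spread > disagreement_threshold:
        # Divergent priors: surface the conflict rather than averaging it away.
        return {"decision": "escalate", "estimates": estimates, "spread": spread}
    return {"decision": "proceed", "risk": mean(estimates.values()), "spread": spread}


# Hypothetical specialists holding different priors about the same request.
modules = {
    "safety_critic": lambda q: 0.8 if "override" in q else 0.1,
    "task_planner": lambda q: 0.2,
    "user_model": lambda q: 0.3,
}

print(deliberate(modules, "override the maintenance lockout"))  # -> escalate
print(deliberate(modules, "schedule routine maintenance"))      # -> proceed
```

The design choice worth noticing is that disagreement is treated as signal, not noise: when the specialists’ priors diverge, the system escalates instead of quietly averaging the conflict away.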
This doesn’t just reduce the brittleness of alignment; it makes safety emergent, not enforced. If a factory floor can distribute safety through interdependence, maybe AI can do the same. Not through central command, but through modular negotiation.
Alignment, then, isn’t a property of the parts. It’s a property of their relationships.
--
This post follows the same relational philosophy I explored in my previous essay, “Can you care without feeling?” on trust and interface-level alignment. Together, these posts sketch a direction for AI safety rooted in behavioural coherence, not just constraint.
Well-written, and good points. I hope and pray that we get to this point in AI alignment. However, I think it might be wise to first make very certain that AI isn’t going to kill everyone, before progressing to improving its cognitive affordances.
Thank you! Survival risk matters, but I’m more focused on systems that don’t need to be controlled to behave safely. Beyond malice, I believe most failures are a result of misalignment under stress.
The part about the gym really resonates with me; I personally find it almost impossible to focus when people are around and the environment isn’t stable enough.
But I have to push back a little on your alignment idea (assuming I didn’t misunderstand it): you still have to deal with corrigibility. If an AI has a different utility function from the system as a whole, it will try to resist having its utility altered, and depending on how powerful it is, it might just take over the system altogether.
The idea of having multiple different systems monitoring and steering each other, with the goal of making alignment occur naturally, would require you to predict the final equilibrium in advance and for that equilibrium to be favorable. For a system this complicated, there are just too many failure points to consider.
For all you know, the system might just settle on gaming the reward function, maybe with one or a few parts of the system circumventing all the safeguards.
I think your idea might work for subhuman or maybe early AGI systems, but once the AIs figure out what system they are in and how it contradicts their own utility, you will have a very hard time keeping them in check.
Also, you should change the name; DNA is a terrible name.