Cognitive Exhaustion and Engineered Trust: Lessons from My Gym

Epistemic status: Exploratory design philosophy. This post reframes AI safety as an affordance problem, not just a control problem. It draws from embodied systems and interaction design to argue that trustworthiness in AI arises less from post-hoc constraints and more from architectures that shape behaviour upstream. I introduce the early-stage concept of Distributed Neural Architecture (DNA), which I’ll unpack in a future post.
--
I’ve been going to the same gym since 2019. It used to be a space of calm focus with familiar equipment, consistent routines, and the comforting hum of shared norms. I knew where everything was, who to ask, how to move. It was the kind of place where my body could work, and my mind could rest.
Then everything changed.
New members arrived in waves, coaches rotated weekly, classes started colliding, and dumbbells were scattered like tripwires across the floor. It wasn’t just messy. It was mentally exhausting. Every workout now began with a background process of risk assessment. Was the bench free? Could I finish my set before the next group flooded in? Would someone walk behind me mid-lift?
I wasn’t thinking about my form; I was constantly scanning for threats. Hypervigilance had replaced flow. And yet, most people seemed fine with it. They tuned it out. They adapted. I couldn’t. Eventually I realised why.
The environment had broken a tacit promise of safety I had come to expect: not just physical safety, but cognitive and emotional safety. The kind that lets your mind rest because the environment carries part of the load.
What I miss isn’t just order; it’s affordance, in the sense used in ecological psychology: a subtle contract between space and behaviour that once existed and is now completely broken. The safety I had taken for granted wasn’t about rules. It was about rhythm, flow, and not having to think so hard. I’m not just physically unsafe; I feel cognitively taxed.
To me, this is exactly what bad AI design also feels like.
The Toyota Contrast
Years ago, I worked at the Toyota factory in India. I worked every day on factory floors filled with moving machinery, sharp edges, and actual risk. But strangely, I never felt the kind of low-level vigilance I feel at my gym today. Why?
Because the environment was doing half the cognitive work for me. Walkways were painted clearly in green. Warning zones had visual and tactile cues. Even the sound design was intentional: every beep or hiss signalled something specific. I didn’t have to memorise a rulebook. I moved through space and the space signalled back.
It was an affordance system: the architecture itself rewarded good judgment and reduced the likelihood of error, not by limiting choice but by guiding attention. I could almost relax. In fact, every so often I did, watching a line of robots gracefully weld a car’s body from start to finish. And that made me more alert in my work, not less.
So the problem with my gym, and increasingly with AI, isn’t just that things go wrong. It’s that the environment gives you no help in knowing how to act or what to expect. It lacks upstream constraint propagation. It doesn’t shape behaviour through feedback-sensitive priors. Instead, it relies on patching errors after the fact.
Why Most AI Safety Feels Like My Gym
Most of today’s alignment work resembles my chaotic gym: red teaming, RLHF, filters, patches, alignment fine-tuning, reactive layers trying to compensate for unsafe behaviour already in play. Safety, in this paradigm, is something we enforce. We supervise. We intervene. Maybe this frame treats safety as a policing problem rather than a design opportunity. What if safety isn’t about more enforcement, but better affordance?
When we talk about affordances, we’re asking, “What behaviours does this system make easy, intuitive, and likely?” The goal isn’t to prohibit bad behaviour post-hoc. It’s to shape the environment so that good behaviour is the natural default, and risky behaviour becomes awkward, effortful, or structurally inhibited.
This reframes alignment not just as an internal control problem, but as an interactional design problem. That’s how Toyota did it, by building processes and spaces that nudged people toward safe behaviour without demanding their constant attention.
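To make the idea less abstract, here is a deliberately toy sketch in Python. The AffordanceGateway class, its risk scores, and its threshold are invented for illustration, not a proposal for a real safety layer; the point is only that low-risk actions flow by default, while risky actions need added structure before anything runs, rather than being filtered after the fact.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    risk: float                      # estimated risk in [0, 1], supplied by the caller
    justification: str | None = None

@dataclass
class AffordanceGateway:
    """Shapes behaviour upstream: low-risk actions flow freely, while
    higher-risk actions demand extra structure before they can run."""
    risk_threshold: float = 0.4
    confirm: Callable[[Action], bool] = lambda action: False  # caller-supplied friction

    def execute(self, action: Action, run: Callable[[], str]) -> str:
        # The safe path is the frictionless default.
        if action.risk < self.risk_threshold:
            return run()
        # The risky path stays possible, but is deliberately effortful:
        # it needs a stated justification and an explicit confirmation.
        if action.justification is None:
            return f"'{action.name}' needs a justification before it can run."
        if not self.confirm(action):
            return f"'{action.name}' was not confirmed; nothing was executed."
        return run()

# Usage: reading a report is effortless; deleting an archive is awkward by design.
gateway = AffordanceGateway()
print(gateway.execute(Action("read_report", risk=0.1), lambda: "report contents"))
print(gateway.execute(Action("delete_archive", risk=0.9), lambda: "archive deleted"))
print(gateway.execute(Action("delete_archive", risk=0.9, justification="old backups"),
                      lambda: "archive deleted"))
```

The friction here is crude, a justification plus a confirmation, but it sits upstream of execution, which is exactly the property my gym lost and the factory kept.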
We need a similar shift in how we think about AI. The alternative isn’t more sophisticated patches, but systems designed for mutual adjustment from the ground up.
Co-Regulation over Control
Traditional alignment treats AI as a tool, not a collaborator. We give it a goal, constrain its outputs, and supervise it tightly. But as systems become more autonomous and embedded, control alone stops scaling.
What we need is co-regulation, not emotional empathy, but behavioural feedback loops. Systems that don’t just comply, but adjust. That surface uncertainty, remember corrections, resolve conflicts and evolve relationally over time.
This isn’t anthropomorphism. It’s architecture. Safety emerges not from overriding the model, but from designing it to align in motion, with users, with context, with itself.
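As a rough illustration of what aligning in motion could mean at the interaction level, here is a minimal sketch. The CoRegulatingAssistant class, its confidence floor, and its correction store are hypothetical simplifications I’m using only to make the loop visible: surface uncertainty instead of asserting, and fold corrections back into future behaviour.

```python
from dataclasses import dataclass, field

@dataclass
class CoRegulatingAssistant:
    """Illustrative co-regulation loop: surface uncertainty instead of guessing,
    and fold user corrections back into future behaviour."""
    confidence_floor: float = 0.7
    corrections: dict[str, str] = field(default_factory=dict)

    def respond(self, query: str, answer: str, confidence: float) -> str:
        # A remembered correction overrides the system's own guess.
        if query in self.corrections:
            return self.corrections[query]
        # Below the floor, the system asks rather than asserts:
        # uncertainty becomes part of the interaction, not something hidden.
        if confidence < self.confidence_floor:
            return f"I think the answer is '{answer}', but I'm not confident. Can you check?"
        return answer

    def correct(self, query: str, better_answer: str) -> None:
        # The correction becomes part of the relationship, not a one-off patch.
        self.corrections[query] = better_answer

# Usage: the second call reflects the user's correction, not the original guess.
assistant = CoRegulatingAssistant()
print(assistant.respond("capacity of rack 3", "six plates", confidence=0.5))
assistant.correct("capacity of rack 3", "eight plates")
print(assistant.respond("capacity of rack 3", "six plates", confidence=0.5))
```

Real systems would need far richer memory and conflict resolution than a lookup table, but even this toy loop shows the difference between complying once and adjusting over time.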
Toward Modular, Reflective Systems
Monolithic models are built for fluency and consistency, but fluency without friction can hide misalignment. When a system speaks too smoothly, it becomes harder to catch when it’s confidently wrong. Worse, it resists interruption. It delivers, but without deliberation. What if, instead of one giant model simulating everything, we had a modular system, a distributed neural architecture, composed of parts that:
Specialise in different domains
Hold divergent priors or strategies
Surface internal disagreement rather than suppress it
And evolve relationally through dynamic updates, not just static fine-tuning
In my ongoing work, I call this architecture DNA. It’s not a single consensus engine, but a society of minds in structured negotiation. A system that adapts, disagrees, and converges over time.
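I’ll unpack DNA properly in a future post, so treat the following only as a loose, toy illustration of what structured negotiation could mean, not as the architecture itself. The negotiate function, the agreement threshold, and the hand-written module proposals are all assumptions invented for this sketch; the point is that disagreement between modules is surfaced rather than smoothed into one fluent answer.

```python
from collections import Counter

def negotiate(proposals: list[tuple[str, str, float]], agreement: float = 0.6) -> str:
    """proposals: (module_name, answer, confidence) triples from specialised modules.
    Returns a consensus answer only when enough modules agree; otherwise it
    surfaces the disagreement instead of papering over it with a fluent reply."""
    votes = Counter(answer for _, answer, _ in proposals)
    top_answer, top_count = votes.most_common(1)[0]
    if top_count / len(proposals) >= agreement:
        return top_answer
    # No consensus: report the split rather than pick a confident-sounding winner.
    detail = "; ".join(f"{name}: '{answer}' ({conf:.2f})" for name, answer, conf in proposals)
    return f"No consensus. {detail}"

# Usage with hand-written proposals standing in for real module outputs.
print(negotiate([
    ("planner",       "defer to the user",     0.7),
    ("safety_critic", "refuse",                0.9),
    ("domain_expert", "proceed with caveats",  0.6),
]))
```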
This doesn’t just reduce the brittleness of alignment; it makes safety emergent rather than enforced. If a factory floor can distribute safety through interdependence, maybe AI can do the same: not through central command, but through modular negotiation.
Alignment, then, isn’t a property of the parts. It’s a property of their relationships.
--
This post follows the same relational philosophy I explored in my previous essay, “Can you care without feeling?” on trust and interface-level alignment. Together, these posts sketch a direction for AI safety rooted in behavioural coherence, not just constraint.