Telepathy Is (Algorithmically) Easy

Thought-sharing is the easiest approach for intelligence amplification given appropriate hardware. The main risks are psychosis and dissociative symptoms from identity disruption.

I’m around 30% that an implanted group of 10 would actually manage a pivotal act, conditional on hardware being solved.

Speech and text are extremely inefficient. For example, math textbooks are routinely more than one page long.

This sucks! I want the entirety of human hard-science results to pass through my mind at least once. Someone learned each of those concepts, but they can’t just copy their Understanding to me.^[1]

Or perhaps they can?

If we can read and write enough neural state, then communication is an unusually friendly target for cognitive augmentation. Unlike most enhancements, it doesn’t need (non-hardware) neuroscience breakthroughs in about half of possible worlds from my perspective.

Humans are already exceptionally skilled at communication despite terrible bandwidth. By speaking while learning neuralese, we can use spoken language and feature engineering as training wheels to bootstrap telepathy.

(To be clear, I’m talking about hardware and software to pass carefully-translated brain activity between people. It’s not spooky.)

Groups of experts could then share deep understanding in minutes-to-days; I’d wager that, with help from a mathematician, I could understand most of modern algebraic topology in a week instead of a year.

This could go a few ways. We’ll start with the most pessimistic success case, which I estimate is the top ~55% of possibilities. (Most of the bottom ~45%, where telepathy entirely fails, are worlds where bootstraps don’t scale.)

Say that we have absolutely no idea how to implement any algorithms which aren’t scientifically replicated as of mid-2026.

Neurotech labs already translate low-dimensional data for speech, movement, and audio-visual stimuli. So we take thousands of these decoders, running at much higher resolution across the brain’s surface, and start by training a model on stimuli from a VR headset and haptic suit.

Figure. Left: computational graph for feature-engineered bootstrapping of the telepathy model’s write component. The system learns to convert stimuli into neural activations. Right: same, for reading states.
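As a toy illustration of the bootstrapping step, here is a minimal sketch in which each recording channel responds linearly to a single scalar stimulus feature. The linear model, the synthetic gains, and the function names `fit_write_model` and `read_stimulus` are all assumptions made for illustration; real decoders would be far higher-dimensional and nonlinear.

```python
import random

def fit_write_model(stimuli, activations):
    """Fit one gain per channel so that activation ~= gain * stimulus (least squares)."""
    n_channels = len(activations[0])
    denom = sum(s * s for s in stimuli)
    return [
        sum(s * row[c] for s, row in zip(stimuli, activations)) / denom
        for c in range(n_channels)
    ]

def read_stimulus(gains, activation_row):
    """Invert the write model: least-squares estimate of the stimulus from one row."""
    return sum(g * a for g, a in zip(gains, activation_row)) / sum(g * g for g in gains)

# Synthetic "recordings": hypothetical true gains plus a little noise.
rng = random.Random(0)
true_gains = [0.5, -1.2, 2.0, 0.8]
stimuli = [rng.uniform(-1, 1) for _ in range(200)]
activations = [[g * s + rng.gauss(0, 0.01) for g in true_gains] for s in stimuli]

gains = fit_write_model(stimuli, activations)                    # "write" direction
estimate = read_stimulus(gains, [g * 0.3 for g in true_gains])   # "read" direction
```

The same fitted gains serve both directions here only because the toy model is linear; nothing in the text implies the real read and write components would share parameters.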

We have a basis. This can decode and re-encode simple stimuli. We now train the model to predict what text this person will write and speak in a few seconds given their current activations; this takes a good bit longer, probably a few months. If scaling broke, this is where I’d expect it to.

Same idea as earlier, but now the model must learn *anticipatory* signals. Text pulled from Wikipedia. Actual delays probably gradually ramp from 1 second to 10 or more seconds.
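The gradual ramp in prediction delay can be sketched as a simple curriculum. This is a minimal sketch assuming a linear ramp and timestamped text events; the function names, the 0.5-second matching tolerance, and the linear shape are illustrative assumptions, not something the plan above specifies.

```python
def anticipation_delay(step, total_steps, start_s=1.0, end_s=10.0):
    """Linearly ramp the prediction horizon from start_s to end_s seconds."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start_s + frac * (end_s - start_s)

def make_training_pairs(activation_times, text_events, delay_s, tolerance_s=0.5):
    """Pair each activation timestamp with text produced about delay_s seconds later."""
    pairs = []
    for t in activation_times:
        for text_time, text in text_events:
            if abs(text_time - (t + delay_s)) <= tolerance_s:
                pairs.append((t, text))
    return pairs

# Horizon at the start, middle, and end of a hypothetical 100-step schedule.
delays = [anticipation_delay(s, 100) for s in (0, 50, 100)]

# Activations at 0 s, 1 s, 2 s; words written at 1 s and 3 s; 1-second horizon.
pairs = make_training_pairs([0.0, 1.0, 2.0], [(1.0, "the"), (3.0, "cat")], delay_s=1.0)
```

Each pair is then a supervised example: the activation snapshot is the input, and the text that shows up roughly `delay_s` seconds later is the target.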

And now, we connect two people using a shared translator model^[2]. They’ve learned explicit “macros” so it’s a light application of will to send thoughts to the other person.

Figure. A simple intent-to-transmit decoder allows deliberate, controlled communication, though it’s still extremely lossy.
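Since the shared translator is described as CLIP-like, one plausible training objective is a symmetric contrastive loss over paired embeddings from the two people. The sketch below is a toy version in plain Python; the pairing of embeddings by "shared moment", the temperature value, and the function names are assumptions for illustration.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def contrastive_loss(emb_a, emb_b, temperature=0.1):
    """Symmetric InfoNCE: emb_a[i] and emb_b[i] are assumed to encode the same moment."""
    a = [normalize(v) for v in emb_a]
    b = [normalize(v) for v in emb_b]
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

aligned = contrastive_loss([[1, 0], [0, 1]], [[0.9, 0.1], [0.1, 0.9]])
misaligned = contrastive_loss([[1, 0], [0, 1]], [[0.1, 0.9], [0.9, 0.1]])
```

The loss is low when person A’s embedding of a moment is closest to person B’s embedding of the same moment, and high when the pairing is scrambled, which is the property a shared translator needs.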

This goes pretty terribly at first. Very imprecise. We keep the signal gain low to reduce weird effects (particularly psychosis, which I’ll get to later).

The pair simply talk about interesting things together. As humans do, they begin to build stronger models of each other; neuralese becomes increasingly useful for refining communication.

The external verbal refinement loop. In this example, the sender (left) is transmitting the molecular structure of paraxanthine, which the receiver (right) interprets as caffeine.
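A toy version of this verbal refinement loop, using the paraxanthine/caffeine mix-up: the two molecules differ only at the N3 position of the xanthine ring, where caffeine carries a methyl group and paraxanthine a hydrogen. The dictionary encoding and the function names are assumptions chosen for illustration; the actual channel would carry far richer state than three labels.

```python
# Substituents on the xanthine ring nitrogens N1, N3, N7 (a deliberately crude encoding).
PARAXANTHINE = {"N1": "CH3", "N3": "H", "N7": "CH3"}
CAFFEINE = {"N1": "CH3", "N3": "CH3", "N7": "CH3"}

def verbal_correction(intended, received):
    """Sender names the first feature the receiver got wrong, or None if they agree."""
    for key in sorted(intended):
        if received.get(key) != intended[key]:
            return key, intended[key]
    return None

def refine(intended, first_guess, max_rounds=10):
    """Repeat 'what did you get?' / 'no, this part differs' until the guess matches."""
    guess = dict(first_guess)
    rounds = 0
    while rounds < max_rounds:
        correction = verbal_correction(intended, guess)
        if correction is None:
            break
        key, value = correction
        guess[key] = value
        rounds += 1
    return guess, rounds

# The receiver decodes "caffeine"; one spoken correction fixes the N3 position.
final_guess, rounds = refine(PARAXANTHINE, CAFFEINE)
```

Each spoken correction doubles as a labeled training example for the translator: it records what was sent, what was received, and exactly where they diverged.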

After ~4 months of this, the pair now has much better bandwidth than unaided speech. It’s more efficient to share learned insights than to learn independently.

As earlier, but now the refinement loop has tightened, being mostly nonverbal. “Where am I wrong about this being the chemical I canonically associate with coffee?” → “This red atom is replaced with a hydrogen.”

In more extreme hypotheticals (the upper 10% by my estimate), after about a year, they’re better thought of as one entity than two. As typical brains split computation between hemispheres, so too the minds fluently delegate fractional thoughts.

Scaling the number of people gives nearly linear returns^[3]; we’d need router minds, but beyond that, scaling doesn’t have a hard limit.
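One way to see why router minds would be needed: if every member linked directly to every other, the number of links would grow quadratically, whereas a tree of routers grows roughly linearly. The sketch below only counts links; the tree topology and the `fan_out` of 8 are assumptions, and it says nothing about how much useful thought each link carries.

```python
def pairwise_links(n):
    """Direct links if every member connects to every other member."""
    return n * (n - 1) // 2

def routed_links(n, fan_out=8):
    """Links if members attach to routers, and routers to higher routers, in a tree."""
    links, level = 0, n
    while level > 1:
        routers = -(-level // fan_out)  # ceiling division
        links += level                  # each node at this level links to one router
        level = routers
    return links

small = (pairwise_links(10), routed_links(10))
large = (pairwise_links(1000), routed_links(1000))
```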

Alright, what if we know the brain’s local learning algorithm and can do whatever extra cortical mass would do?

We could then train the translator much more efficiently; after pretraining to convert to a blurry common language, we run the translator at much higher learning rate to reduce local error.
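The two-phase idea (slow shared pretraining, then fast per-person adaptation) can be illustrated with gradient descent on a single scalar parameter. This is a minimal sketch under heavy simplification: the quadratic loss, the target values, and both learning rates are arbitrary assumptions chosen only to show the effect of the higher rate.

```python
def steps_to_converge(w, target, learning_rate, tolerance=1e-3, max_steps=10_000):
    """Gradient descent on (w - target)**2; returns the final w and the steps used."""
    steps = 0
    while abs(w - target) > tolerance and steps < max_steps:
        w -= learning_rate * 2 * (w - target)
        steps += 1
    return w, steps

# Phase 1: slow pretraining toward a shared, "blurry" target.
w_shared, _ = steps_to_converge(0.0, target=1.0, learning_rate=0.01)

# Phase 2: per-recipient adaptation from the shared start, at two learning rates.
_, slow_steps = steps_to_converge(w_shared, target=1.3, learning_rate=0.01)
_, fast_steps = steps_to_converge(w_shared, target=1.3, learning_rate=0.3)
```

In this toy setting the higher learning rate closes the remaining gap in far fewer steps. Whether a real translator stays stable at such rates is exactly the part that depends on knowing the brain’s local learning algorithm.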

As in, we make the translator convert messages into whatever each mind is asking for.

Thus we needn’t wait for the two humans to become fluent in neuralese. The translator can adapt much quicker than human minds. Bottlenecks here are mostly psychological.

And what about the case where we can dramatically improve memory consolidation?

Here as well, we can probably accelerate translator convergence. Unlike most cognition, I strongly suspect that cross-human neuralese benefits (accounting for resources used) from strategically written replay code;

person Q was thinking P and then said something which resulted in idea K

seems like it could be pretty effectively scaffolded with some custom-built tools.
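A record like the one above maps naturally onto a small replay buffer. The sketch below is one hypothetical shape for such tooling; the field names, the fixed capacity, and uniform random sampling are all assumptions, and a real system would store decoded neural state rather than strings.

```python
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class ReplayEvent:
    person: str          # who was thinking (Q)
    thought: str         # label for the decoded state (P)
    utterance: str       # what they then said
    resulting_idea: str  # the idea the exchange produced (K)

class ReplayBuffer:
    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.events = []
        self._rng = random.Random(seed)

    def record(self, event):
        """Append an event, dropping the oldest one once capacity is exceeded."""
        self.events.append(event)
        if len(self.events) > self.capacity:
            self.events.pop(0)

    def sample(self, k):
        """Draw up to k distinct events to replay to the translator."""
        return self._rng.sample(self.events, min(k, len(self.events)))
```

Usage would look like `buffer.record(ReplayEvent("Q", "P", "what Q said", "K"))` during conversation, followed by `buffer.sample(32)` during consolidation.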

Alright, but beaming stimuli into my mind sounds a lot like hallucinations! I don’t have agency over what I’m “thinking”.

This is a misnomer; in humanlike intelligences, “control” is the result of lots of local computations with no central deciding entity. Those processes would probably not implode if they merged into one monolithic entity, although “I” would be less well-defined.

But the process which calls itself a me will still be disrupted by this change, and we don’t want a crazy superintelligence pointed at human values.

So, at minimum, each person has control over sending and receiving neuralese.

Frequency-coded working memory gives a good inductive bias for message-passing. “Person X is thinking Y” goes on one channel, where “person X” and “Y” are flexibly-bound preexisting circuits.^[4]

Working memory might be very similar to FM radio, in which case members of a small group could “tune in” to others’ broadcast channels.
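The "tune in" picture, combined with per-person control over sending and receiving, resembles an opt-in publish/subscribe hub. The sketch below is a software analogy only; the class and method names are hypothetical, and it makes no claim about how frequency-coded working memory is actually implemented.

```python
from collections import defaultdict

class BroadcastHub:
    def __init__(self):
        self.sending_enabled = defaultdict(bool)  # sender -> has opted in to transmit
        self.tuned_to = defaultdict(set)          # receiver -> senders they listen to
        self.inbox = defaultdict(list)            # receiver -> received (sender, content)

    def set_sending(self, person, enabled):
        self.sending_enabled[person] = enabled

    def tune_in(self, receiver, sender):
        self.tuned_to[receiver].add(sender)

    def tune_out(self, receiver, sender):
        self.tuned_to[receiver].discard(sender)

    def broadcast(self, sender, content):
        """Deliver content only if the sender opted in, and only to tuned-in receivers."""
        if not self.sending_enabled[sender]:
            return 0
        delivered = 0
        for receiver, senders in self.tuned_to.items():
            if sender in senders:
                self.inbox[receiver].append((sender, content))
                delivered += 1
        return delivered
```

Nothing is delivered unless both sides consent: the sender must enable transmission and the receiver must have tuned in, which mirrors the minimum control requirement stated above.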

We’d probably also include a loss term in the translator for raw sensory and motor signals, since these cause the worst subjective loss-of-agency feelings (sensory / movement data is mostly irrelevant to communication anyway).
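Such a loss term could take the form of a penalty on whatever the translator writes into raw sensory and motor channels. A minimal sketch, assuming a mean-squared penalty and an arbitrary `penalty_weight`; both are illustrative choices rather than anything specified above.

```python
def translator_loss(message_error, sensory_motor_output, penalty_weight=10.0):
    """Communication error plus a penalty on signal routed to sensory/motor channels."""
    n = max(len(sensory_motor_output), 1)
    leak = sum(x * x for x in sensory_motor_output) / n
    return message_error + penalty_weight * leak

clean = translator_loss(0.2, [0.0, 0.0])  # nothing leaks into sensation or movement
leaky = translator_loss(0.2, [1.0, 1.0])  # same message error, heavy sensory/motor leak
```

With a large enough weight, the translator is pushed to carry meaning through non-sensory channels even at some cost in message fidelity.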

I’m around 75% confident that these combined approaches would prevent first-order hallucinatory and psychotic effects and, conditional on avoiding acute psychosis, around 80% confident that we’d also avoid second-order (learned, more chronic) psychosis.
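Taken together, these two estimates imply a joint probability of avoiding both kinds of effect. The arithmetic below reads the second figure as conditional on first-order effects having been avoided, which is an interpretation rather than something stated outright.

```python
from fractions import Fraction

p_no_acute = Fraction(75, 100)                   # avoid first-order effects
p_no_chronic_given_no_acute = Fraction(80, 100)  # avoid second-order, given the above

# Chain rule: P(avoid both) = P(no acute) * P(no chronic | no acute)
p_avoid_both = p_no_acute * p_no_chronic_given_no_acute
print(float(p_avoid_both))  # 0.6
```

So under these estimates there is roughly a 60% chance of avoiding both acute and chronic psychosis, leaving about 40% for at least one of them.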

To restate:

1. Bootstrap the decoder using cheap data like stimulation and writing/speech so that augmentees can communicate anything useful at all; we want it to at least be coherent signals they’re sending.
2. Augmentees talk, lots, for a long time, while simultaneously trying to send their thoughts through the neuralese channel.
3. Humans are pretty damn good at communication for having such trash bandwidth; so the augmentees get better at communicating much faster than we’d expect from performance on other tasks. There’s a tight feedback loop of “what’s the person actually saying?” which accelerates this much better than it would if they just worked on challenges together without speaking.
4. As this loop closes, it starts to close faster since they’re now thinking more than speaking at each other; feedback loops are nearly thought-speed.

Out of the four approaches I’ve covered, I’m most confident that neuralese/telepathy is tractable with sufficient hardware.

Which brings us to hardware!

^[1] This is one reason why bureaucracies aren’t even vaguely superintelligent entities, despite often being composed of many individually very smart people.
^[2] This architecture (CLIP) is used in multimodal embedding for some tasks like text-conditioned image diffusion and AI-guided molecular search.
^[3] By the time linearity is saturated, the group is decidedly a superintelligence.
^[4] Also note that, at group sizes where routing becomes a bottleneck, working memory items are probably the most interesting things to broadcast; they’ve been selected by the augmentee’s cognition to be most relevant to whatever’s happening.
^[5] For example, broadcast storms.