Accelerated Skill Learning via Dream Engineering and Biofeedback

Konkoly et al. had participants play harmonicas by inhaling through their noses, with said instruments on their nostrils. Then everyone competitively blew bubbles.

Each of the two tasks was paired with a sound, and participants rehearsed mentally re-entering the task when the respective sound played.

During REM sleep, the researchers replayed one task’s sound cue. The sound biased which dream occurred.

So we can influence the content of dreams just by replaying cues tied to whatever happened while awake. Even for non-dreaming sleep, we can engineer which memories are consolidated and thereby improve memory.

This is all rather crude though. It’s one or a few memories we’re reactivating indirectly through external stimulation. Direct neural stimulation can have on the order of a hundred thousand times as many degrees of freedom; how far can we improve dream engineering?

During waking activity, mammalian brains convert neuron activity into engrams, hippocampal neurons which store individual memories. Engrams encode very narrow, precise memories, like rough snapshots of sensations and thoughts.

If you go to coffee with a friend, you’ll probably have an engram binding together the fuzzy feelings of her presence, condensation on the cafe windows, texture of the table’s wood grain, coffee taste, whatever you two discussed, etc.

Then, goes the theory, engram replay progressively generalizes the content of memories into less precise but more applicable representations.

Lots of the detail has dropped, but now your friend’s face is vaguely associated with joy and calm, clear, cool mornings.

Through this process, episodic memory is converted into useful skills/intuition, of the same sort we care about for making superintelligent humans. Accelerating and curating this process could plausibly dilate a bottleneck.

(I’m not all that sure this is a core bottleneck, but it seems possible. On my model there’s almost certainly an effect, but it may be small. My median expectation, though, is closer to a 3-20x increase in skill-acquisition consolidation rate, with much of the usefulness concentrated in metacognitive skills which brains naturally deprioritize from replay.)

The original theory of engram consolidation was that engrams straightforwardly indexed time-coded causal implications of the waking environment. Like, if a mouse recently ran through a maze, it replays the maze-running neural activity straight forward. Just the same patterns, played faster.

But nowadays we know it’s less simple. Replay goes in non-straightforward directions: sometimes it plays backwards, or with interjections, well out of order compared to waking.

It seems that replay runs internal simulations; this is probably some form of credit assignment. Without understanding how simulations are derived from memories^[1], we’d be stuck replaying nominal memories in forward/backward-only order. This seems probably bad, since we’d interrupt natural simulations with rigidly-ordered reactivations.

I do however think that we don’t need to replay the full activation patterns as they occurred during waking.

In other words, we needn’t stimulate the entire engram contraption; just reactivating the straightforward memory lets the brain naturally sort out whatever timing elaborations it wants, while we simply provide the message “these concepts are somehow related, remember them?”.

The natural neural machinery carries away our rigid re-stimulation and does whatever funky stuff it wants; we’ve provided an interesting suggestion, is all.

Items held in working memory seem like a natural place to start; they’re probably easy to decode and have been filtered by attention to be a maximally useful summary of whatever you’re processing.

Why might engram replay be bottlenecked?^[2]

I don’t think “memory volatility → lossy replay” is quite right; I’d expect biological brains to suck the most at long-term credit assignment since it’s brutalized by memory volatility, so “this thing I was thinking about 4 hours ago should have caused me to realize X” doesn’t naturally stick.

We could probably improve a lot on the accuracy and precision of long-term credit assignment by, for example, giving augmentees a way to fluently mark^[3] something in a working memory cache as salient to a recent insight.
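To make the marking idea concrete, here is a minimal sketch of what such a cache could look like as a data structure. Everything in it is hypothetical: the `WorkingMemoryCache` class, the `Item` record, and the assumption that working-memory items can be decoded into labels at all. It only shows the bookkeeping, where an item from hours ago gets retroactively tagged with a later insight, and tagged items are then queued for reactivation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Item:
    """One decoded working-memory item (hypothetical; decoding is assumed)."""
    label: str
    t: float                      # time the item was held, in hours
    insights: list = field(default_factory=list)


class WorkingMemoryCache:
    """Bounded log of recent items that can be tagged retroactively."""

    def __init__(self, capacity=1000):
        self.items = deque(maxlen=capacity)   # oldest items fall off

    def record(self, label, t):
        self.items.append(Item(label, t))

    def mark(self, label, insight):
        """Tag the most recent item with this label as salient to `insight`."""
        for item in reversed(self.items):
            if item.label == label:
                item.insights.append(insight)
                return item
        return None

    def replay_queue(self):
        """Marked items only, most-marked first; ties go to the older item."""
        marked = [i for i in self.items if i.insights]
        return sorted(marked, key=lambda i: (-len(i.insights), i.t))


cache = WorkingMemoryCache()
cache.record("lemma about compactness", t=9.0)
cache.record("lunch order", t=12.0)
cache.mark("lemma about compactness", "should have led me to X by 13:00")
print([i.label for i in cache.replay_queue()])
```

The ordering rule in `replay_queue` is an arbitrary choice for illustration; which marked items actually deserve reactivation first is an open question.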

I think humans suck at most thinking tasks like math, compared to entities which have specialized math-consolidation machinery^[4] running on the same simulation architecture.

Especially metacognitive stuff like “here’s the mental posture I want to take to reduce confirmation bias”. I don’t think there was ever a pressure to deliberately change how we oriented to stuff in the ancestral environment. Humans subliminally learned mental poses. That was all.

A team of assistants could monitor an augmentee and prioritize which moments / sessions get consolidated to maximize upskilling for rationality, political coordination, alignment research, etc.

Similarly, as mentioned here and by Eliezer, we could use biofeedback to directly reinforce rational thoughts, with a “metacognitive” activation probe checking when thoughts are e.g. confirmation-biased^[5].
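A rough sketch of the probe-plus-feedback loop, under heavy assumptions: that we can read out an activation vector per thought, that a linear probe for "confirmation-biased" has already been trained, and that feedback is a single scalar. The function names, the weights, and the threshold are all made up for illustration; nothing here reflects an existing system.

```python
import math


def probe_score(activations, weights, bias=0.0):
    """Logistic linear probe: estimated probability that the current
    thought is confirmation-biased. Weights are assumed to be pre-trained."""
    z = sum(a * w for a, w in zip(activations, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def feedback(score, threshold=0.5):
    """Map a probe score to a feedback signal in [-1, 1].
    Positive values reinforce; negative values discourage."""
    return max(-1.0, min(1.0, (threshold - score) / threshold))


def run_session(stream, weights, bias=0.0, threshold=0.5):
    """Yield one feedback value per activation snapshot."""
    for activations in stream:
        yield feedback(probe_score(activations, weights, bias), threshold)


# Hypothetical two-dimensional activations and probe weights.
snapshots = [[0.0, 0.0], [3.0, 1.0], [-3.0, -1.0]]
for value in run_session(snapshots, weights=[1.0, 1.0]):
    print(round(value, 2))
```

A linear probe is just the simplest thing to write down; whether metacognitive states are linearly readable in a brain is exactly the kind of thing that would need to be checked.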

^[1] This type of research mostly advances AI capabilities, so I want to see less of it.
^[2] Notice that this is a positive query (be not disturbed, I ran the negative one too, which caused a substantial rewrite).
^[3] “Cognitive macros.”
^[4] ML folks call this type of prior an inductive bias.
^[5] I’m around 80% confident that this would be much stronger than any extant rationality training, and ~25% conditional on efficacy that we could get to a pivotal act with only biofeedback rationality (e.g. no extra items in working memory, same measured IQ).