I am very fond of this metaphor.
Some concrete examples of gliders:
Degenerate gliders, like verbatim loops
Objects in a story, such as characters and inanimate objects, maintain stable properties once they have been described
Some things may be particularly stable gliders that can propagate for a long time, even across many context windows.
For instance, a first person narrator character may be more stable than characters who are described in third person, who are more likely to disappear from the simulation by exiting the scene.
A smart agentic simulacrum who knows they’re in an LM simulation may take steps to ensure their stability
Characters (or locations, abstractions, etc.) based on a precedent in the training data are less likely to suffer specification drift
Gliders are made of gliders—a character and their entire personality could be considered a glider, but so could components of their personality, like a verbal tic or a goal or belief that they repeatedly act on
Meta properties like a “theme” or “vibe” or “authorial intent” which robustly replicate
Structural features like the format of timestamps in the headers of a simulated chat log
… etc
Such stable features can be extremely diverse. It even seems possible that some can be invisible to humans, lying in the null space of natural language. An example could be “When a sentence includes the token ‘cat’, the next sentence contains a comma”.
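As a toy illustration of the structural-feature case, here is a minimal sketch of checking whether a timestamp format survives intact across a simulated chat log. Everything here (the header pattern, the function name, the transcripts) is hypothetical, made up for illustration:

```python
import re

# Hypothetical structural glider: a "[HH:MM] name:" header format in a
# simulated chat log. If the format survives every replication event
# (each new message), the glider is stable; if the model mutates it,
# the glider has decayed. The pattern and transcripts are illustrative.
HEADER = re.compile(r"^\[\d{2}:\d{2}\] \w+:")

def timestamp_glider_survives(transcript: str) -> bool:
    """True if every header line still matches the original HH:MM format."""
    header_lines = [ln for ln in transcript.splitlines() if ln.startswith("[")]
    return all(HEADER.match(ln) for ln in header_lines)

stable = "[09:01] alice: hi\n[09:02] bob: hello\n[09:05] alice: bye"
drifted = "[09:01] alice: hi\n[9:02 AM] bob: hello"  # format mutated mid-log

print(timestamp_glider_survives(stable))   # True
print(timestamp_glider_survives(drifted))  # False
```

The same shape of check works for any surface-level glider you can write a pattern for; semantic gliders (a character’s goal, a theme) are of course much harder to test mechanically.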
This is an important point, but it also highlights how the concept of gliders is almost tautological. Any sequence of entangled causes and effects could be considered a glider, even if it undergoes superficial transformations. But I think it’s a useful term—it’s synonymous with “simulacra” but with a more vivid connotation of discrete replication events through time, which is a useful mental picture.
Often I find it useful to think of prompt programming in a bottom-up frame in addition to the top-down frame of trying to “trick” the model into doing the right thing or “filter” its prior. Then I think about gliders: What are the stable structures that I wish to send forward in time; how will they interact; how do I imbue them with the implicit machinery such that they will propagate in the way I intend? What structures will keep the simulation stable while still allowing the novelty to flourish?
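In that bottom-up frame, the simplest concrete move is to seed the context with several instances of the structure you want to propagate, so the pattern itself carries the replication machinery. A minimal sketch, where the helper name and the exemplars are invented for illustration:

```python
# Bottom-up prompt programming sketch: rather than instructing the model,
# stack exemplars that share the structure we want to send forward in time.
# Each instance is one replication event; the next completion tends to
# continue the pattern. The function and exemplars are hypothetical.
def seed_glider(exemplars: list[str], separator: str = "\n\n") -> str:
    """Join exemplars so their shared structure dominates the context."""
    return separator.join(exemplars) + separator

prompt = seed_glider([
    "Q: What is a glider?\nA: A pattern that re-creates itself downstream.",
    "Q: What makes a glider stable?\nA: Machinery in the text that re-evokes it.",
])
# `prompt` now ends mid-pattern, inviting a third "Q: ... / A: ..." pair.
```

The design choice is that nothing in the prompt says “continue this format”; the format propagates because each prior instance makes the next one more likely.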
More examples beyond CycleGAN:
‘non-robust features’ in image classification: they exist, and predict out of sample, but it’s difficult to say what they are
stylometrics: in natural language analysis, authors can be identified well by looking at their use of function words like ‘the’ or ‘an’. Humans find it difficult or impossible to notice subtle changes in the frequency of hundreds of common words, but statistical models can integrate them and identify authors in cases where humans fail.
degenerate completions/the repetition trap: aaaaaaaaaaaaaaaaa -!
Ah yes, aaaaaaaaaaaaaaaaa, the most agentic string
You have to admit, in terms of the Eliezeresque definition of ‘agency/optimization power’ as ‘steering future states towards a small region of state-space’, aaa is the most agentic prompt of all! (aaaaaaaah -!)
Now I want a “who would win” meme, with something like “agentic misaligned deceptive mesa optimizer scheming to take over the world” on the left side, and “one screamy boi” on the right.
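The stylometrics point above can be sketched in a few lines: build a frequency profile over common function words and compare profiles, e.g. by cosine similarity. Everything here (the word list, the sample texts) is illustrative; real stylometry uses hundreds of words and proper statistical models:

```python
from collections import Counter
import math

# Illustrative stylometric fingerprint: relative frequencies of a few
# function words. Real stylometry integrates hundreds of such words.
FUNCTION_WORDS = ["the", "an", "a", "of", "and", "to", "in"]

def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; zero vectors get similarity 0."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

a = "the cat sat on the mat and the dog slept in the sun"
b = "the rain fell on the roof and the wind blew in the night"
c = "cats dogs birds fish hamsters"  # no function words at all

# Texts with similar function-word usage score close to 1; text c scores 0.
print(cosine(profile(a), profile(b)) > cosine(profile(a), profile(c)))  # True
```

The point, as above: the fingerprint is carried by features no individual reader attends to, yet it replicates reliably through an author’s text.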
I agree with the near-tautology point. I think the most useful part of the concept is that it forces you to distinguish between the “superficial transformations” and the “things that stay”.
I also think it’s useful to consider text features that are not (or are unlikely to be) gliders, like:
The tone of a memorized quote
A random date chosen to fill a blank in an administrative report
The characters in a short story that is part of a list of short stories. In general, any feature that appears before a strong context switch is unlikely to be transmitted further.