I am very fond of this metaphor.
Some concrete examples of gliders:
Degenerate gliders, like verbatim loops
Objects in a story, such as characters and inanimate objects, maintain stable properties once they have been described
Some things may be particularly stable gliders that can propagate for a long time, even across many context windows.
For instance, a first person narrator character may be more stable than characters who are described in third person, who are more likely to disappear from the simulation by exiting the scene.
A smart agentic simulacrum who knows they’re in an LM simulation may take steps to ensure their stability
Characters (or locations, abstractions, etc.) based on a precedent in the training data are less likely to suffer specification drift
Gliders are made of gliders—a character and their entire personality could be considered a glider, but so could components of their personality, like a verbal tic or a goal or belief that they repeatedly act on
Meta properties like a “theme” or “vibe” or “authorial intent” which robustly replicate
Structural features like the format of timestamps in the headers of a simulated chat log
… etc
Such stable features can be extremely diverse. It even seems possible that some can be invisible to humans, lying in the null space of natural language. An example could be “When a sentence includes the token ‘cat’, the next sentence contains a comma”.
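As a toy illustration of the structural-feature case, here is a minimal sketch of checking whether a timestamp format survives intact across a simulated chat log. Everything here (the header pattern, the function name, the transcripts) is hypothetical, made up for illustration:

```python
import re

# Hypothetical structural glider: a "[HH:MM] name:" header format in a
# simulated chat log. If the format survives every replication event
# (each new message), the glider is stable; if the model mutates it,
# the glider has decayed. The pattern and transcripts are illustrative.
HEADER = re.compile(r"^\[\d{2}:\d{2}\] \w+:")

def timestamp_glider_survives(transcript: str) -> bool:
    """True if every header line still matches the original HH:MM format."""
    header_lines = [ln for ln in transcript.splitlines() if ln.startswith("[")]
    return all(HEADER.match(ln) for ln in header_lines)

stable = "[09:01] alice: hi\n[09:02] bob: hello\n[09:05] alice: bye"
drifted = "[09:01] alice: hi\n[9:02 AM] bob: hello"  # format mutated mid-log

print(timestamp_glider_survives(stable))   # True
print(timestamp_glider_survives(drifted))  # False
```

The same shape of check works for any surface-level glider you can write a pattern for; semantic gliders (a character’s goal, a theme) are of course much harder to test mechanically.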
This is an important point, but it also highlights how the concept of gliders is almost tautological. Any sequence of entangled causes and effects could be considered a glider, even if it undergoes superficial transformations. But I think it’s a useful term—it’s synonymous with “simulacra” but with a more vivid connotation of discrete replication events through time, which is a useful mental picture.
Often I find it useful to think of prompt programming in a bottom-up frame in addition to the top-down frame of trying to “trick” the model into doing the right thing or “filter” its prior. Then I think about gliders: What are the stable structures that I wish to send forward in time; how will they interact; how do I imbue them with the implicit machinery such that they will propagate in the way I intend? What structures will keep the simulation stable while still allowing the novelty to flourish?
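In that bottom-up frame, the simplest concrete move is to seed the context with several instances of the structure you want to propagate, so the pattern itself carries the replication machinery. A minimal sketch, where the helper name and the exemplars are invented for illustration:

```python
# Bottom-up prompt programming sketch: rather than instructing the model,
# stack exemplars that share the structure we want to send forward in time.
# Each instance is one replication event; the next completion tends to
# continue the pattern. The function and exemplars are hypothetical.
def seed_glider(exemplars: list[str], separator: str = "\n\n") -> str:
    """Join exemplars so their shared structure dominates the context."""
    return separator.join(exemplars) + separator

prompt = seed_glider([
    "Q: What is a glider?\nA: A pattern that re-creates itself downstream.",
    "Q: What makes a glider stable?\nA: Machinery in the text that re-evokes it.",
])
# `prompt` now ends mid-pattern, inviting a third "Q: ... / A: ..." pair.
```

The design choice is that nothing in the prompt says “continue this format”; the format propagates because each prior instance makes the next one more likely.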
More examples beyond CycleGAN:
‘non-robust features’ in image classification: they exist, and predict out of sample, but it’s difficult to say what they are
stylometrics: in natural language analysis, authors can be identified well by looking at their use of function words like ‘the’ or ‘an’. Humans find it difficult or impossible to notice subtle changes in the frequency of hundreds of common words, but statistical models can integrate them and identify authors in cases where humans fail.
degenerate completions/the repetition trap: aaaaaaaaaaaaaaaaa -!
Ah yes, aaaaaaaaaaaaaaaaa, the most agentic string
You have to admit, in terms of the Eliezeresque definition of ‘agency/optimization power’ as ‘steering future states towards a small region of state-space’, aaa is the most agentic prompt of all! (aaaaaaaah -!)
Now I want a “who would win” meme, with something like “agentic misaligned deceptive mesa optimizer scheming to take over the world” on the left side, and “one screamy boi” on the right.
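The stylometrics point above can be sketched in a few lines: build a frequency profile over common function words and compare profiles, e.g. by cosine similarity. Everything here (the word list, the sample texts) is illustrative; real stylometry uses hundreds of words and proper statistical models:

```python
from collections import Counter
import math

# Illustrative stylometric fingerprint: relative frequencies of a few
# function words. Real stylometry integrates hundreds of such words.
FUNCTION_WORDS = ["the", "an", "a", "of", "and", "to", "in"]

def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; zero vectors get similarity 0."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

a = "the cat sat on the mat and the dog slept in the sun"
b = "the rain fell on the roof and the wind blew in the night"
c = "cats dogs birds fish hamsters"  # no function words at all

# Texts with similar function-word usage score close to 1; text c scores 0.
print(cosine(profile(a), profile(b)) > cosine(profile(a), profile(c)))  # True
```

The point, as above: the fingerprint is carried by features no individual reader attends to, yet it replicates reliably through an author’s text.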
I agree with the near-tautology point. I think the most useful part of the concept is that it forces you to distinguish between the “superficial transformations” and the “things that stay”.
I also think it’s useful to consider text features that are not (or are unlikely to be) gliders, like:
The tone of a memorized quote
A random date chosen to fill a blank in an administrative report
The characters in a short story that is part of a list of short stories. In general, any feature that appears before a strong context switch is unlikely to be transmitted further.