Qiaochu_Yuan comments on Two Small Experiments on GPT-2

Qiaochu_Yuan 21 Feb 2019 4:33 UTC
17 points
0
Thanks for writing this up! I’m excited to see more people running experiments like this.
When you say “if I take X as a prompt, I get Y,” how many trials did you run? In my own experimentation I’ve found lil’ GPT-2’s performance to be really variable across trials, and I’ve needed to run 5 trials in some cases to get results I even sort of liked.
My overall sense of how lil’ GPT-2 functions, after playing with it for a while on several different kinds of prompts, is that it has a strong sense of genre, and has done something like learning a bunch of different genre conventions for the different types of texts in the training set. If the prompt strongly resembles a genre familiar from training, it will run with that genre, although sometimes it’ll wander off into another genre. It does quite poorly on prompts that I suspect don’t strongly match a genre in the training set.
For example, I tried to run a Turing test (mostly as a joke) by prompting with “Question: what is your name? Answer:” and I got this on my ~2nd to 4th trial (don’t remember), with my speculations as to genre in [square brackets]:
Foulplay
I’m a Guardian Huntress, a deviation of the <Here> Monks keyword. I’m like a little “mechanic-creature hire”, but is Disassembler which is general clique-of-guy? I usually live in forms of Priests or Whispers of the Old Gods, but if it’s Asian Brotherhood of Magi..… [RP forum?]
deepmen
Reply ~29000 0 ~13 min 0 By TrueBeliever For MA teenage adventurer: Ooops It looks stupid but I’m given the wrong answer, and this logic is corrupt. Other first hand studies: sometimes your famed mechanical fiends attack other nvrock’s—I had such too—but you’re supposed to have flying paths to defend them would not it? This alien spawn is being unnecessary and a waste of time Man—you stole the Saalis… [gaming forum?]
Shadow Netherlands
Reply ~120200 0 ~30 min 0 By john : I know there’s no better examples of
a landing back off 2lands ( ou ever?) Low challenge difficulty
a lot of new cards in my deck,
which doesn’t draw cards: stacks high levels Some need life if w/o rest of deck already Defense Emperor’s | Inferno/Neck Scorer Necronomicon Mysticpetal Mana Enigma Mortalswords Mysticmist Scout Zarthai Sniper [MtG or Hearthstone forum?]
It’s a multiplier and it increases the yield on a lv301 fighter fighter, next to nothing it’s probably never in my deck or some random deck Dofrone’s | Boltforge Fastboat Sling dmt3 Efreet Flattestalker Infernal Eater Toxic Water Hurricane For another Holy orb suggested… [gaming forum? LoL?]
Was planning on posting a longer (mostly humorous) post with my own results but that post is low priority so I don’t know when it’s going to happen.
- jimrandomh 21 Feb 2019 5:05 UTC
  6 points
  0
  Parent
  This definitely could use more trials. In the case of the sentiment analysis experiment, I’d ideally like to try out some other sentence structures (e.g. “Is a <noun> bad?”, “Are <adjective> things good?”); in the case of the Moloch experiment, I’d like to try some reruns with the same parameters, as well as different name substitutions, just to be sure that it isn’t noise.
- Gurkenglas 21 Feb 2019 9:30 UTC
  3 points
  0
  Parent
  Try varying lines 14 and 16 in the interactive script for quicker execution, and try giving it a few example lines to start with.