This seems to have been an excellent exercise in noticing confusion; in particular, figuring this one out properly would have required one to recognize that this behavior does not accord with one’s pre-existing model, rather than simply coming up with an ad hoc explanation to fit the observation.
I therefore award partial marks to Rafael Harth for not proposing any explanations in particular, as well as Viliam in the comments:
I assumed that the GPTs were just generating the next word based on the previous words, one word at a time. Now I am confused.
Zero marks to Andy Jones, unfortunately:

I am fairly confident that Latitude wrap your Dungeon input before submitting it to GPT-3; if you put in the prompt all at once, that’ll make for different model input than putting it in one line at a time.
Don’t make up explanations! Take a Bayes penalty for your transgressions!
(No one gets full marks, unfortunately, since I didn’t see anyone actually come up with the correct explanation.)
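The hypothesized difference in prompt assembly can be sketched in a few lines. This is purely illustrative: the template and function names below are invented for the sketch and are not from Latitude’s actual code. The point is only that wrapping each input separately produces a different final string than wrapping one pasted block once.

```python
# Hypothetical sketch of per-turn prompt wrapping (all names are assumptions,
# not Latitude's real implementation).

TURN_TEMPLATE = "> {action}\n"  # assumed per-turn wrapper


def assemble_line_by_line(turns):
    """Wrap each player input separately, then concatenate."""
    return "".join(TURN_TEMPLATE.format(action=t) for t in turns)


def assemble_all_at_once(text):
    """Submit the same text pasted as one block, wrapped only once."""
    return TURN_TEMPLATE.format(action=text)


turns = ["open the door", "light a torch"]
pasted = "open the door\nlight a torch"

# The two strategies give the model different input for the same player text:
print(assemble_line_by_line(turns) == assemble_all_at_once(pasted))  # False
```

Under this (unconfirmed) guess, the model would see different context depending on how the player entered the same text, which could explain different completions.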
Someone else said in a comment on LW that they think “custom” uses GPT-2, whereas using another setting and then editing the opening post will use GPT-3. I wanted to give them credit in response to your comment, but I can’t find where they said it. (They still wouldn’t get full points since they didn’t realize custom would use GPT-3 after the first prompt.) I initially totally rejected the comment since it implies that all of the custom responses use GPT-2, which seemed quite hard to believe given how good some of them are.
Some of the Twitter responses sound quite annoyed with this, which is a sentiment I share. I thought that getting the AI to generate good responses was important at every step, but (if this is true and I understand it correctly) it doesn’t matter at all after the first reply. That’s a non-negligible amount of wasted effort.
Here’s the actual explanation for this: https://twitter.com/nickwalton00/status/1289946861478936577