oceaninthemiddleofanisland

Karma: 151

oceaninthemiddleofanisland 29 Jul 2020 5:13 UTC
50 points
on: Predictions for GPT-N
Two thoughts, one of them significantly longer than the other since it’s what I’m most excited about.
(1) It might be the case that the tasks showing an asymptotic trend will resemble the trend for arithmetic – a qualitative breakthrough was needed, which was out of reach at the current model size but became possible at a certain threshold.
(2) For translation, I can definitely say that scaling is doing something. When you narrowly define translation as BLEU score (“does this one generated sentence match the reference sentence? by how much?”), then I agree that the benefits of scaling are marginal – for individual sentences, by that specific metric.
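To make the point about BLEU concrete, here is a minimal sketch of the clipped n-gram precision that BLEU is built on. It is not full BLEU (no brevity penalty, no averaging over n-gram sizes), and the example sentences are made up for illustration. It shows how a translation that keeps the meaning but changes the wording gets scored as a near-miss.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped so a repeated n-gram cannot be credited more
    times than it appears in the reference.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

reference = "the cat sat on the mat"
literal = "the cat sat on the mat"
paraphrase = "a feline was sitting on the rug"

print(clipped_precision(literal, reference, 2))     # identical wording
print(clipped_precision(paraphrase, reference, 2))  # same meaning, different words
```

The paraphrase shares only one bigram ("on the") with the reference, so the metric barely credits it even though a human reader would accept it.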
But here’s the thing, GPT-3 can produce idiomatically and culturally accurate translations of Chinese poetry, and then annotate its own translation with references to historical events, the literal versus contextual meaning of words, and so on. The end result actually sounds … like poetry. But it can do other things. If you give it a Japanese text, and then tell it to translate for an American audience, it will either seamlessly explain those references in the translation, or substitute Japanese cultural references for their American equivalent entirely.
But it’s deeper than this. Some non-English languages have honorifics attached to verbs. Some languages have distinctions between the plural and singular form of ‘you’. Some languages have nouns that are inflected depending on whether the noun is in motion or not. Some languages have particles added to the ends of sentences that indicate whether the speaker is hesitant about the statement.
GPT-3 fills in the blanks by making real-world inferences.
If you told me a few years ago about a translation engine that could handle things like ambiguous pronouns, or keep track of speakers across several paragraphs, I would be amazed. If you’d told me about a translation engine that could accurately invent the appropriate missing information, or transfer nuances of the source into the target in a way that sounded natural, I flat-out wouldn’t believe you.
Okay, so what else? Some languages have multiple registers that depend on social context or strongly regional dialects. Current translation engines use a parallel corpus. Sources like news outlets that translate the same article into multiple languages, or EU documents which get translated into all major EU languages, get featured very heavily in these kinds of corpora, so you end up getting a standardised, non-dialectal translation in a formal register.
GPT-3 is not limited by this. It can translate between dialects. It can translate between registers. It can pick up on things like “this story is set in Bandung” and “this character is a construction worker talking to a close friend, not a student talking to a teacher”, and then have the character start code-mixing Indonesian with Sundanese in the low form. I haven’t explored this deeply, but initial prompts are suggesting it’s capable of rendering Indonesian tweets and phone texts (with their various abbreviations) into their equivalents in English.
Here’s the kicker: Indonesian makes up only 0.05985% of GPT-3’s training corpus.
And for that same reason, GPT-3 can handle tone. It can understand the connotative difference between someone describing themselves as “slim”, “thin”, and “scrawny”, and then find a way to appropriately convey it in the target language – and if the target language doesn’t have those separate shades of difference, and you tell it that conveying the difference unambiguously is very important to you, it will figure out ways to do it unprompted, like modifying the tone of surrounding words, or adding a simile where the character compares themselves to a skeleton, up to adding an entire extra scene that doesn’t interrupt the main narrative just to make it clear.
(I have not seen it do this consistently, but on two occasions I have seen it invent new words in Indonesian that a Google search verified weren’t in the training corpus. Indonesian uses affixes to modify root-forms, e.g. ‘memasak’ = ‘to cook’, ‘masakan’ = ‘cooking’ or ‘a dish’, etc. Unfortunately in some situations, it will instead use a word with a different meaning but in the same category [e.g. instead of ‘striped turquoise midi-dress’, you might get ‘spotty blue wrap-dress’], when it judges the difference to be unimportant to the story. Good in some contexts but annoying in others.)
So this is all great. For anyone that consumes text media, I mean – not for translators (I doubt we’ll be put out of a job but the skill requirement will drop considerably, I think) – it means a huge ocean of previously unreadable knowledge and entertainment is suddenly going to be accessible.
But I’m a language learner as well, and my guess is that this community might have more of us than the baseline average, so here are some other obvious but useful things it can do:
I. It can create arbitrary amounts of synthetic learning material.
This is a big deal for a few reasons.
(A) Sometimes, for less commonly-learned languages like Indonesian, there isn’t much learning material available in the first place. The only Anki deck available is filled with sentences like “I can do it” and “John was angry at me”. This is an issue if you want mass immersion. Quantity is an issue.
(B) Sometimes, there isn’t material on stuff you’re interested in, things that are relevant to you. Quality is an issue. The key thing that predicts learner performance is interest. If all the textbooks you’re reading are oriented towards tourists and they’re talking about hotels and making small-talk about the weather, and you want to read, I don’t know, cute light-hearted yuri manga, or military strategy in the South China Sea, then you’re screwed… unless you have GPT-3. If there’s a particular grammatical feature you’re having trouble internalising, then you provide GPT-3 with a few examples and it’ll happily provide you with a hundred more. If there’s a word that isn’t sticking in your memory...
(C) A combination of A and B: the best way to learn a language is by actively using it. Constantly. Not just passively reading it, but producing it yourself. What hyperpolyglots recommend usually is going and living in the country that speaks your target language, or regularly having conversations with people who do. That’s an issue if (1) you have problems with social anxiety (2) there aren’t people nearby or (3) you aren’t willing to uproot your entire life and spend tens of thousands of dollars just to learn a new language.
This is where AIDungeon’s fine-tuned GPT-3 instance comes in. You select a scenario that involves the set of vocabulary you want to practice (if you’re planning a trip to Hungary, you simulate a trip and the hotel-stay, if you’re moving to a school, you simulate being a student at the International School of Budapest), or a story you could see yourself being invested in (horniness not precluded).
Then you customise it according to your level, the goal being comprehensible input that’s just at the edge of your comfort zone. If you’re an advanced learner with a lot of vocabulary under your belt, you use a handful of target-lang words to tell the model it’s meant to be speaking Hungarian, not English, and you jump into the deep-end and enjoy participating in the story (writing your dialogue and prose, etc.) while adding any unknown vocab items to Anki. If you’re a beginner, you should probably make a scenario involving a personal tutor who tests you after each lesson while introducing new words slowly and explaining concepts fully (see part II).
If you’re an intermediate learner, things are tougher. There might be an easier way to do this than what I’m about to describe, which is part of the point of this post, since I want to get new ideas from the community. What I’ve found works for me is priming the model to produce English translations after target-lang sentences. When the target-lang sentence comes up, you try and guess the English translation before hitting the generate button. Cool, reading comprehension – done. If you want control over the narrative or you want to do this in inverse, you prepend each paragraph with either ‘English [line number]’ or ‘target-lang [line number]’, and then shuffle the order of those paired paragraphs randomly so it translates to target-lang when it sees English, and English when it sees target-lang. What about speaking/writing? Again, what works for me is talking in target-language pidgin, where you just use English for any words you don’t know, and then priming the model to produce grammatically-correct translations after your shitty dialogue. Contrary to intuition, mixing like this is not at all harmful for language learning.
(D) Sentence pairs with translations are the mainstay of self-directed language-learning, because they’re easy to find (mostly). But using language isn’t all about translation. What part of speech is this? What if you wanted to vary things? Here is a conversation, what would be appropriate to say next? Clozes / filling in the gaps—what word would be most appropriate in this sentence? Is the meaning of this sentence closer to options A, B, C, or D? What register would you use in this social situation? Speak your response quickly then write it. Quick, name 10 words related to this word. See this paragraph? Summarise it. What about this argument, is it logically flawed? It’s interesting how a lot of NLP datasets I’ve come across actually make for very good flashcards for language learning, which, I suppose isn’t all that surprising.
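The paired-paragraph priming described under (C) can be sketched roughly as below. This is one reading of that description, not the exact prompt that was used: the function name, the label format and the example sentences are all assumptions for illustration. Each pair is numbered, the order of the two languages within a pair is randomised, and the prompt ends on an open label so the model continues in the other language.

```python
import random

def build_paired_prompt(pairs, new_text, new_text_is_english=True,
                        target_label="Indonesian", seed=None):
    """Build a few-shot prompt of numbered English/target-language pairs.

    pairs: list of (english, target) sentence tuples used as examples.
    new_text: the paragraph you want translated next.
    The order inside each example pair is shuffled so the model sees
    both translation directions.
    """
    rng = random.Random(seed)
    blocks = []
    for number, (english, target) in enumerate(pairs, start=1):
        sides = [f"English {number}: {english}",
                 f"{target_label} {number}: {target}"]
        rng.shuffle(sides)
        blocks.append("\n".join(sides))
    n = len(pairs) + 1
    if new_text_is_english:
        blocks.append(f"English {n}: {new_text}\n{target_label} {n}:")
    else:
        blocks.append(f"{target_label} {n}: {new_text}\nEnglish {n}:")
    return "\n\n".join(blocks)

example_pairs = [
    ("Thank you.", "Terima kasih."),
    ("I like coffee.", "Saya suka kopi."),
]
print(build_paired_prompt(example_pairs, "I like tea.", seed=0))
```

Setting `new_text_is_english=False` flips the final cue, which covers the reading-comprehension direction.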
II. It can explain things.
What’s the difference between ‘bantu’ and ‘tolong’ – they both mean ‘help’, but how do you use them in sentences? I don’t understand why the words are being ordered like this, explain the grammar to me. Why does this flashcard translate ‘you’ as ‘loe’, while another one uses ‘kalian’, or ‘kamu’, or ‘kau’ or ‘Anda’? (For this, you need to prime it with a nonsense / meaningless question and have it say ‘I don’t know’, otherwise it’ll make up answers for things it doesn’t know, or words that literally mean the same thing with no difference in usage whatsoever.)
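A minimal sketch of the ‘I don’t know’ priming mentioned in the parenthesis above. The Q/A layout, the function name and the nonsense words are assumptions made up for illustration; the only idea taken from the comment is that a meaningless question answered with a refusal comes before the real question.

```python
def build_explainer_prompt(nonsense_questions, real_question,
                           refusal="I don't know."):
    """Prime the model with unanswerable questions that get a refusal,
    then leave the real question open for it to answer."""
    parts = [f"Q: {q}\nA: {refusal}" for q in nonsense_questions]
    parts.append(f"Q: {real_question}\nA:")
    return "\n\n".join(parts)

prompt = build_explainer_prompt(
    # 'blorp' and 'zindle' are invented non-words, so no real answer exists.
    ["What is the difference between 'blorp' and 'zindle' in Indonesian?"],
    "What is the difference between 'bantu' and 'tolong' in Indonesian?",
)
print(prompt)
```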
But the great thing is, it can draw on real-world knowledge. You’re never learning just a language. You’re also learning the cultural context in which that language is used. If you try to do the former without doing the latter, some linguistic idiosyncrasies are going to remain mysterious to you until someone explains that the weird ungrammatical phrase you’re having trouble understanding, actually came from a 1998 hit soap-opera and now it’s just a part of the language. Or that this term is a historical one that refers to Sukarno’s policy of civil-military integration. Or that the reason why none of the dialogue involves first-name usage is because it’s super impolite to do that with someone you don’t know well.
Sometimes you can scan the indices of appropriate textbooks, or do a google search. But sometimes there aren’t textbooks, sometimes you don’t even know what to search for, sometimes you’re asking a question that’s never been asked before. And I think that’s the real power of GPT-3 as it exists right now – all of the human knowledge that’s currently unindexed, informal, uninterpretable, implied, ambiguous, unclear and inaccessible – it makes available with a single query.
Or an hour finicking around and handwriting 20 examples until it cottons on.
But … that happens more often with other contexts. Getting it to count parentheses accurately is like pulling teeth, but with translation tasks GPT-3 seems to go “aha! now this is something I’m good at!” and then explains the tonal differences between “神様” and “女神” in a Japanese poem about a lesbian sea-goddess it wrote five minutes ago. OpenAI’s paper was doing GPT-3 a really, really big disservice by quantifying it by its BLEU score. When it comes to language, GPT-3 isn’t a model, it’s a maestro.

oceaninthemiddleofanisland 27 Jul 2020 8:02 UTC
30 points
in reply to: orthonormal’s comment on: Developmental Stages of GPTs
Yes! I was thinking about this yesterday, and it occurred to me that GPT-3’s difficulty with rhyming consistently might not just be a byte-pair problem: any highly structured text with extremely specific, restrictive forward and backward dependencies is going to be a challenge if you’re just linearly appending one token at a time onto a sequence without the ability to revise it (maybe we should try a 175-billion-parameter BERT?). That explains and predicts a broad spectrum of issues and potential solutions (here I’m calling them A, B and C): performance should correlate with (1) the allowable margin of error per token-group (coding syntax is harsh, solving math equations is harsh, trying to come up with a rhyme for ‘orange’ after you’ve written it is harsh), and (2) the extent to which each token-group depends on future token-groups. Human poets and writers always go through several iterations, but we’re asking it to do what we do in just one pass.
So in playing around with GPT-3 (AID), I’ve found two (three?) meta approaches for dealing with this issue. I’ll call them Strategies A, B and C.
A is the more general one. You just give it multiple drafting opportunities and/or break up the problem into multiple smaller steps. So far I’ve seen it work for:
(1) Boolean logic, algebraic equations, and simple math equations (guess-and-check). When I have time in a few days, I’m going to get it to mimic the human heuristic for calculating approximate square-roots over multiple iterations.
(2) Translating Chinese poems to English roughly and then touching them up in the second draft. Same with editing any kind of text.
(3) Tricky coding problems (specifically, transforming a string into Pig Latin). First, instead of asking it to “solve the problem”, you ask it to “come up with five possible strategies for solving the problem”, and then “select the most plausible one”. Then you say “you made several structural, syntactical, and interpretive mistakes”, allow it to come up with a long list of those possible mistakes, say, “now try again”, and do that as many times as the context window allows. The end result isn’t always functional, but it’s a lot better than asking it to solve something in one pass.
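The loop in (3) can be sketched as a prompt-building routine. This is a rough sketch, not the exact prompts used in AID: `generate` is a hypothetical stand-in for whatever function sends text to the model and returns its continuation, and the function does nothing about the context window filling up.

```python
from typing import Callable, List

def iterative_drafting(task: str, generate: Callable[[str], str],
                       rounds: int = 3) -> List[str]:
    """Ask for strategies, pick one, then alternate attempts and
    self-critique for a fixed number of rounds. Returns every draft."""
    transcript = f"Task: {task}\n"
    transcript += "Five possible strategies for solving the problem:\n"
    transcript += generate(transcript) + "\n"
    transcript += "The most plausible strategy is:\n"
    transcript += generate(transcript) + "\n"

    drafts = []
    for attempt in range(1, rounds + 1):
        transcript += f"Attempt {attempt}:\n"
        draft = generate(transcript)
        drafts.append(draft)
        transcript += draft + "\n"
        transcript += ("You made several structural, syntactical, and "
                       "interpretive mistakes. Possible mistakes:\n")
        transcript += generate(transcript) + "\n"
        transcript += "Now try again.\n"
    return drafts

# Stand-in for a real model call, so the sketch runs on its own.
def echo_model(prompt: str) -> str:
    return f"[continuation of a {len(prompt)}-character prompt]"

print(iterative_drafting("Convert a string to Pig Latin", echo_model, rounds=2))
```

In practice you would stop once the transcript approaches the window limit, and pick the best draft by hand or by running it.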
B is the moderately less general, and more obvious second approach, which synergises well with the first approach. B is forcing GPT-3 to plan explicitly.
(1) In writing an article, you get GPT-3 to start by writing a vague summary, then a more in-depth summary, then listing the key points and subpoints in order. By periodically forcing it to summarise its discussion up to a given point, you can exceed the window length while retaining coherency.
(2) In writing poetry from a prompt, you get GPT-3 to discuss and tease out the implications of the prompt and describe the process of planning the poetry first.
(3) In translating, you get it to list out the key potential translation errors that could be made, and the different choices a translator could make in translating each line.
(4) In writing code, you get GPT-3 to simulate several people discussing the problem requirements and arguing constructively with one another (simulating just one person means if that one person goes off track or misinterprets the problem, future continuations are poisoned with the error since they need to be consistent), then producing English pseudo-code that describes the process in abstract, and only then the actual code.
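The periodic-summary trick from (1) might look something like this. It is a sketch under assumptions: `generate` is again a hypothetical model-call function, the prompt wording is invented, and the window is measured crudely in characters rather than tokens.

```python
def write_with_rolling_summary(outline, generate, window_chars=2000):
    """Write one section per outline point, carrying forward only a
    running summary so each prompt stays within the window."""
    summary = ""
    sections = []
    for point in outline:
        prompt = (f"Summary so far: {summary}\n"
                  f"Next point: {point}\n"
                  "Section:\n")
        # Crude truncation: keep only the most recent characters.
        section = generate(prompt[-window_chars:])
        sections.append(section)
        update = (f"Summary so far: {summary}\n"
                  f"New section: {section}\n"
                  "Updated summary:\n")
        summary = generate(update[-window_chars:])
    return sections, summary

# Stand-in model so the sketch runs without any API.
def fake_model(prompt):
    return f"(text based on {len(prompt)} characters of context)"

sections, final_summary = write_with_rolling_summary(
    ["vague summary", "key points", "conclusion"], fake_model)
print(len(sections), final_summary)
```

The full text is the concatenation of `sections`; the model itself only ever sees the summary plus the current point.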
I decided to add ‘simulating multiple people’ as a Strategy C, but it’s kind of the same thing as Strategy A but in a way that allows more room for error. The issue is that in most single-author texts, people try to be consistent with what they’ve said before, but in GPT-3, this can cause minor errors (for instance, self-contradiction) to accumulate over time, which reduces generation quality. But we’ve seen that something as simple as adding dialogue between two people, allows GPT-3 to arrive at accurate and more complex solutions much more reliably. This works for a broad spectrum of media: articles, poetry, translation, and coding. All you need to do is create a ‘critic’ who interrupts after each line or paragraph, and then if you really need one, a critic who criticises the first critic. The key here is constructive rather than destructive criticism, since GPT-3 is perfectly capable of producing vacuous and petty critiques.
All three of these strategies together tend to vastly improve performance on tasks where (1) the allowable margin of error per token-group is quite small (for instance, solving 83x42), and (2) current token-groups depend on future token-groups. I have not tested this for rhyming, but it seems simple enough to check.
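As an illustration of why 83x42 has such a small margin of error, and of what ‘breaking the problem into smaller steps’ buys you, here is the calculation split into partial products. This is not taken from any prompt I used; it just shows the intermediate lines a step-by-step transcript would contain, each of which can be checked on its own.

```python
def long_multiplication_steps(a: int, b: int):
    """Multiply a by b one digit of b at a time, recording each step."""
    steps = []
    total = 0
    place = 1
    remaining = b
    while remaining > 0:
        digit = remaining % 10
        partial = a * digit * place
        steps.append(f"{a} x {digit * place} = {partial}")
        total += partial
        remaining //= 10
        place *= 10
    steps.append(f"sum of partial products = {total}")
    return steps, total

steps, total = long_multiplication_steps(83, 42)
for line in steps:
    print(line)
# 83 x 2 = 166
# 83 x 40 = 3320
# sum of partial products = 3486
```

A single wrong digit in the one-pass answer is unrecoverable, whereas here an error in one partial product is visible before the final sum.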
In other words, GPT-3 does better at solving problems when you get it to simulate the way humans solve problems: with multiple attempts, with explicit planning, and by collaborating with other humans.
Edit: my attempts at making GPT-3 rhyme failed. Here is what I tried, and what I figured out.
(1) It has a vague idea of rhyming—if you fill its context-window with groups of words that rhyme, about 40-60% of the words in its next generation will rhyme, and the rest will look like rhymes (as in, they end with the same couple of letters but are pronounced differently in English—e.g dough, cough, rough, etc.).
(1a) Most rhyming websites are query-based. From what I could tell, GPT-3 has not memorised the layout of the most common rhyming websites to the degree where it could reproduce the formatting consistently. This is not surprising given that Common Crawl abides by nofollow and robots.txt policies, and that OpenAI may have filtered these pages out when they were paring the dataset down to ‘high-quality’ documents.
(1b) GPT-3 knows how most Chinese words are pronounced, even if it gets the tone wrong sometimes. It rhymes more consistently in languages with uncommon diacritic markings, more with languages that don’t use Latin characters, and even more consistently in non-Latin-based languages with phonemic orthography, but not by much. With Russian, you hit the jackpot (BPE represents it as individual characters, it’s mostly phonemic, there’s a lot of Russian in GPT-3’s dataset, and a lot of rhyming poetry), but it still does poorly. This suggests that the absence of looking forward, plus the randomness introduced by sampling, is the main issue here. Unfortunately the other most-well-represented languages in its dataset with non-Latin phonemic orthography (Japanese kana, Korean hangul, Arabic script) each have their own issues. Rhyming the last syllable of each line in Korean is easy since it’s an SOV language and all you have to do is match the verb conjugation, so it doesn’t have much literary value. Most of the rhyming in the dataset would likely be modern rap, which sometimes uses multiple syllables. Arabic omits short vowels. Japanese I know less about, but iirc rhyming is much less common than other forms of constrained writing (e.g. haiku) that emphasise rhythm, and mostly occurs in j-pop.
(2) Giving it multiple attempts failed. ‘Multiple generations for each line + selecting the ones that rhyme’ works, but we already know that.
(3) Supplying rhymes kind of worked. It would do well for a handful of lines and then go off track. Giving it multiple possible choices was very bad. It would include the words randomly within lines, or near the end of lines, and sometimes at the very end. This might be rectified by more examples, since AID is limited to 1000 tokens/characters. But I do suspect the issue is a more fundamental one.
(4) Splitting words into syllables failed, but I didn’t try this one exhaustively. The only benefit of word-splitting occurs when the beginning of the word matters (e.g alliteration), because it allows for ‘denser’ computation per token (on the character/syllable level, not the word level). Plus, we’re talking about the English language. Even actual English speakers regularly have trouble with knowing how words are pronounced, orthography kind of hinders rather than helps in this case.
(5) ‘Reminding’ it of the end word between each line failed.
(6) Forcing it to generate in IPA first did not work. However, it does have a vague idea of how to transliterate English into IPA and a better idea of how to transliterate IPA into English.
(7) Future attempts: my prompting was very abstract, and we know that GPT-3 works better when there’s a familiar context surrounding the task / the prompt is within the training distribution. I will try the context of an English writing assignment.
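For reference, the ‘multiple generations for each line + selecting the ones that rhyme’ approach from (2), together with the look-alike rhymes from (1), can be sketched like this. The pronunciation table is a tiny hand-written assumption for illustration only; a real version would need a proper pronouncing dictionary, and the candidate lines would come from the model rather than a fixed list.

```python
# Tiny hand-written table of final sounds; illustrative only.
FINAL_SOUND = {
    "dough": "oh", "though": "oh", "snow": "oh",
    "cough": "off", "off": "off",
    "rough": "uff", "stuff": "uff",
}

def last_word(line):
    """Last word of a line, lowercased, with trailing punctuation removed."""
    return line.strip().split()[-1].strip(".,!?;:").lower()

def looks_like_rhyme(a, b, letters=4):
    """Spelling-only check: do the final words end in the same letters?"""
    return last_word(a)[-letters:] == last_word(b)[-letters:]

def sounds_like_rhyme(a, b):
    """Pronunciation check using the small table above."""
    sound_a = FINAL_SOUND.get(last_word(a))
    sound_b = FINAL_SOUND.get(last_word(b))
    return sound_a is not None and sound_a == sound_b

def pick_rhyming_line(previous_line, candidates):
    """Return the first candidate that actually rhymes, or None."""
    for candidate in candidates:
        if sounds_like_rhyme(previous_line, candidate):
            return candidate
    return None

previous = "I kneaded the dough"
candidates = ["He had a bad cough", "The road was rough", "Outside fell the snow"]
print(looks_like_rhyme(previous, candidates[0]))   # spelled alike
print(sounds_like_rhyme(previous, candidates[0]))  # pronounced differently
print(pick_rhyming_line(previous, candidates))
```

The filter does the rhyming here, not the model, which is why this approach tells us little about what GPT-3 itself knows.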

oceaninthemiddleofanisland 19 Jul 2020 0:07 UTC
10 points
on: Collection of GPT-3 results
A bunch more examples here, a bit difficult to summarise since it went from explaining how dopamine receptors work, to writing a poem about Amazon’s logistics in the form of a paean to the Moon Goddess, to writing poems in Chinese based on English instructions and then providing astonishingly-good translations, to having Amazon and Alibaba diss one another in the style of 18th century poet Mary Robinson. Link here: https://www.reddit.com/r/slatestarcodex/comments/hrx2id/a_collection_of_amazing_things_gpt3_has_done/fy7i7im/?context=3
Example:
The oracle says: “Oh Moon Goddess, hear my prayer
As we look upon this beauteous evening,
From these windowless towers our workers do depart
To fulfill quotas in a quest for morning.
Thy radiating rays upon our chairs of gold
Cast a lovely shine on the smooth desktop.
Yet, what’s this ahead? A crease in the land?
Deformed packages that must be delivered?
Disasters never come singly, you’ll find!
Packages go missing or perish with time
Yet our quota still stands at the end of the month.
Now we’re falling behind, what shall we do?
Let us hire new workers and fire the slackers!
Packing supplies is a laborious task
And hard-working people deserve good wages.
Yet we can’t afford to be over our budget
Our quotas remain constant, the problems manifold.
The demand for our packages is endless

oceaninthemiddleofanisland 1 Aug 2020 7:20 UTC
7 points
in reply to: avturchin’s comment on: Predictions for GPT-N
So I’ve figured this out. Kinda. If you choose ‘custom’ then it will give you Griffin, but if you choose one of the conventional prompts and then edit it, you can get around it. So damn annoying.

oceaninthemiddleofanisland 29 Jul 2020 21:38 UTC
7 points
in reply to: gwern’s comment on: Predictions for GPT-N
That’s a visualisation I made which I haven’t posted anywhere else except under the r/ML thread collecting entries for GPT-3 demos, since I couldn’t figure out which subreddit to post it in.

oceaninthemiddleofanisland 23 Jul 2020 0:38 UTC
7 points
on: Alignment As A Bottleneck To Usefulness Of GPT-3
The best angle of attack here I think, is synthesising knowledge from multiple domains. I was able to get GPT-3 to write and then translate a Japanese poem about a (fictional) ancient language model into Chinese, Hungarian, and Swahili and annotate all of its translations with stylistic notes and historical references. I don’t think any humans have the knowledge required to do that, but unsurprisingly GPT-3 does, and performed better when I used the premise of multiple humans collaborating. It’s said that getting different university departments to collaborate tends to be very productive wrt new papers being published. The only bottleneck is whether its dataset includes scientific publications and the extent to which it can draw upon memorised knowledge (parameter count).

oceaninthemiddleofanisland 30 Jul 2020 3:44 UTC
6 points
in reply to: avturchin’s comment on: Predictions for GPT-N
Somehow the more obvious explanation didn’t occur to me until now, but check the settings, you might be using the Griffin model not the Dragon model. You have to change it manually even after you get the subscription. I have a window open specifically for poetry prompts (using the Oracle hack), I said “Write a long poem in Russian. Make sure the lines are long, vivid, rich, and full of description and life. It should be a love poem addressed to coffee. It should be 15 lines long” followed with “The Oracle, which is a native in Russian, writes: 1 Ой,”. That just gave me annoying stuff like “Oh, coffee, how I love you so much/You are sweet” in Russian so I then added a random Pushkin poem in the ‘death’ category (https://rustih.ru/aleksandr-pushkin-tam-u-leska-za-blizhneyu-dolinoj/) before it to increase the quality and right now it just gave me:
1 Ой да промелькнул январский день,
2 Когда я словно трепетный цветок
3 Нежно расставил руки,
4 Когда ты, чаша чистейшего черного кофе,
5 С сияньем твоим, я так созерцал.
6 Молодость! Как радостны те часы!
7 И я, целованьем твоим так молодо
8 Неслучайно тыкал я туда,
9 Где только чувства, только сердце,
10 Ты, скромный чайный граней, теплый,
11 Несли ты, с радостным свеченьем ступеней,
12 В смолкновенья сладких сумерек
and so on and so on, stopped it there
I have no idea whether this is good since all I did was check with GTranslate to make sure it was roughly on topic, I would ask GPT-3 to translate it for me but I’m guessing you’re a native speaker so you should be able to tell.

oceaninthemiddleofanisland 30 Jul 2020 23:02 UTC
4 points
in reply to: avturchin’s comment on: Predictions for GPT-N
I’m beginning to think AID has changed what the “Dragon” model is without telling us for cost reasons, I’ve had kind of the same experience with big lapses in storytelling that didn’t occur as often before. Or maybe it’s randomly switching based on server load? I can kind of understand it if that’s the case but the lack of transparency is annoying. I remember accidentally using the Griffin model for a day when my subscription ran out and not realising because its Indonesian was still quite good...

oceaninthemiddleofanisland 29 Jul 2020 21:44 UTC
4 points
in reply to: avturchin’s comment on: Predictions for GPT-N
If it’s a BPE encoding thing (which seems unlikely to me given that it was able to produce Japanese and Chinese characters just fine), then the implication is OpenAI carried over their encoding from GPT-2 where all foreign language documents were removed from the dataset … I would have trouble believing their team would have overlooked something that huge. This is doubly bizarre given that Russian is the 5/6th most common language in the dataset. You may want to try prompting it with coherent Russian text, my best guess is that in the dataset, whenever somebody says “He said in Russian:”, what usually follows is poor quality (for instance I see this in bad fanfiction where authors use machine translation services to add ‘authenticity’), and that GPT-3 is interpreting this as a signal that it should produce bad Russian. I will give this a try and see if I encounter the same issue.

oceaninthemiddleofanisland 1 Aug 2020 2:18 UTC
3 points
in reply to: gwern’s comment on: Predictions for GPT-N
Wow, I didn’t realise I could get this angry about something so esoteric.

oceaninthemiddleofanisland 13 Aug 2020 2:25 UTC
2 points
on: Writing Piano Songs: A Journey
This probably won’t add too much to the discussion but I’m curious to see whether other people relate to this or have a similar process. I was kind of stunned when I heard from friends who got into composing about how difficult it is to figure out a melody and then write a complete piano piece because to me, whenever I open up Sibelius or Dorico (and more recently Ableton), internally it seems like I’m just listening to what I wrote so far, ‘hearing’ a possible continuation lasting a few bars, and then quickly trying to transcribe it before I forget it, or if I really want to be precise then just the next note-group. It doesn’t really come from anywhere and it doesn’t require any thought, but I can tell it’s obviously taking up a share of my cognitive RAM from multitasking experiments, it’s definitely influenced by the music I’ve listened to recently (e.g. 1930s/40s jazz), and there are a lot of recognisable patterns. I gave up piano at Grade 1 and my theory went to Grade 2 (I think) where I stopped because I intensely despised it. I actively avoided formal instruction. It makes transcribing harder because I’m just clicking on notes to see if they match up with what’s in my head and that interferes a lot with my memory, playing on an actual piano is even worse. So now what I do is use a phone app to record myself whistling 10-12 seconds of the ‘top’ melody, and then I play it back while making a new recording and I whistle the notes underneath it too, and I keep doing that until all the chords are right and the signal isn’t too degraded. It’s still very annoying. Something I should note is that I whistle whenever I’m alone pretty much obsessively and that’s been the case since I was maybe eight or nine, especially to accompany whatever music is playing around me, and that I have mild autism.
It makes me think that with pretty much any creative skill, there are unconscious cognitive modules/black-boxes in play that have been developed either through a lot of exposure or through the internalisation/automatisation of heuristics and rules, which are responsible for predicting small sequences of actions (“what note comes next?”) or doing error-correction (“what sounds good?”). It’s difficult to notice/interact directly with them, but it’s possible when you override conscious controls. The easiest way to see this is to try asemic writing/typing – just typing or writing mindlessly and allowing your hands to just move by themselves. Once you get into the groove with asemic typing, you get Markov-chain-like strings of letters that reflect the character distribution of the language you type with, and sometimes common words like ‘the’ or ‘and’. With asemic writing, you get common patterns of loops, vertical and horizontal lines, and connectors. I’ve seen what seems to be higher-level language modules at work when I’m in a semi-lucid verge-of-fully-waking-up/falling-asleep state where my eyes are open but I’m also in dreamspace at the same time (I have no idea how to describe this), and I can read an imaginary book in front of me or listen to someone, and it’s just a fluent stream of meaningless babble often with a poetic quality to it, sometimes where consonants are carried over to the next word or semi-rhymes that would be a pain to come up with consciously.

oceaninthemiddleofanisland 1 Sep 2020 3:42 UTC
1 point
AF
in reply to: gwern’s comment on: interpreting GPT: the logit lens
How far away is this from being implementable?