What’s an example of a misconception someone might have due to having a mistaken understanding of causality, as you describe here?
This is a bizarre example, sort of like using Bill Gates to show why nobody needs to work for a living. It ignores the extreme inequality of fame.
Tesla doesn’t need advertising because they get huge amounts of free publicity already, partly due to having interesting, newsworthy products, partly due to having a compelling story, and partly due to publicity stunts.
However, this free publicity is mostly unavailable for products that are merely useful without being newsworthy. There are millions of products like this. An exciting product might not need advertising but exciting isn’t the same as useful.
So it seems like the confidence to advertise a boring product might be a signal of sorts? However, given that many people in business are often unreasonably optimistic, it doesn’t seem like a particularly strong one. Faking confidence happens quite a lot.
It seems like some writers have habits to combat this, like writing every day or writing so many words a day. As long as you meet your quota, it’s okay to try harder.
Some do this in public, by publishing on a regular schedule.
If you write more than you need, you can prune more to get better quality.
One aspect that might be worth thinking about is the speed of spread. Seeing someone once a week means that it slows down the spread by 3 1⁄2 days on average, while seeing them once a month slows things down by 15 days on average. It also seems like they are more likely to find out they have it before they spread it to you?
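To make the arithmetic above concrete, here is a rough sketch. It assumes the other person catches it at a uniformly random moment between two meetings, so the average wait until the next meeting is half the interval. The function names and the 30-day month are my own assumptions for illustration, not a real epidemiological model.

```python
import random


def mean_delay(interval_days: float) -> float:
    """Expected wait until the next meeting, assuming the infection time
    is uniformly distributed over the interval between meetings."""
    return interval_days / 2


def simulated_mean_delay(interval_days: float, trials: int = 100_000, seed: int = 0) -> float:
    """Check the closed form by simulation under the same uniform assumption."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        infected_at = rng.uniform(0, interval_days)  # time since the last meeting
        total += interval_days - infected_at         # wait until the next meeting
    return total / trials


if __name__ == "__main__":
    for label, interval in [("weekly", 7), ("monthly (assumed 30 days)", 30)]:
        print(label, mean_delay(interval), round(simulated_mean_delay(interval), 2))
```

This only covers the delay. It says nothing about the second point, that they might find out they have it before the next meeting.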
Yes, sometimes we don’t notice. We miss a lot. But there are also ordinary clarifications like “did I hear you correctly” and “what did you mean by that?” Noticing that you didn’t understand something isn’t rare. If we didn’t notice when something seems absurd, jokes wouldn’t work.
It’s not quite the same, because if you’re confused and you notice you’re confused, you can ask. “Is this in American or European date format?” For GPT-3 to do the same, you might need to give it some specific examples of resolving ambiguity this way, and it might only do so when imitating certain styles.
It doesn’t seem as good as a more built-in preference for noticing and wanting to resolve inconsistency? Choosing based on context is built in using attention, and choosing randomly is built in as part of the text generator.
It’s also worth noticing that the GPT-3 world is the corpus, and a web corpus is an inconsistent place.
Having demoable technology is very different from having reliable technology. Take the history of driverless cars. Five teams completed the second DARPA Grand Challenge in 2005. Google started development secretly in 2009 and announced the project in October 2010. Waymo started testing without a safety driver on public roads in 2017. So we’ve had driverless cars for a decade, sort of, but we are much more cautious about allowing them on public roads.
Unreliable technologies can be widely used. GPT-3 is a successor to autocomplete, which everyone already has on their cell phones. Search engines don’t guarantee results and neither does Google Translate, but they are widely used. Machine learning also works well for optimization, where safety is guaranteed by the design but you want to improve efficiency.
I think when people talk about a “revolution” it goes beyond the unreliable use cases, though?
In that case, I’m looking for people sharing interesting prompts to use on AI Dungeon.
Where is this? Is it open to people who don’t have access to the API?
I’m suggesting something a little more complex than copying. GPT-3 can give you a random remix of several different clichés found on the Internet, and the patchwork isn’t necessarily at the surface level where it would come up in a search. Readers can be inspired by evocative nonsense. A new form of randomness can be part of a creative process. It’s a generate-and-test algorithm where the user does some of the testing. Or, alternately, an exploration of Internet-adjacent story-space.
It’s an unreliable narrator and I suspect it will be an unreliable search engine, but yeah, that too.
I was making a different point, which is that if you use “best of” ranking then you are testing a different algorithm than if you’re not using “best of” ranking. Similarly for other settings. It shouldn’t be surprising that we see different results if we’re doing different things.
It seems like a better UI would help us casual explorers share results in a way that makes trying the same settings again easier; one could hit a “share” button to create a linkable output page with all relevant settings.
It could also save the alternate responses that either the user or the “best-of” ranking chose not to use. Generate-and-test is a legitimate approach, if you do it consistently, but saving the alternate takes would give us a better idea how good the generator alone is.
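Here is a minimal sketch of what I mean by saving the settings and the alternate takes. Everything in it is hypothetical: `toy_generate` and `toy_score` are stand-ins I made up, not GPT-3 or whatever ranking the real “best-of” option uses. The point is only that the record keeps the rejected candidates next to the chosen one, along with the settings needed to run it again.

```python
import random
from dataclasses import dataclass


@dataclass
class SharedResult:
    """What a hypothetical share page could store."""
    prompt: str
    settings: dict
    chosen: str
    alternates: list


def best_of(prompt, generate, score, n, seed):
    """Generate n candidates, rank them, and keep the ones not chosen."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    ranked = sorted(candidates, key=score, reverse=True)
    return SharedResult(
        prompt=prompt,
        settings={"best_of": n, "seed": seed},
        chosen=ranked[0],
        alternates=ranked[1:],
    )


def toy_generate(prompt, rng):
    """Stand-in generator (an assumption): appends a few random words."""
    words = ["dragon", "tavern", "sword", "river", "ghost"]
    length = rng.randint(1, 4)
    return prompt + " " + " ".join(rng.choice(words) for _ in range(length))


def toy_score(text):
    """Stand-in ranking (an assumption): prefers more distinct words."""
    return len(set(text.split()))


if __name__ == "__main__":
    result = best_of("You enter the", toy_generate, toy_score, n=5, seed=1)
    print("chosen:", result.chosen)
    print("alternates:", result.alternates)
```

Running it again with the same prompt, settings, and seed gives the same record, which is the reproducibility I’d want from a share link. Looking at the alternates next to the chosen output shows how much of the quality comes from the ranking step.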
I don’t see documentation for the GPT-3 API on OpenAI’s website. Is it available to the public? Are they doing their own ranking or are you doing it yourself? What do you know about the ranking algorithm?
It seems like another source of confusion might be people investigating the performance of different algorithms and calling them all GPT-3?
How do you do ranking? I’m guessing this is because you have access to the actual API, while most of us don’t?
On the bright side, this could be a fun project where many of us amateurs learn how to do science better, but the knowledge of how to do that isn’t well distributed yet.
We take the web for granted, but maybe we shouldn’t. It’s very large and nobody can read it all. There are many places we haven’t been that probably have some pretty good writing. I wonder about the extent to which GPT-3 can be considered a remix of the web that makes it seem magical again, revealing aspects of it that we don’t normally see? When I see writing like this, I wonder what GPT-3 saw in the web corpus. Is there an archive of Tolkien fanfic that was included in the corpus? An undergrad physics forum? Conversations about math and computer science?
Rather than putting this in binary terms (capable of reason or not), maybe we should think about what kinds of computation could result in a response like this?
Some kinds of reasoning would let you generate plausible answers based on similar questions you’ve already seen. People who are good at taking tests can get reasonably high scores on subjects they don’t fully comprehend, basically by bluffing well and a bit of luck. Perhaps something like that is going on here?
In the language of “Thinking, Fast and Slow”, this might be “System 1” style reasoning.
Narrowing down what’s really going on probably isn’t going to be done in one session or by trying things casually. That is particularly true if you have randomness turned on, since you’d want to get a variety of answers to understand the distribution.
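As a rough sketch of what looking at the distribution could mean in practice: ask the same prompt many times and tally the answers, rather than judging from one sample. The `toy_sample` function below is a made-up stand-in with arbitrary weights, not the real model, and the names are my own.

```python
import random
from collections import Counter


def answer_distribution(sample, prompt, runs, seed=0):
    """Call a randomized sampler repeatedly and return answer frequencies."""
    rng = random.Random(seed)
    counts = Counter(sample(prompt, rng) for _ in range(runs))
    return {answer: n / runs for answer, n in counts.most_common()}


def toy_sample(prompt, rng):
    """Stand-in for a model with randomness turned on (an assumption)."""
    answers = ["falls down", "floats", "explodes"]
    return rng.choices(answers, weights=[8, 1, 1])[0]


if __name__ == "__main__":
    dist = answer_distribution(toy_sample, "If I drop a ball, it", runs=1000)
    for answer, share in dist.items():
        print(f"{share:.1%}  {answer}")
```

A single run could land on any of the three answers. Only the tally shows which one the generator favors and how often it goes wrong.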
GPT-3 has partially memorized a web corpus that probably includes a lot of basic physics questions and answers. Some of the physics answers in your interview might be the result of web search, pattern match, and context-sensitive paraphrasing. This is still an impressive task but is perhaps not the kind of reasoning you are hoping for?
From basic Q&A it’s pretty easy to see that GPT-3 sometimes memorizes not only words but short phrases like proper names, song titles, and popular movie quotes, and probably longer phrases if they are common enough.
Google’s Q&A might seem more magical too if they didn’t link to the source, which gives away the trick.