I have a neat idea for a smartphone app, but I would like to know if something similar exists before trying to create it.
It would be used to measure various things in one’s life without having to fiddle with spreadsheets. You could create documents of different types, each type measuring something different. Data would be added via simple interfaces that fill in most of the necessary information. Reminders based on time, location and other factors could be set up to prompt for data entry. The gathered data would then be displayed using various graphs and could be exported.
The cool thing is that it would be super simple to reliably measure most things on a phone in a way that’s much simpler than keeping a spreadsheet. For example: you want to measure how often you see a seagull. You’d create a frequency-measuring document, title it “Seagull sightings”, and each time you open it, there’d be a big button for you to press indicating that you just saw a seagull. Pressing the button would automatically record the time and date, and perhaps the location, of the sighting. Additional fields could be added, like the size of the seagull, which you would be prompted for and which would be logged with each press. With a spreadsheet, you’d have to enter the date yourself, and the interface isn’t nearly as convenient.
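To make the seagull example concrete, here is a minimal sketch of what a frequency-measuring document could look like as a data structure. All the names here (FrequencyDocument, Entry, press, count_per_day) are hypothetical and made up for illustration; this is not an existing app or library, just one way the idea could be modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Entry:
    """One button press: when it happened, optionally where, plus extra fields."""
    timestamp: datetime
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude), if known
    fields: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FrequencyDocument:
    """Hypothetical document type that counts how often something happens."""
    title: str
    extra_fields: List[str] = field(default_factory=list)
    entries: List[Entry] = field(default_factory=list)

    def press(self, now: Optional[datetime] = None,
              location: Optional[Tuple[float, float]] = None,
              **values: Any) -> Entry:
        """Record one occurrence; the timestamp is filled in automatically."""
        missing = [name for name in self.extra_fields if name not in values]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        entry = Entry(
            timestamp=now or datetime.now(timezone.utc),
            location=location,
            fields=dict(values),
        )
        self.entries.append(entry)
        return entry

    def count_per_day(self) -> Dict[str, int]:
        """Number of recorded occurrences per calendar day (ISO date string)."""
        counts: Dict[str, int] = {}
        for entry in self.entries:
            day = entry.timestamp.date().isoformat()
            counts[day] = counts.get(day, 0) + 1
        return counts


# Usage: a "Seagull sightings" document with one extra field.
seagulls = FrequencyDocument("Seagull sightings", extra_fields=["size"])
seagulls.press(size="large")  # time and date are recorded for you
```

The point of the sketch is that the user only supplies what the phone cannot know (the size of the seagull); everything else is filled in at the moment of the press.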
Another example: you’re curious as to how long you sleep and how you feel in the morning. You’d set up an interval-measuring document with a 1-10 integer field for sleep quality and reminders tied into your alarm app or the time you usually wake up. Each morning you’d enter hours slept and rate how good you feel. After a while you could look at pretty graphs and mine for correlations.
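As a rough illustration of what “mining for correlations” could mean for the sleep example, the sketch below computes a plain Pearson correlation between hours slept and the morning rating. The function name and the sample numbers are invented for illustration; a real app might use a different statistic entirely.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up entries: hours slept and 1-10 morning rating, one pair per day.
hours_slept = [6.0, 7.5, 8.0, 5.5, 9.0]
morning_rating = [4, 7, 8, 3, 9]
print(round(pearson(hours_slept, morning_rating), 2))
```

A value near 1 would suggest that longer nights go with better mornings, though with self-reported data and a small sample that is a hint rather than a conclusion.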
A third example: you can emulate the experience sampling method for yourself. You would have your phone remind you to take the survey at specific times in the day, whereupon you’d be presented with sliders, checkboxes, text fields and other fields of your choosing.
This could be taken further in a useful way by adding a crowdsourcing aspect. Document templates could be shared in a sort of template marketplace. The data of everyone using a certain template would accumulate in one place, making for a much larger sample size.
TL;DR: Thought this post was grossly misleading. Then I saw that the GPT3 playground/API changed quite a lot recently in notable and perhaps worrying ways. This post is closer to the truth than I thought but I still consider it misleading.
Initially strongly downvoted since the LW post implies (to me) that humans provide some of the GPT3 completions in order to fool users into thinking it’s smarter than it is. Was that interpretation of your post more in the eye of the beholder?
Nested three layers deep is one of two pieces of actual evidence:
My impression was that InstructGPT was a new/separate model, available as an option in the API along with the base GPT3, that is openly finetuned with human feedback as a way of aligning the base model. That was the whole point of this paper: https://arxiv.org/abs/2203.02155
This is very different from what I saw this post as implying, because OpenAI are open about it, it’s different from the main GPT3 and it’s not humans providing completions but humans aligning a language model. Hence strong downvote.
(The examples about the completions for select “gotcha” prompts improving over time aren’t very compelling evidence for what this post implies. The ones changing in a day are pretty compelling, though—how weird!)
Then I opened the GPT3 API playground for the first time in a few months and realized that my understanding was outdated. Looks like InstructGPT and the old non-finetuned davinci have been merged into text-davinci-002, which is now the default model.
Trying the “Does grape juice taste better if you add sour milk?” prompt many times over keeps giving me the exact same answer that Gary got, even with max temperature. To test where on the spectrum between “aligned using human review” and “repeating human answers” this lies, I tried some variations:
While GPT3 might not literally outsource a portion of the requests to MTurk, I don’t think it’s unfair to say that some of the completions are straight-up human-provided. If the corrected completions had been added in a way that generalized (e.g. by aligning the model using human feedback, as in the paper), it would have been a different story. But they clearly don’t generalize.
So to recap:
the curation of InstructGPT is now in the default model
human completions are substituted within a day in response to publicized embarrassing completions (I’m alleging this)
human completions aren’t added such that the model is aligned to give more helpful answers, because very similar prompts still give bad completions
In addition, and more intangibly, I’m noticing that GPT3 is not the model I used to know. The completions vary a lot less between runs. More strikingly, they have a distinct tone: they read like an NYT expert fact-checker or the first-page Google results for a medical query.
I tried one of my old saved prompts for a specific kind of fiction and the completion was very dry and boring. The old models are still available, and the prompt works better there. But I won’t speculate further since I don’t have enough experience with the new (or the old) GPT3.