In my experience, startups want to demonstrate efficacy on basic things: weight loss, increased revenue, personal productivity, product safety, etc.
This kind of research lends itself extremely well to protocol templates: a standardized sequence of steps to locate the participants, collect the data, and decide the results. These steps could be performed by a website. I’ve posted a story of how that might work here.
Without such a project, founders have two options:
Perform the study themselves. The scientific background and time required to design a study well is non-trivial, as is the expertise necessary to create participant waivers. There are also significant time costs in setting up the data collection, and performing the analysis. At the end, the study will be greeted somewhat skeptically.
Pay a University or Contract Research Organization to perform the study. This is expensive, which I believe is why more founders aren’t doing it.
This project creates a third option:
Use a standard template then get back to work.
Which may be imperfect, but is still a pretty appealing value proposition.
This kind of research lends itself extremely well to protocol templates
This is not at all self-evident to me. How, for example, would you demonstrate product safety (for a diverse variety of products) via a standard template?
Use a standard template then get back to work.
I don’t see how a “template” frees one from
significant time costs in setting up the data collection, and performing the analysis.
Research costs money and requires competent people. If it were possible to do meaningful research on the cheap just by reusing the same template, don’t you think it would be a very popular opinion already?
This is not at all self-evident to me. How, for example, would you demonstrate product safety (for a diverse variety of products) via a standard template?
Templates, not template. I think if you know roughly which bodily systems a product is likely to affect, the questions are not so diverse.
My background is not in question selection (it’s ML and webapp programming), but here are some general question ideas for edible products:
I have/have not felt sick to my stomach in the last 24 hours. (standardized 1-7 severity rating)
I have/have not felt dizzy in the last 24 hours. (standardized 1-7 severity rating)
Bristol stool scale score
The mandatory questions are intended to give LessWrong / everyone a say in what startups will test their products for—NOT to provide a 100% guarantee of general safety (the FDA already handles that). We should use these questions to learn about unanticipated side effects.
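For concreteness, standardized questions like these could be stored as simple records. Here is a minimal sketch in Python; the field names and `validate` helper are my own invention, not a settled schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafetyQuestion:
    text: str                  # shown to the participant each day
    scale_min: int = 1         # 1 = not at all
    scale_max: int = 7         # 7 = extremely severe
    category: str = "edible"   # product category the question is mandatory for

    def validate(self, answer: int) -> bool:
        # Reject answers outside the standardized 1-7 range.
        return self.scale_min <= answer <= self.scale_max

NAUSEA = SafetyQuestion("I have felt sick to my stomach in the last 24 hours.")
DIZZY = SafetyQuestion("I have felt dizzy in the last 24 hours.")
```

Because every product in a category shares the same records, results stay comparable across studies, which is the whole point of mandatory questions.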
Research costs money and requires competent people. If it were possible to do meaningful research on the cheap just by reusing the same template, don’t you think it would be a very popular opinion already?
I’m hoping it will do something akin to what Google Translate did for translation: lower the cost for modest use cases. If you want a high-quality translation (poetry) you still need to hire a good translator. However, if you are willing to accept a reasonably good level of translation quality, it’s now free.
I agree it’s weird that nobody else has noticed this. testifiable.com is the closest I’ve found. I’ve already spoken with Testifiable’s founders and invited them to this thread.
I’m hoping it will do something akin to what Google Translate did for translation
There is a critical difference: Google Translate does not guarantee the quality of results and, in fact, often generates something close to garbage. It may produce a “reasonably good level of translation quality” or it may not and that’s fine because it made zero promises about its capabilities.
You are planning to set yourself up as a standard of research which means you must generate better than adequate results every single time.
P.S. Oh, and a random thought. What do you think 4Chan will do with your “webapp”? X-)
Hmm. I’m confused. Let’s look at something slightly more extreme than what we’re talking about and see if that helps.
Level 0: Imagine we make a product study as good as possible, then allow anyone to perform the same study with a different product. Some products “shouldn’t” be tested that way, but I don’t see how a protocol like that will produce garbage (it will merely establish “no effect”).
Level 1: We broaden to support more companies, and allow anyone to perform those studies as well.
Level 2: After a sufficient number of companies have had their experiment created, we take the protocols and create a “build your own experiment constructor kit” which allows for an increasingly large number of products to receive a good test.
Level 3: As we add more and more products to the adaptable system, we reach the point where most product claims have a community ratified question for them, and the protocol is stable. You might not be able to test 20% of the things that you’d like, but for the other 80% you can test those just fine.
Please let me know where you believe that plan breaks. The actual plan will likely differ, of course, but we need something concrete to talk about or I’m going to keep not understanding what sounds like potentially constructive methodology advice.
P.S. Oh, and a random thought. What do you think 4Chan will do with your “webapp”? X-)
Perform some hilarious experiments! Hopefully we get publicity from them :D
Well, I still don’t understand how it’s supposed to work.
Let’s take a specific example: there is a kitchen gadget, the Sansaire sous vide circulator, that was successfully funded on Kickstarter. Let’s pretend that I’m one of the people behind the project and I’m interested in (1) whether people find the product useful; (2) whether the product is safe.
How would the product study go in this particular example? Who does what in which sequence?
I described an overview in a different thread, but that was before a lot of discussion happened.
I’ll use this as an opportunity to update the script based on what has been discussed. This is of course still non-final.
The creator of the sous vide machine (Tester) would register his study and agree to the terms.
The Tester would register this as a food-related study, automatically adding required safety questions.
The Tester would perform a search of our questions database and locate customer satisfaction related questions.
The Tester would click “start the experiment”.
Our application would post an ad seeking participants.
The participants would register for the study; once a critical mass was reached, our app would create a new instance of the data collection webapp.
Once the study period is complete, the data collection app signs and transfers the participant data back to our main app. The analysis is performed, the study is posted publicly, and the Tester is notified of the results via email.
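The seven steps above amount to a simple study lifecycle. A minimal sketch as a state machine (the state names and transition table are hypothetical, not a committed design):

```python
from enum import Enum, auto

class StudyState(Enum):
    DRAFT = auto()       # Tester registering, agreeing to terms, picking questions
    RECRUITING = auto()  # ad posted, participants signing up
    COLLECTING = auto()  # critical mass reached, data collection app spawned
    ANALYZING = auto()   # study period complete, data signed and transferred
    PUBLISHED = auto()   # results posted publicly, Tester notified

# Allowed transitions; anything else is a bug in the workflow.
TRANSITIONS = {
    StudyState.DRAFT: {StudyState.RECRUITING},
    StudyState.RECRUITING: {StudyState.COLLECTING},
    StudyState.COLLECTING: {StudyState.ANALYZING},
    StudyState.ANALYZING: {StudyState.PUBLISHED},
    StudyState.PUBLISHED: set(),
}

def advance(current: StudyState, nxt: StudyState) -> StudyState:
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```

Making the transitions explicit also gives a natural place to enforce preregistration: the question list is frozen the moment a study leaves DRAFT.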
Okay, I hope that (and the link to my earlier user story) helps make things clearer. If you see issues with this, please do bring them up; finding and fixing issues early is the reason I started this thread.
You’re describing just the scaffolding, but what actually happens? All the important stuff is between points 6 and 7 and you don’t even allocate space for it :-/
The link I gave to the data collection webapp describes the data collection in more depth, which I believe is what you are asking about between steps 6 and 7.
From that url:
Core function:
Every day an SMS/email is sent to participants with a securely generated one time URL.
The participant visits this URL and is greeted with a list of questions to answer.
Potential changes to this story:
If the URL is not used within 16 hours, it expires forever.
If a participant does not enter data for more than 3 days, they are automatically removed from the study.
If a participant feels that they need to be removed from the study, they may do so at any time. They will be prompted to provide details on their reasons for doing so; these reasons will be communicated to the study organizer.
The study organizer may halt the study for logistical or ethical reasons at any time.
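The one-time URL mechanism above is straightforward to sketch. Here is a minimal version using Python’s `secrets` module; the in-memory dict stands in for whatever database the real app would use, and the function names are my own:

```python
import secrets
import time

EXPIRY_SECONDS = 16 * 60 * 60  # "if the URL is not used within 16 hours, it expires forever"

_tokens = {}  # token -> (participant_id, issued_at); a database table in practice

def issue_token(participant_id, now=None):
    """Create a one-time token for the daily SMS/email URL."""
    token = secrets.token_urlsafe(32)  # cryptographically unguessable
    _tokens[token] = (participant_id, time.time() if now is None else now)
    return token

def redeem_token(token, now=None):
    """Return the participant id, or None if the token is unknown, expired, or reused."""
    entry = _tokens.pop(token, None)  # pop: each URL works exactly once
    if entry is None:
        return None
    participant_id, issued_at = entry
    if (time.time() if now is None else now) - issued_at > EXPIRY_SECONDS:
        return None  # expired forever
    return participant_id
```

Popping the token on first use is what makes the URL one-time; expiry is just a timestamp comparison on top of that.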
No, not really. Recall the setting—I am about to produce a sous vide circulator and am interested (1) whether people find that product useful; and (2) whether the product is safe. I see nothing in your post which indicates how the process of answering my questions will work.
By the way, shipping a product to random people and asking them “Is it useful?” and “Did you kill yourself at any point during the last 24 hours?” is not likely to produce anything useful at all, never mind a proper scientific study.
“Did you kill yourself at any point during the last 24 hours?” is not likely to produce anything useful at all.
I see. Right now the system doesn’t have any defined questions. I believe that suitable questions will be found, so I’m focusing on the areas I have a solid background in.
If a product is unsafe in a literal way, shipping it to consumers (or offering it for sale) is of course illegal. However, when considering a sous vide cooker in the past I have always worried about the dangers of potentially eating undercooked food (e.g. diarrhea, nausea, and light-headedness), which was how I took your meaning previously: “the product is safe for use, but accidental misuse might lead to undesirable outcomes”. As I mentioned in our discussion here, this project is not intended to be a replacement for the FDA.
shipping a product to random people and asking them “Is it useful?” … is not likely to produce anything useful
I agree that “is it useful” is not a particularly useful question to ask, but I don’t see any harm in supporting it. If you are looking for a better question, “80% of users used the product twice a week or more three months after receiving it” sounds like information that would personally help me make a buying decision. (Have you used the product today?)
So perhaps frequency of use might be a better question? I wasn’t haggling over what questions to ask because it was your example.
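A frequency-of-use claim like “80% of users used the product twice a week or more” falls out directly from the daily “Have you used the product today?” answers. A sketch of the metric, with invented names and a plain dict standing in for collected study data:

```python
def share_frequent_users(daily_use, min_uses_per_week=2):
    """Fraction of participants averaging >= min_uses_per_week.

    daily_use maps participant id -> one bool per study day
    ("Have you used the product today?").
    """
    if not daily_use:
        return 0.0
    frequent = 0
    for answers in daily_use.values():
        weeks = len(answers) / 7
        if weeks and sum(answers) / weeks >= min_uses_per_week:
            frequent += 1
    return frequent / len(daily_use)
```

Because the input is the same daily yes/no question for every product, this one metric is reusable across studies, which is exactly the template property under discussion.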
never mind a proper scientific study
I think rigor in data collection and data processing is what makes something scientific. For example, you could do a rigorous study on “do you think the word turtle is funny?”.
Sorry, I don’t find this idea, at least in its present form, useful. However, I’ve certainly been wrong before and will be wrong in the future, so it’s quite possible I’m wrong now as well :-) There doesn’t seem to be much point for me to play the “Yes, but” game, so I’ll just tap out.
I’m hoping it will do something akin to what Google Translate did for translation: lower the cost for modest use cases. If you want a high-quality translation (poetry) you still need to hire a good translator. However, if you are willing to accept a reasonably good level of translation quality, it’s now free.
I think you overrate the quality of Google Translate. That pitch doesn’t sound right to me.
Ahh, okay. That one goes on the scrap heap.
I think if you change the price of something by an order of magnitude you get a fundamental change in what it’s used for. The examples that jump to mind are letters → email, hand-copied parchment → printing press → blogs, and SpaceX. If you increase the quality at the same time you (at least sometimes) get a mini-revolution.
I think a better example might be online courses. It can be annoying that you can’t ask the professor any questions (customize the experience), but they are still vastly better than nothing.
Another example is the use of steel. If it’s expensive, it’s used for needles and watch springs. If it’s cheap, it’s used for girders.
Email is not only cheaper than letters but also much faster.
The online courses example sounds reasonable, but I’m still not sure whether that’s the best marketing strategy. Having a seal for following good science processes like preregistration might have its own value.