Question: if 10 members of the public read an article making a case for AI safety, how many of them would say human-level AI is less than 10 years away? How many would say less than 20 years? How many would say human-level AI will appear before 2100? How many would say after 2100? (Assume these are multiple-choice questions.)
Answer:
70% said within 10 years, 30% said within 20 years.
Context: I recently ran a paid survey on Prolific, asking people to read A Case for AI Safety and give us feedback. People’s timelines surprised me, but not @plex, who won some Bayes points off me. Plex thought this could be a neat calibration test for LW.
Fwiw I don’t take Prolific and similar services to be especially reliable ways to get information about this sort of thing. It’s true that they’re among the best low-medium effort ways to get this information, but the hypothetical at the top of this post implies that they’re 1:1 with natural settings, which is false.
That sounds reasonable, but how do you know this? Also, any recommendations for better ways to get this information w/o being more than 2x as costly? (It cost $100 for 10 people, who each spent 35 minutes reading and giving feedback.)
The answer is somewhat complicated, and I’m not sure ‘know’ is quite the right bar.
Contractor verification is a properly hard problem for boring bureaucratic reasons; it’s very hard to know that someone is who they say they are, and it’s very hard to guarantee that you’ll extract the value you’re asking for at scale (‘scalable oversight’ is actually a good model for intuitions here). I have:
1. Been part of surveys for services like the above
2. Been a low-level contractor at various mid-sized startups (incl. OAI in 2020)
3. Managed a team of hundreds of contractors doing tens of thousands of tasks per month (it was really just me and one other person watching them)
4. Thought quite a lot about designing better systems for this (very hard!!!)
5. Noted the lack of especially-convincing client-facing documentation / transparency from e.g. Prolific
The kinds of guarantees I would want here are like “We ourselves verify the identities of contractors to make sure they’re who they say they are. We ourselves include comprehension-testing questions, formulated to be difficult to cheat, alongside every exit survey,” etc.
Most services a platform like Prolific might pay to do things like this are Bad (they’re B2B and mostly provide a certification/assurance to the end user, so the companies themselves aren’t incentivized to make sure they’re good).
Feel free to ask more questions; it’s kind of late and I’m tired; this is the quick-babble version.
EDIT: they’re not useless. They’re just worse than we all wish they’d be. To the best of my knowledge, this was a major motivator for Palisade in putting together their own message-testing pipeline (an experience which hasn’t been written about yet because, uh… I haven’t gotten to it).
What’s your sample size?
10
man, you should probably get some more; I can’t imagine it’ll be that expensive?
$100 for what I already got. I could pay less, but I am not sure if that would make the signal/noise ratio too low to be worthwhile. Maybe @yams could tell us?
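For a rough sense of what n = 10 buys you: below is a quick back-of-the-envelope sketch (mine, not anything from the survey or from Prolific) of a Wilson score confidence interval around the “7 of 10 said within 10 years” figure. The larger sample sizes at the end are made up, just to show roughly how much extra respondents tighten the estimate.

```python
# Minimal sketch: how noisy is a proportion estimated from 10 respondents?
# Uses a Wilson score interval; the z value and the larger sample sizes
# below are illustrative assumptions, not numbers from the survey.
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# 7 of 10 respondents said "within 10 years": the interval is very wide.
print(wilson_interval(7, 10))    # roughly (0.40, 0.89)

# Same observed proportion with more (hypothetical) respondents narrows things.
print(wilson_interval(28, 40))   # roughly (0.55, 0.82)
print(wilson_interval(70, 100))  # roughly (0.60, 0.78)
```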