answer is somewhat complicated and I’m not sure ‘know’ is quite the right bar
contractor verification is a properly hard problem for boring bureaucratic reasons; it’s very hard to know that someone is who they say they are, and it’s very hard to guarantee that you’ll extract the value you’re asking for at scale (‘scalable oversight’ is actually a good model for intuitions here). I have:
1. Been part of surveys for services like the above
2. Been a low-level contractor at various mid-sized startups (incl. OAI in 2020)
3. Managed a team of hundreds of contractors doing tens of thousands of tasks per month (it was really just me and one other person watching them)
4. Thought quite a lot about designing better systems for this (very hard!!!)
5. Noted the lack of especially-convincing client-facing documentation / transparency from e.g. Prolific
The kinds of guarantees I would want here are like “We ourselves verify the identities of contractors to make sure they’re who they say they are. We ourselves include comprehension-testing questions that are formulated to be difficult to cheat alongside every exit survey. etc etc”
Most services a platform like this might pay to do these things are Bad (they’re B2B and mostly exist to provide a certification/assurance to the end-user, so the companies themselves are not incentivized to make sure they’re good).
Feel free to ask more questions; it’s kind of late and I’m tired; this is the quick-babble version.
EDIT: they’re not useless. They’re just worse than we all wish they’d be. To the best of my knowledge, this was a major motivator for Palisade in putting together their own message testing pipeline (an experience which hasn’t been written about yet because uh… I haven’t gotten to it)