In the 2012 paper “How To Grade a Test Without Knowing the Answers — A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing”, Bachrach et al. describe a mathematical model for starting with
a set of questions without answers
a set of answerers (participants) of unknown quality
and ending up with assessments of:
correct answers for the questions
assessed difficulty for the questions
evaluations of the participants, including overall reliability and areas of expertise
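To make those inputs and outputs more concrete, here is a minimal sketch of the generative story behind the model, as I understand it. Each participant has an ability and each question has a difficulty. A participant who knows the answer gives it, and one who doesn't guesses at random among the options. The logistic link, the uniform prior over options, and every name below (`p_knows`, `answer_posterior`, the participants) are my own assumptions for illustration. The sketch also holds abilities fixed and computes the posterior over a single question's answer. The paper instead infers answers, difficulties, and abilities jointly.

```python
import math

def p_knows(ability: float, difficulty: float) -> float:
    """Probability that a participant knows the answer to a question.

    Assumption: a logistic link on (ability - difficulty). The paper's
    exact link function and priors may differ.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def answer_posterior(responses, abilities, difficulty, n_options):
    """Posterior over the correct option for a single question.

    responses: dict mapping participant -> chosen option (0..n_options-1)
    abilities: dict mapping participant -> ability, treated as known here
    Uses a uniform prior over options.
    """
    log_post = []
    for c in range(n_options):
        total = 0.0
        for who, chosen in responses.items():
            k = p_knows(abilities[who], difficulty)
            # A participant who knows the answer picks c; one who doesn't
            # guesses uniformly among all n_options options.
            guess = (1.0 - k) / n_options
            total += math.log(k + guess if chosen == c else guess)
        log_post.append(total)
    # Normalize in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical 8-option question: one strong participant vs. two weak ones.
responses = {"alice": 2, "bob": 5, "carol": 5}
abilities = {"alice": 3.0, "bob": -1.0, "carol": -1.0}
post = answer_posterior(responses, abilities, difficulty=0.0, n_options=8)
print(max(range(8), key=lambda c: post[c]))  # 2, not the majority's 5
```

The strong participant's answer wins over the majority here. Under this model, agreement among people who are probably guessing counts as weaker evidence than a single answer from someone who probably knows.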
Can the model actually do this? Is it the best model for doing this? What kind of problems does it handle well and poorly?
One example of what the model can do: in the paper, the model is run on 120 responses to a quiz consisting of 60 Raven’s Progressive Matrices questions, each question with 8 possible answers. As it happens, no responder got more than 50 questions right. The model correctly inferred the answers to 46 of the questions.
A key assumption in the model is that errors are random: a participant who doesn’t know the answer is assumed to pick among the options uniformly at random. Consider domains where you’re only asking a small number of questions, and where for most questions you have a priori reason to expect some wrong answer to be more common than the right one (e.g. “What’s the capital of Canada/Australia/New Zealand?”). I think this model would not work there, although having enough other questions to make good estimates of responder ability could ameliorate the problem. If I wanted to learn more, I would read this 2016 review paper of the general field.
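To illustrate the failure mode, here is a small sketch using the uniform-guessing assumption described above. The responses, the logistic link, and the function names are hypothetical. Equal ability estimates stand in for the case where there are too few questions to tell responders apart.

```python
import math

def p_knows(ability: float, difficulty: float) -> float:
    # Assumed logistic link; a stand-in for the paper's link function.
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def uniform_guess_posterior(responses, abilities, difficulty, options):
    """Posterior over which option is correct, assuming uninformed
    participants guess uniformly at random among `options`."""
    n = len(options)
    log_post = {}
    for c in options:
        total = 0.0
        for who, chosen in responses.items():
            k = p_knows(abilities[who], difficulty)
            guess = (1.0 - k) / n
            total += math.log(k + guess if chosen == c else guess)
        log_post[c] = total
    m = max(log_post.values())
    weights = {c: math.exp(lp - m) for c, lp in log_post.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Hypothetical responses: ten people, and most of the uninformed say "Sydney".
options = ["Canberra", "Sydney", "Melbourne", "Perth"]
answers = ["Canberra"] * 3 + ["Sydney"] * 6 + ["Melbourne"]
responses = {f"p{i}": a for i, a in enumerate(answers)}
# With few questions there is little to go on, so every ability estimate is equal.
abilities = {who: 0.0 for who in responses}

post = uniform_guess_posterior(responses, abilities, 0.0, options)
best = max(post, key=post.get)
print(best)  # Sydney
```

When every ability estimate is equal, this posterior ranks options exactly by vote count. In that case the model can do no better than a plurality vote, and a popular wrong answer wins.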
That said, if you know which answers will be more common among people who don’t know the right answer, and roughly how much more common they will be, you can probably add that knowledge to the algorithm without too much difficulty. It should still work provided there are some questions where you don’t expect the uninformed to lean reliably one way.
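One way that knowledge might be added, sketched under my own assumptions: replace the uniform guess with a known distribution over the options that uninformed participants choose from. The distribution is assumed not to depend on which option is actually correct. The `guess_dist` values, the responses, and the logistic link are all hypothetical. As before, abilities are held fixed rather than inferred.

```python
import math

def p_knows(ability: float, difficulty: float) -> float:
    # Assumed logistic link; a stand-in for the paper's link function.
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def informed_guess_posterior(responses, abilities, difficulty, guess_dist):
    """Posterior over the correct option when uninformed participants
    choose option a with assumed probability guess_dist[a] instead of
    uniformly. guess_dist should sum to 1 over the options."""
    log_post = {}
    for c in guess_dist:
        total = 0.0
        for who, chosen in responses.items():
            k = p_knows(abilities[who], difficulty)
            # Knowers always pick c; non-knowers pick according to guess_dist.
            p = (1.0 - k) * guess_dist[chosen]
            if chosen == c:
                p += k
            total += math.log(p)
        log_post[c] = total
    m = max(log_post.values())
    weights = {c: math.exp(lp - m) for c, lp in log_post.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Hypothetical belief: uninformed people answer "Sydney" 70% of the time.
guess_dist = {"Canberra": 0.10, "Sydney": 0.70, "Melbourne": 0.15, "Perth": 0.05}
answers = ["Canberra"] * 3 + ["Sydney"] * 6 + ["Melbourne"]
responses = {f"p{i}": a for i, a in enumerate(answers)}
abilities = {who: 0.0 for who in responses}

post = informed_guess_posterior(responses, abilities, 0.0, guess_dist)
print(max(post, key=post.get))  # Canberra
```

The only change from uniform guessing is that `(1 - k) / n` becomes `(1 - k) * guess_dist[chosen]`. With the same ten responses, six votes for “Sydney” are now roughly what guessing alone would produce, and “Canberra” comes out on top. Every option that appears in the responses needs nonzero probability in `guess_dist`, or the logarithm fails.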