Emiya comments on 3 Levels of Rationality Verification

Emiya 12 Dec 2020 17:30 UTC
3 points
Something the masters (and students) of each school can do to keep it real:
The Winning Tournament: Organise an event every year or so. A group of clever, evil people selects and creates a number of “games”, or tests if you’d rather. Wannabe masters of rationality can compete against each other for the title, pride and glory.
The types of games and tests should be kept varied. Some could be contests where participants are randomly matched against each other; others might be battle royales where people can form alliances and generally try as hard as they can to win. Others are just games where there is a correct solution to be figured out from clues: the faster you get there, the more points you gain, but if you arrive at the wrong solution, well, that’s really worse than having been slow.
Physical abilities and expertise in fields should be kept out of these games if possible, since they are powerful “noise” sources.
The tests should be really hard, so a committee of clever, evil people who approach the task of creating the games with a certain degree of evil glee is recommended. They can take inspiration from real-life problems that have caused disasters and routinely trip up experts. Games that resemble problems people actually meet in real life are recommended (for example, having to mediate an agreement between other groups of participants, or having to work out what to do just from hearing the description of a situation).
A good example was the Darwin Game here on LessWrong: though it seriously advantaged programmers, it sparked a lot of interesting plans.
For organising it… if you keep things “fun” enough, it should be doable to find interested people. If you keep things “amazing” enough, rationality might even start to get a bit of a reputation for “awesomeness” from it. It seems possible to even organise it online. You could charge a small fee to participate to cover expenses (something proportionate to the “fun” you’d get from participating) and perhaps use part of it as prize money for the winner.
A group of intelligent people act as “graders” of the games and give each participant an individual score that is hopefully more or less consistent between tournaments, so you’d know how much you’ve improved since the last one.
Clearly, this is only as good as the games and tests that are put into it. The crucial questions are how well victory in a game correlates with rationality skill rather than with some more specific skill, and how well it correlates with being able to “win” at common real-life problems. But I think the committee could have fun creating said games, and I’d expect people skilled in thinking and rationality to have an advantage. Raw native intelligence would be a major source of noise, but I don’t think that can really be helped.
Something you could do to test a hundred students
A number of “games” or “situations” for which a mathematical solution can be calculated are devised, and these are handed out in a yearly test people can take online.
The betting game with the lights in “A Technical Explanation of Technical Explanation” is a good example. People are asked to bet each round, with evidence coming from a mathematical simulation of the game and being provided bit by bit. More than one scoring function can be used for such betting games, so players also have to work out what the best strategy is depending on the circumstances.
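To make the point about scoring functions concrete, here is a minimal sketch of how such a betting game could be scored. Everything in it is an assumption for illustration: the function names are made up, each bet is taken to be a probability placed on “blue”, and the three rules shown are just examples. The logarithmic and quadratic rules are standard proper scoring rules; the linear rule is included only as a contrast, because it is not proper.

```python
import math


def log_score(p_outcome):
    """Log of the probability the player put on what actually happened."""
    return math.log(p_outcome) if p_outcome > 0 else float("-inf")


def quadratic_score(p_outcome):
    """Quadratic (Brier-style) rule, scaled so a certain correct bet scores 1."""
    return 1.0 - (1.0 - p_outcome) ** 2


def linear_score(p_outcome):
    """Score equal to the probability put on the outcome (not a proper rule)."""
    return p_outcome


def total_score(bets_on_blue, lights_were_blue, score):
    """Sum a player's score over all rounds.

    bets_on_blue[i] is the probability the player put on blue in round i;
    lights_were_blue[i] is True when the light really was blue.
    """
    total = 0.0
    for p_blue, was_blue in zip(bets_on_blue, lights_were_blue):
        total += score(p_blue if was_blue else 1.0 - p_blue)
    return total


def expected_score(p_blue, q_blue, score):
    """Expected score of betting p_blue when blue truly comes up with chance q_blue."""
    return q_blue * score(p_blue) + (1.0 - q_blue) * score(1.0 - p_blue)


def best_bet(q_blue, score, steps=100):
    """Grid-search the bet that maximises expected score (0 and 1 are excluded)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: expected_score(p, q_blue, score))
```

If the light is blue 80% of the time, `best_bet(0.8, log_score)` and `best_bet(0.8, quadratic_score)` both land on 0.8, while `best_bet(0.8, linear_score)` goes to the most extreme bet the grid allows. So a player who knows the true odds still has to look at the scoring function before deciding what to bet.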
Deduction games can also be played, where you have to use the clues to reach the solution (again, we are supposing types of games where we can calculate the exact or most likely solution for the evidence provided).
For such games, elaborate rules can be devised, not necessarily going with simple random distributions. Perhaps some scoring system for correctly stating the rule could be used? Occam’s Razor might apply to how likely you are to have a complex rule vs. a simple rule.
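One way the Occam’s Razor idea could be made precise is sketched below. It assumes, purely for illustration, that each candidate rule has a description length in bits and that a rule’s prior weight halves with every extra bit; the rule names, the lengths and both function names are hypothetical.

```python
import random


def occam_prior(rule_lengths):
    """Turn description lengths (in bits) into a normalised simplicity prior.

    rule_lengths maps a rule name to its assumed description length.
    Each extra bit halves the weight, so simpler rules are more likely.
    """
    weights = {name: 2.0 ** -bits for name, bits in rule_lengths.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def draw_rule(prior, seed=None):
    """Pick the hidden rule for one game according to the prior."""
    rng = random.Random(seed)
    names = list(prior)
    return rng.choices(names, weights=[prior[n] for n in names], k=1)[0]


# Hypothetical rules for a light-sequence deduction game.
example_prior = occam_prior({"always blue": 1, "alternate": 2, "blue on primes": 3})
```

With these made-up lengths the three rules get priors of 4/7, 2/7 and 1/7. A test designer could draw the hidden rule with `draw_rule(example_prior)`, and a participant who knows the prior can use it to weigh a simple guess against a complex one.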
Other possible games can be devised in decision theory, with the task of maximising a given utility function; predicting the moves of other entities is incorporated as well. For extra fun, some of these games will be connected: a certain number of players will be asked to choose the move of entity A in a scenario, and a number of other players will be asked to choose the move of entity B in the same scenario. The percentages of the players’ decisions will then be used to choose A’s and B’s moves, and each player’s final score in that scenario will depend on their individual decision.
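A minimal sketch of how a connected scenario might be scored follows. It rests on one particular reading, which is an assumption: each player’s score is the expected payoff of their own move against the mix of moves chosen by the players on the other side. The payoff table (a prisoner’s-dilemma-style one) and the function names are hypothetical.

```python
from collections import Counter


def move_shares(choices):
    """Fraction of players who picked each move."""
    counts = Counter(choices)
    return {move: n / len(choices) for move, n in counts.items()}


def score_players(own_choices, opponent_choices, payoff):
    """Score each player on one side against the other side's mix of moves.

    payoff[(mine, theirs)] is the payoff to the player who chose `mine`
    when the opposing entity plays `theirs`.
    """
    shares = move_shares(opponent_choices)
    return [
        sum(share * payoff[(mine, theirs)] for theirs, share in shares.items())
        for mine in own_choices
    ]


# Hypothetical payoffs: C = cooperate, D = defect.
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

a_choices = ["C", "C", "D"]        # three players choosing for entity A
b_choices = ["C", "D", "D", "D"]   # four players choosing for entity B

a_scores = score_players(a_choices, b_choices, payoff)
b_scores = score_players(b_choices, a_choices, payoff)
```

Here a quarter of the B players cooperate, so an A player who cooperated scores 0.75 and one who defected scores 2.0. An alternative reading would draw a single move for each entity at random using the same shares; that changes only the last step.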
Arguments with fallacies, biases or other errors of cognition can be provided (not every test will contain one, and participants won’t know how many there are). Participants would choose the correct option from a long list of options, and would also have to mark the number of the line where each error appears, to specify which error is where, so the scoring system can be automated.
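Marking line numbers is what makes automatic scoring straightforward. The sketch below assumes an answer key that maps line numbers to error labels, and a penalty for flagging something that isn’t there; the labels, the point values and the function name are all hypothetical.

```python
def score_submission(answer_key, submission, hit=1.0, false_alarm=-0.5):
    """Score one participant's marks against the answer key.

    Both arguments map a line number to the name of the error on that line.
    A mark scores `hit` only when both the line and the label match the key;
    any other mark costs `false_alarm`. Errors the participant missed earn
    nothing. An argument with no errors simply has an empty answer key.
    """
    score = 0.0
    for line, label in submission.items():
        if answer_key.get(line) == label:
            score += hit
        else:
            score += false_alarm
    return score


# Hypothetical key for one argument, and one participant's marks.
key = {4: "strawman", 9: "base rate neglect"}
marks = {4: "strawman", 7: "ad hominem"}
result = score_submission(key, marks)
```

In this example the participant gets one hit and one false alarm, so `result` is 0.5. The penalty matters because participants don’t know how many errors there are; without it, marking every line would be a free strategy.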
We can also put in some hard logic exercises for good measure.
Participants receive a global score and more detailed ones, which are determined using time and precision.
Note: I think this kind of test measures more the “theoretical skill” of the participant, and not whether they can apply it in real life.
Something you could use as a test even if people have an incentive to game it
Class project: a group of rationalists chooses an ongoing problem in a particular field. A group of experts in the field is selected to act as judges.
A number of rationalists offer to collectively tackle the issue. When they sign up for the game, they specify the number of hours they could put into it (a minimum number is required for participation). Applicants are required to specify why they think they could help solve that particular problem; those who aren’t judged suited to the task might be refused by the judges.
Ways to communicate with each other are provided. The participants have to organise the work between themselves, and be able to propose a solution, an analysis, or something else that would provide progress in said field. Each participant or group of participants will be assigned a task based on the number of hours they offered to put in; if they can’t complete their tasks, this negatively affects their score (which also depends on the number of hours). Of course, groups are expected to be able to take care of such problems by themselves.
When the final work is produced (it has to meet the standard requirements of its field), it is graded by the judges to give a “quick score”.
More importantly, attempts are made to put the work under the scrutiny of the field it belongs to (for example, by publishing it in a scientific journal). If this scrutiny goes well, it goes into the “real score”. Progress made in solving said problem is monitored. If the work produced by the rationalist group ends up being right, it goes into the “real score”. If the project leads to solving the problem, it super goes into the “real score”, and badges or something are handed out to everyone who worked on it and received an adequate score.