Course selection based on instructor

I’ve been trying to compile advice on how to select good instructors in college, assuming that the major and course level are given. Of course, many people don’t have a lot of flexibility with instructor choice, so this advice has limited applicability. The reason I’m posting to LessWrong is to get feedback from interested LessWrong readers regarding some of the factual assertions I make. I’ve marked the parts where I’m most eager for feedback. My experience with instruction (direct and indirect) has been focused largely on mathematics. Thus, it’s possible that some of the points I raise have limited applicability in more discussion-based subjects.

Note also that although my experiences have been partly shaped by the undergraduate teaching I’ve done over four years as a graduate student at the University of Chicago, I’m not representing the University, nor am I critiquing or describing any features specific to it.

I’m working within the conceptual framework that there are three broad types of value that students derive from college courses:

  • Human capital: Students acquire knowledge, skills, and abilities that help them with further courses, jobs, graduate school, or other aspects of their future life. When choosing between different instructors for courses at the same level in the same college, the human capital differential boils down to differences in how well the instructors teach the material.

  • Signaling: Students certify that they have learned the material, again for future courses, jobs, graduate school, or other aspects of their future life. When choosing between different instructors for courses at the same level in the same college, the signaling differential boils down to the relative ease of getting a good grade (the “ease” could stem both from better learning and from easier grading). In addition, some instructors may be willing to write recommendation letters or referral letters for further courses, jobs, graduate school, scholarships, internships, etc.

  • Consumption: Some courses can be fun to consume. The instructors and fellow students could be entertaining. The homework could be challenging in a satisfying way. Just as a good physical workout can be enjoyable in addition to building muscle or stamina, so can a good educational experience.

Why I think instructor selection matters:

  • Instructors differ significantly from one another in the quality of their teaching (human capital + consumption value) and the ease of their grading (signaling). I’m curious how LessWrong readers think the variation between instructors for similar courses compares to the variation between institutions. For instance, how does the variation between instructors at Harvard compare with the difference between UCLA and Harvard?

  • Apart from the teaching and grading, a good instructor is a candidate to write recommendation letters for future courses, internships and scholarships, and graduate school applications. Additionally, a good instructor may point students toward opportunities they would not have found on their own.

  • Students routinely neglect instructor selection, or use suboptimal criteria, relative to its importance (is this true?) or relative to other things they fret over, such as college selection. Often, students decide their courses for the next semester or quarter in a meeting of a few hours with an advisor, based largely on scheduling constraints. Update: Other students do draw on data such as online evaluations and the impressions of friends who have taken classes with specific instructors, but may not take proactive steps to collect such data. Further, the criteria they use to process the data may well be suboptimal (or at any rate, I believe so, which is why I’m writing the post).

For the discussion below, I largely assume that the student is selecting between different sections of the same course, where each section follows a similar curriculum but the instructor has flexibility over the final examinations and grades; much of the advice, however, is general. What do LessWrong readers think about how common this setup is relative to one where all sections have the same final examinations and grades?

Pitfalls

  • Students often incorrectly believe that instructors who cover harder material or assign harder work will also assign harder final letter grades. This is not necessarily true. The sign of the overall correlation is unclear, and there are teachers in all four quadrants of (hardness of material, hardness of grading).

  • Looking at the difficulty level of examinations in isolation can paint a misleading picture, because instructors may differ somewhat in the techniques they cover, and this can radically affect the perceived difficulty of a question.

  • Remember that student evaluations, although more reliable than peer faculty evaluations or other evaluation methods, are still generally unreliable (the Wikipedia page and Mike Huemer’s article are great starting points). There is still some value you can extract from student evaluations, but not a lot. The main problem with the numerical part of such evaluations is improper anchoring – students aren’t clear on the scale relative to which they are evaluating instructors. As a result, the median numerical rating winds up at around 4.4/5, leading to censoring at the top (a ceiling effect; a toy simulation of this appears after this list). Update: Falenas108 raised this issue in a comment: the long-form responses in student evaluations are somewhat more informative than the numerical ratings, but they suffer from many of the same problems. The Huemer article and the references in the Wikipedia article are good starting points. Further, optional online evaluations (which are the only ones that are easily available) often suffer from both low response rates and a selection bias in the set of respondents toward people who feel strongly about the instructor.

  • The illusion of transparency and the double illusion of transparency (see also here) make it quite hard to evaluate teachers by just sampling their classes. Sampling does provide a start, though, and, if done well, can be more informative than student evaluations.
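
As a side note on the ceiling effect mentioned above, here is a minimal sketch of how top-anchored ratings compress real quality differences. The anchoring constant, noise level, and the two quality values are invented for illustration, not fitted to any real evaluation data:

```python
import random

random.seed(0)

def observed_rating(true_quality):
    """Map true instructor quality (0 to 1) to a 1-5 student rating.

    Hypothetical model: students anchor near the top of the scale,
    so even a mediocre instructor starts around 4, and ratings are
    clipped at 5. Much of the true variation is censored away.
    """
    raw = 3.8 + 1.5 * true_quality + random.gauss(0, 0.4)
    return min(5.0, max(1.0, raw))

# Two instructors who differ threefold in (modeled) true quality...
mediocre = [observed_rating(0.3) for _ in range(500)]
excellent = [observed_rating(0.9) for _ in range(500)]

# ...end up only about half a rating point apart, near the ceiling.
print(round(sum(mediocre) / len(mediocre), 2))    # roughly 4.2
print(round(sum(excellent) / len(excellent), 2))  # roughly 4.8
```

The point is not the specific numbers but the shape of the problem: once most ratings pile up between 4 and 5, a half-point gap is all the signal you get, and it is easily swamped by noise and response bias.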

Things to keep in mind from a human capital + signaling (grades) perspective:

  • Criterion-referenced versus norm-referenced grading: Keep in mind the distinction between criterion-referenced grading (assigning grades based on attainment of pre-specified mastery levels) and norm-referenced grading (assigning grades based on relative performance). One extreme of norm-referenced grading is “grading on a curve,” where it is pre-specified how many students will get an A, how many will get an A-, etc. There are also other forms of norm-referenced grading (a typical approach many instructors use is to start from the top and “look for a gap” of x or more points for each grade decrement). Note that a lot of students confuse “grading on a curve” with “having lax grading standards” and conflate both of these with “having low numerical cutoffs for a given grade,” but the three ideas are distinct (particularly in the college course context), and it’s best not to conflate them.

  • The significance of measurement error: Three factors that affect the degree of measurement error in grading are easy to determine (a rough simulation illustrating them appears after the “teaching to the test” list below):

  1. The size of the gaps between the score cutoffs for adjacent grades. An easy test where a 90 is an A, an 85 is an A-, etc., means that one careless error can adversely affect your grade. A hard test where an 80 is an A, a 60 is an A-, etc., means that the grade is considerably less sensitive to small measurement errors.

  2. The number of questions on the test, and the number of tests and other assessment methods used. Generally, the more items used and the more distinct questions per test, the more reliable the measurement (essentially the law of large numbers: independent errors average out).

  3. Whether or not instructors award partial credit on tests. Partial credit makes measurement more accurate by indirectly increasing the number of items being measured.

  • Teaching to the test: Instructors differ significantly in the extent to which they teach to the test. Keep in mind that “teaching to the test,” although often frowned upon from a human capital perspective, is not necessarily bad from either a human capital or a signaling perspective. That’s because teaching to the test can be thought of as “testing what is taught,” particularly when the instructor has flexibility in setting the test. Degrees of teaching to the test include:

  1. Providing sample tests to eliminate uncertainty about the test format.

  2. Providing review materials or conducting review sessions that are closely optimized for maximizing test performance.
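
To make the measurement error discussion concrete, here is a minimal simulation of the three factors listed under “the significance of measurement error” above. The mastery level, cutoffs, and partial-credit rule are all illustrative assumptions, not a model of any real course:

```python
import random

random.seed(1)

TRUE_MASTERY = 0.87  # assumed fraction of the material the student has mastered

def test_score(n_questions, partial_credit):
    """Simulate one test score (as a percentage).

    Each question is answered correctly with probability TRUE_MASTERY.
    With partial credit, a wrong answer still earns a random fraction
    of the points, which shrinks the variance of the total score.
    """
    total = 0.0
    for _ in range(n_questions):
        if random.random() < TRUE_MASTERY:
            total += 1.0
        elif partial_credit:
            total += random.uniform(0.2, 0.6)
    return 100.0 * total / n_questions

def prob_of_a(n_questions, partial_credit, a_cutoff, trials=10_000):
    """Estimate how often this one student's grade comes out as an A."""
    hits = sum(test_score(n_questions, partial_credit) >= a_cutoff
               for _ in range(trials))
    return hits / trials

# Easy test, tight cutoff (90 = A), 10 questions, no partial credit:
# the same student's grade is close to a coin flip.
print(prob_of_a(10, False, 90))   # roughly 0.62

# Hard test with a generous gap (80 = A), 40 questions, partial credit:
# the grade is a far more stable measurement of the same mastery.
print(prob_of_a(40, True, 80))    # close to 1.0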

In general, if you are a good student with a reasonable shot at getting a fairly good grade, and you are interested in both human capital and grades, give preference to a criterion-referenced instructor with low measurement error (lots of questions, lots of tests, partial credit, and hard tests) who teaches to the test. Subject to these constraints, favor the instructor whose overall grades are easiest/best.

If you are not much interested in the human capital component of the course, and are looking purely for a good grade, the choice between criterion-referenced and norm-referenced grading is harder. If choosing a norm-referenced grader, keep in mind the student population you are with. Depending on how other courses are scheduled, the student body during some quarters or semesters can be considerably stronger than during others, making it relatively harder to get a good grade.

If you are not very good with the material and not keen on deep learning, it might actually pay off to choose an instructor with higher measurement error. This is similar to the idea that people with the odds against them have the best chance of winning by making a small number of big bets, whereas people with the odds in their favor have the best chance of winning by placing a large number of small bets. The sketch below makes this concrete.
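
Here is a minimal simulation of that logic, treating the final score as a noisy measurement of the student’s underlying performance level. The means, noise levels, and cutoff are illustrative assumptions only:

```python
import random

random.seed(2)

A_CUTOFF = 90  # hypothetical score needed for an A

def chance_of_a(expected_score, noise_sd, trials=100_000):
    """Estimate P(final score >= A_CUTOFF) when the final score is
    normally distributed around the student's expected score.
    noise_sd stands in for the instructor's measurement error."""
    hits = sum(random.gauss(expected_score, noise_sd) >= A_CUTOFF
               for _ in range(trials))
    return hits / trials

# A strong student (expected score above the cutoff) wants low noise:
print(chance_of_a(93, noise_sd=2))    # roughly 0.93
print(chance_of_a(93, noise_sd=10))   # roughly 0.62

# A weak student (expected score below the cutoff) wants high noise:
print(chance_of_a(84, noise_sd=2))    # roughly 0.001
print(chance_of_a(84, noise_sd=10))   # roughly 0.27
```

The asymmetry mirrors the betting analogy: when your expected performance clears the bar, variance can only hurt you; when it falls short, variance is your only hope.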

Getting the most out of attending sample classes

It’s quite rare for students to attend sample classes with instructors that they plan to study with (there could be many reasons for this; do readers have any thoughts?). Update: When talking of “sample classes,” I was referring to classes with the same instructor in an earlier term, because it’s often very difficult to change classes once the term has begun, due to complications with scheduling or classes filling up. The same instructor may be teaching a somewhat different course in the preceding term, so such sampling is useful only insofar as it captures generic aspects of instructors that transfer across courses. There may also be a difference between universities on a semester system and those on a quarter system: those on the quarter system have less time to shop around between classes because the schedule is more compressed overall.

For those who do choose to attend sample classes, the following tips may be useful.

  • If you attend sample classes, check how carefully students are listening: When attending a sample class, take a seat close to the back, so you get a bird’s-eye view of what all the students are doing. Existing students have more experience than you and are more tuned in to the overall course, so their decision about how much attention to pay the teacher is an indication of the value the teacher generates.

  • Judge favorably instructors who use cold calling and other forms of non-voluntary participation (polling, checking desk work). Ceteris paribus, if shown two equally “interactive” classes, one with cold calling and one without, pick the one with cold calling. Polling and desk work are also positives, albeit milder ones, and many college-level classes don’t have much time for desk work.

  • Avoid classes where students appear eager to impress the teacher or their fellow students: A dynamic where students participate largely to impress the teacher or their fellow students tends to be unhealthy for student learning.

  • Favor good classroom technique, but don’t be too impressed by slickness: the double illusion of transparency can make slick delivery misleading. Good board technique, neat handwriting, etc. are valuable. However, sometimes better instructors actually sound more confusing than not-so-good instructors, because they highlight areas of difficulty, cold call students, etc., making the pain points clearer and avoiding the double illusion of transparency.

Other criteria

The following may be harder to gauge from a single lecture, but can usually be learned by talking to students who have studied with the instructor and, in some cases, by reviewing student evaluations that include long-form responses.

  • Favor instructors who make notes and sources clear. This is less of an issue if the instructor is rigidly following a specific course text and the text is good. Insofar as the instructor is covering material not in the text, does the instructor provide notes or sources to read, or expect people to use notes taken in class?

  • Favor frequent “low-stakes” assessment to shatter the double illusion of transparency. Instructors who care about learning tend to be more likely to use frequent low-stakes assessments such as class quizzes. These give the students and the instructor a clear idea at any given stage of how well the material is being understood. However, pay attention to the difficulty level of these assessments. Very easy and very difficult assessments should be discounted (both have their uses, but they don’t shatter the double illusion of transparency). Although the specifics vary, low-stakes assessments are most informative when the average score is somewhere between 30% and 70%.

  • When looking at student evaluations, discount evaluations that make it appear that the instructor “made everything very easy” – this is likely a double illusion of transparency or a student who had an unusually strong background. Many concepts being taught are hard. While it’s possible to obfuscate them with difficult instruction, it’s not possible to make them “very easy.” It’s quite likely that instructors who give their students this impression are not explaining the concepts fully, or are testing the students so superficially that the students mistake passive familiarity with the material for full understanding. It’s also possible that the student was unusually smart or had an unusually strong background and committed a fundamental attribution error by crediting their ease to the instructor. Of course, good instructors do end up teaching their students better and perhaps even making (some of) the concepts completely clear, but rarely by making everything seem “very easy” – there is usually some sort of effort and pain undergone by the students to acquire that understanding.

  • When looking at student evaluations or getting word-of-mouth feedback, weigh more heavily the kinds of evaluations that overcome the problem of bad anchoring and adjustment. Evaluations by people who have switched between multiple sections of the same course are often the most illuminating, because they can control for the difficulty level of the material to a considerable extent. Evaluations by people who have taken follow-on courses that rely on the material are also useful, because these people have a better idea of whether what they learned was actually used, and can judge the quality of learning in that context.

  • Weigh positively (but don’t be overly impressed by) students’ recollections of “aha” moments and phrases indicating that students felt the learning in their class was at a higher plane than in others. “Aha” moments and unexpected connections emerging between seeds sown earlier in the course and material covered recently are indicative of unusually good teachers. However, keep in mind that, given the huge subject-matter knowledge gap between teacher and student, it is relatively easy for students to have false “aha” moments about insights that are actually pretty mundane, or that they should have grasped at the outset.

  • (This one may be asking too much, or may be too idiosyncratic): Ask students to recall their test preparation experience. This can be revealing about the methods and level of learning the instructor has encouraged. Note that students differ, so consider this only averaged across students, or for students whose general patterns of test preparation you know. Good instructors will try to maximize the extent to which test preparation reinforces the material, rather than simply treating tests as a method of assessing students. If instructors put up review materials or conduct review sessions, ask about their format. Judge negatively review sessions that are just somewhat tweaked versions of lectures. Look for review sessions that are “controlled, desk-work-based, and interactive”: controlled in the sense that the instructor controls the flow of the session; desk-work-based in that the session largely involves desk work by all, rather than lecturing or board work by select individuals; and interactive in that it stresses interaction between the instructor and students. In addition to the review sessions, look for review materials, and ask about students’ subjective experience when reviewing. Did they have “aha” moments at review time? (Note that desk work becomes more important at review time than cold calling, because students are required to work things out in full detail rather than just answer isolated questions.)

Looking for thoughts

What do readers think of the lists I’ve offered above? Are there items you disagree with? Things you think I missed? Arguments that the entire question is ill-considered? All feedback would be very much appreciated.

PS: I’ve deliberately omitted other factors, such as scheduling and space constraints and peer choices, from the discussion. In many cases, the fact that a close friend or study buddy is taking a particular section is good reason to take that section. Similarly, scheduling and space constraints can be binding at times. I don’t think that these meaningfully alter the shape of the preceding advice, but they do constrain the scope within which it can be applied.