Unofficial ESPR Post-mortem

[Disclaimer: This post reflects my, i.e. Owen Shen’s, personal opinions. It does NOT reflect CFAR or ESPR’s opinions, and should NOT be taken as an ESPR-endorsed communication.]

[An overview of some of the camp and project dynamics of ESPR 2017. I look at diversity causing certain problems, difficulties evaluating participants, ways to improve camp experience, and organizational difficulties.]

Introduction:

In August of 2017, I went off to London for the European Summer Program on Rationality (ESPR). This was supposed to be the place where all our efforts over the past 8 months came together, and we made The Best Summer Camp Ever.

ESPR is a two-week summer program for highly talented students aged 16-19 from across the world. We had talks/classes on machine learning, cognitive psychology, effective altruism, cryptocurrencies, and salsa dancing. (Note that everything except for the dancing may not be representative.)

I was a participant of ESPR 2016; last year, it was called EuroSPARC. The camp name and spirit are modeled off the US-based SPARC camp. For ESPR 2017, I ended up being responsible for (part of) admissions, communications, assorted pre-camp logistics, and camp counselor duties.

This post is structured as a sort of free-form analysis, where I examine a section of ESPR, look at how it worked in practice, and then move on to another section. There are some common threads between these sections, but there’s not really a central thesis; there’s just a series of lessons that I learned.

Also, like most post-mortems, any analysis is undoubtedly going to focus on the Bad and the Sub-optimal. To provide some early counterbalance to the ensuing dive into things which ended up not working, here’s a quick overview of what went well:

  • 72% of our participants rated us 9/10 or 10/10 when asked how glad they were that they attended.

  • Unexpected setbacks which could have critically ended ESPR were miraculously solved by heroic staff in the nick of time.

  • Multiple students took initiative during camp and taught their own classes, grabbed us a guest speaker, and other exciting things. (See the section on Improving Camp Experience for more details.)

Thus, ESPR had many encouraging signs that things went well.

Of course, a string of things also went wrong. Volunteers left, interpersonal conflicts were raised, and it’s not clear that our participants got much in the way of concrete takeaways.

I hope that looking at all these different pieces of the project can shed some light on why some of the bad things happened, and how we can do better for similar projects in the future.

tl;dr:

Here’s an overview of the points covered in the sections below:

  1. This year’s diverse curriculum arguably decreased the value of takeaways for the participants.

  2. A vague mission statement meant that different ESPR staff had sometimes conflicting goals which weren’t well resolved.

  3. Evaluating what participants got out of ESPR is a very hard task because there are lots of relevant factors.

  4. One strong way to improve the ESPR experience for participants is to bring in more opportunities for independent projects and ownership of learning.

  5. Unclear role and responsibility specification led to negative incentives for people to take initiative on tasks.

Initial Expectations and Values:

I’m going to be evaluating different parts of ESPR, so it seems crucial to give an outline for which things I cared about and found relevant.

First and foremost, I had originally wanted ESPR to be able to churn out people who’d be motivated to work on problems in the rationality space and effective altruism space.

(This has now actually changed; see the Evaluating Participant Takeaways section for more details.)

A big part of this was that, as an ESPR 2016 alumnus, I had found the camp to be important in roping me in deeper to these ideas, which I consider valuable.

The actual effect of my slant on camp was tempered by pushback from others, who had different views. (More on this idea in the Diversity and Dilution section.)

But I just want to set the framing that, compared to other staff on the ESPR team, I was probably one of the ones who leaned the most in the direction of rationality/EA.

Diversity and Dilution:

As I mentioned above, there was some strong pushback by others to make ESPR less directly about just rationality and effective altruism. The end result meant that the camp ended up being quite different than just “CFAR, but for super smart high school students”.

And I don’t think this was bad. In the end, I agreed with many of the concerns raised in the pushback, and I’ve also updated my thoughts on what exactly ESPR should be trying to give students. (More on this in the next section, Evaluating Participant Takeaways.)

Still, something that I don’t think I fully appreciated at the time was that diversity has costs because the camp’s focus is a zero-sum resource. Any time you introduce a wider range of topics to cover, it’s conceptually less clear for the students what the “important things” are.

(I’m mainly talking about curriculum here; it’s not clear that it generalizes to other activities, given that “classes” and “not-class-things” already have a clear conceptual boundary.)

A phrase that was often repeated at camp was, “I have no idea how I’m going to explain ESPR to my friends and family when I get back home. There’s been so much that happened.”

This sentiment, to me, seems like it might be a bad sign. There are definite reasons to want a camp experience that travels across so many domains that it’s hard to easily articulate, but this also makes it harder to track what happened.

For me, I think it boils down to the question of whether takeaways are stronger when people need to put in effort to explicate them (as would be the case if the initial feel was “hard to explain”) or when they are more clearly chunked (as would be the case with clearer demarcations and perhaps fewer subjects covered).

I think one of the stronger benefits for having a more diverse curriculum is that students are able to pick and choose the takeaways they want. If you cover lots of things, then it’s more likely you’ll hit upon something that resonates with them.

Yet, while this does seem like a good thing that could have happened, we also didn’t stress during camp that this sort of “take what you need” attitude was the appropriate one. Overall, I think the end result was a lack of good structure for students to organize what they learned.

Another related barrier for takeaways was ESPR’s focus on bringing together students from a mix of countries; ESPR 2017 had participants from a greater range of countries than ESPR 2016. This led to more difficulties in communication than last year, and I think we didn’t account for this heavily enough in the initial planning.

Cultural expectations within the classroom (EX: how the “teacher” and student dynamic played out) and other social norms felt like they contributed to the wide spread of attitudes and ideas, at least initially.

Of course, one thing I also heard a lot of people say at ESPR was, “One of the best things here is meeting people from lots of different cultures and seeing how things are in other places.”

The obvious answer to the above is “Find the sweet spot between having a mix of cultures and cultural commonalities.” We couldn’t just optimize directly, though, as lots of staff had lots of differing opinions.

In an effort to bring in people from lots of places, we launched ESPR with a fairly vague mission statement. This meant that we had people under our banner who held conflicting implicit goals, despite the fact that, outwardly, we all supported the overt, non-controversial mission statement.

For example, I very much wanted our participants to get more into effective altruism and rationality. Other people wanted ESPR to be a more playful experience, where students could explore a variety of new ideas (and not be shoved face-first into certain ones).

One problem this led to was that discussions about what sorts of values we should have happened during camp, rather than before. I think that, had we been more explicit about our goals upfront, we could have worked around some of these problems.

Specifically, we could have brokered compromises early on to ensure that people knew what they were getting in return, and we knew what everyone actually wanted (and that a discussion was happening on how they could get it).

Failing to account for this when bringing on a diverse group of people meant that we had to regress to the broad mission statement (once again diluting things), and were less able to satisfy many of the staff’s specific preferences, or come to acceptable agreements for both sides.

Evaluating Participant Takeaways:

The whole reason we put in effort to make ESPR 2017 happen was because of the participants. Though everyone on the staff might have had diverging agendas, we all wanted something to happen to the participants as a result.

Evaluating ESPR’s impact, however, is a difficult process. Group dynamics are complicated to model, there are lots of confounding factors, and getting clear indication of participant growth (let alone counterfactual growth!) is hard.

With all that in mind, I’ll try to dive a little into what my thoughts are on exactly what the effects were.

First off, I think it seems reasonable that the three largest factors in influencing the experience of camp for the participants were:

  1. The curriculum (EX: What topics were covered.)

  2. Additional activities which encourage ownership (EX: Getting students to do their own independent projects.)

  3. The admissions process (EX: Who we took in.)

From a combination of survey data and first-hand impressions, I think that about 67% (~20) of the participants came away with some kind of takeaway. In terms of EA/rationality orientation, though, it seems that about 17% of the students (~6) got a lot out of ESPR.

It’s not clear, though, that these are the base rates of “what percentage of people who go to ESPR come out awesome?” There are several confounding factors here.

One of the biggest ones is that most of the people who seemed to pan out very well (i.e. were good along the EA/rationalist axes) were also people who we knew were already exposed to these ideas prior to camp. This also seemed to be true for last year, and I count myself as one of them.

So, back to the overall goal of ESPR, there’s a crucial question of “rope newcomers into the community” versus “marshal existing young community members” which I think we partially sidestepped early on.

Part of this had to do with ESPR’s focus on high-achieving participants, for example those who had succeeded in national and international math competitions. Here’s one version of the story for why focusing on these students might make sense from a consequentialist position:

“Most of the important discoveries are going to be done by people who are at the top of their field. Thus, making sure that the future is going to turn out well means making sure the smartest people in 20 years have a good set of ethics. Thus, we should focus on those people.”

And while I don’t think any explicit form of the above reasoning took root in our decision process, I definitely think it was present in some form.

Now, though, I think that trading off additional technical ability for pre-exposure to EA/rationalist ideas is generally a good idea if the thing you care about is impact, and the actual benefits work out in its favor. In other words, we should weight attitude more than we currently do in the attitude-aptitude trade-off.

This is a direct result of my experience at ESPR 2016/2017.

(Note that this may still be too small of a sample for my opinion here to mean very much.)

I’ve also changed my mind on what I want participants to get out of ESPR:

As a result of conversations with others who gave pushback on explicitly pushing for the rationality and effective altruist angle, I think I’ve now pivoted to thinking that the point of ESPR is to get more people thinking altruistically, with effectiveness and rationality as merely nice-to-haves.

What I mean by that is roughly a set of questions like, “Do I expect this person to be doing exciting work that will, on net, be beneficial to other people in the future? Do I expect them to care about helping others? Do I see them interacting with others who are engaging in humanitarian projects?”

(I know the above criterion is still vague; it merely maps onto my gut impressions and checking in with my internal anticipations to see what I expect. It also feels like the best I’ve got for now.)

Anyway, this is a noticeable shift from my original viewpoint of “Let’s get everyone super into effective altruism and rationality!”

While part of this is arguably due to holding more realistic expectations about base rates for memetic adoption (read: the onset of cynicism), there is another part of me that’s genuinely unsure about how stable or correct the EA/rationalist frameworks are. It just seems like a good idea to have people who share similar values exploring a different space, even if my values were deeply EA-aligned.

Knowing all this now will probably make some things clearer if/when I start to consider applicants for ESPR 2018.

But evaluation is still difficult. Above, the 67% and 17% estimates were only for the benefit the participants themselves received. Yet, there are other ways a participant could be a good pick, even if they themselves didn’t get that much out of ESPR.

Consider the following factors, which would all seem to indicate that, from a consequentialist viewpoint, it was “good” for ESPR to take on a certain student:

  1. Degree to which the student contributed positively to the camp environment (EX: Alice was responsible for making camp better for other students.)
    Relative difficulty to measure: 1/4

  2. Degree to which the student received concrete takeaways from the camp (EX: Bob came away with a new outlook on life.)
    Relative difficulty to measure: 2/4

  3. Degree to which a student’s received benefit was counterfactually good. (EX: Carol had a very positive learning experience at ESPR, relative to all her other options for the summer.)
    Relative difficulty to measure: 3/4

  4. Degree to which the student will go on to have a positive impact on the world (EX: Evan leaves ESPR with grand ideas and creates a startup aimed at providing free global WiFi.)
    Relative difficulty to measure: 4/4

Looking at both the ESPR 2016 and 2017 cohorts, my impression is that, while some clustering in the above factors occurs, it’s also really hard to predict this type of stuff ahead of time, as well as measure this post-camp.

In fact, I’d find it plausible that trying to subtly control admissions for the above factors is largely useless. During admissions, we made certain bets on which students might fit the above factors, EX: which ones we expected to contribute a lot to the overall atmosphere.

Some of those bets panned out, but some of them didn’t.

As a result, while I think it’s true that admissions has a strong overall effect on the camp, we can’t escape base rates. It seems likely that, at the end of the day, we’d also have seen roughly the same 67% and 17% percentages.

A good part of my intuition for this comes from the fact that “probability that a student takes well to crazy-sounding ideas” (which is at least 30% of what ESPR is, I claim) seems to be distributed about the same, largely independent of who we pick.

This overall seems to be a point against admissions being important.

The outside view says that the communication channels we used to advertise ESPR actually did a lot of the heavy lifting of filtering for capable candidates (which might in itself be a point in favor of focusing on highly talented students, contra my “attitude over aptitude” stance).

Still, my internal estimate says that, even if the 20th best candidate wouldn’t have differed greatly from the 50th best candidate, surely some type of filtering by the interviewer (i.e. me) was happening that was relevant to why they were even in the final pool, right?

And perhaps the right response here is, “Owen, you’re being fooled by noise!”

(“I can’t hear you over the sound of my self-righteousness!”)

Actually, though, I think the right answer is that interviewing is pretty good at avoiding Type 2 errors (false negatives): most of the people we rejected after interviewing were (I subjectively claim) probably not that great.

However, the fact that our bets didn’t all pay off seems to indicate that we’re going to get false positives, and this is something we can’t strongly remove.

Improving The Camp Experience:

While it’s debatable, then, what effect admissions selection has on the camp experience (which is why I listed it last), it seems clearer that what actually happened at camp had a large effect on the participants.

There is a story, albeit one that I don’t endorse, where the majority of the value of ESPR comes from simply bringing smart people together. As this model goes, the actual classes themselves aren’t that important; the social aspects dominate, and they are responsible for most of the good things ESPR offers.

So, within the context of class curriculum, camp events, and evaluating their impact, here are my two thoughts:

  1. Most of the benefit to participants from ESPR can be traced back to just a few (albeit, different) events.

  2. Providing opportunities for ownership of ideas/learning is one of the most important pedagogical techniques, and we underutilized it.

The first point is basically a consequence of how humans compare things relatively.

Even if all the events we provided at ESPR are super-duper great, there’s going to be implicit ranking and comparison happening in the students’ heads. While I think this means we could strategically aim for certain events to be “The Important Ones”, I also think we can’t get very good estimates on which events will land well (with a few exceptions).

This ignorance doesn’t mean, though, that we’re allowed to slack off on quality for some activities. Rather, it feels like having every activity be Awesome is a prerequisite for those relatively life-changing opportunities to crop up in the first place.

The analogy here is how you have lots of thoughts every day, but likely only a few of them are insightful. You can’t get those few by trying to think only a few thoughts; if you aren’t always thinking, they just won’t come.

The second point is about giving students more opportunities to exert their own Do Thing muscles. Overall, I feel like we still emphasized learning over doing.

I think it’s clearly Good that ESPR made time for students to assert their own abilities and own up to crazy ideas. This manifested in letting students teach afternoon classes, sending them off into the world, and having them assign one another Quests to accomplish.

I think two of the most successful activities in this spirit that we ran at ESPR were Lightning Talks and Social Engineering. During Lightning Talks, every student was encouraged to give a talk on a topic (any topic! English tea! Wales! Freud!) for about 2 minutes. Social Engineering was, in a nutshell, when we sent students off into the streets of London to convince companies to do nice things for us.

There seems to be something that’s just Good© about letting students do projects/schemes, and I think we didn’t mine this area nearly enough for all the good things that could have popped out.

And for what we did do, the results we got were always interesting and, surprisingly often, impressive. (Guest speakers! Deep dives into humor analysis! Free lunches!)

Camp Internals:

When it came to the actual interactions between the ESPR staff, I think the biggest problem we faced was the unclear designation of roles and a lack of specificity in what the roles were supposed to do.

Like most things in life, no one was being actively stupid in not spotting this beforehand. It just happened that there was a chain of understandable events which led to this being the state of affairs we found ourselves in.

But the end result was still such that, by the time ESPR started, there still wasn’t mutual understanding or agreement on which tasks each role (student, counselor, instructor, ops, etc.) was responsible for. I think this ended up being harmful to some staff, me included, as well as overall camp operations.

First off, a lack of clear duties for each role meant that there was a lot of inter-role variation. Staff who were supposedly doing the same thing in name would end up having quite different tasks in actuality. This is similar to the issue with divergent goals all flocking under the general mission statement banner.

I think this ended up breeding bad sentiment (at least for me) when the benefits associated with each role, which also weren’t well-specified beforehand, ended up incommensurate with the amount of effort different staff were putting in.

The obvious thing to do, then, is to once again try to be upfront with each volunteer about what they were hoping to put into and get out of the project. And I think we sort of did this.

The problem here was that these expectations changed over the course of the project, as some people did more than we initially brought them on for. People’s actual roles stretched and shrank over the course of the project, and motivation often rose or dropped for individual staff. And as the situation changed, earlier promises couldn’t always be fulfilled, or people asked for more.

Related to this is the fact that different criteria are used to evaluate competence for each of the roles, and this competence doesn’t necessarily translate from domain to domain.

This meant that ESPR was placed in a dynamic where some people who went the extra mile weren’t always compensated the way they wanted to be because they would perform poorly at those roles. For example, an outstanding instructor might have wanted to also try her hand at operations work, but we ended up saying no to the request because she likely wouldn’t have been very useful in ops.

There was a definite trade-off, then, between satisfying people’s preferences and what would be “best” for the camp overall.

Once again, I think being more upfront about all this and spending more time to find compensatory measures would have been very useful in curbing some of these issues.

Even in strictly volunteer-only roles, I claim that people always have tacit expectations for what their give/get relationship with the project “should” look like, and our inattention in this area meant additional burdens were borne unequally by certain ESPR staff.

Lastly, I think there is often a type of pressure for groups to put tasks up for grabs, in an Everyone Does Everything sort of way. There’s something that feels virtuous and democratic and “nice” about this setup, as it sends a signal that roles are irrelevant and everyone is equal /​ pitches in.

I want to push against this intuition.

Putting tasks up for grabs in a democratic way negatively incentivizes first-movers who take the initiative. It’s a tragedy of the commons, where everyone is individually better off waiting for someone else to shoulder yet another task.

But it’s not just this dynamic; I claim it’s worse than that.

Tasks you accumulate snowball, further punishing people who take the initiative. This dynamic pulls people into deeper roles which, if the responsibility/reward structure isn’t laid out well, breed burnout and bad feelings.

Here’s an example that actually happened:

Right before ESPR started, we needed to figure out transportation for the students. As I had some spare time, I ended up doing the preliminary logistics work of cataloging all the info we had into a spreadsheet. But this also meant that for the next action we had to take on transportation, I was also the go-to person because I “had done it last time”.

The cycle here is pretty vicious—with each additional task that I did to further our transportation goals, it also became less likely that I’d get additional help from other people because the required background knowledge increased.

EX: “I could also rope in Alice to help with this task, but then I’d have to tell her about X, Y, and Z. Oh well; I guess it’s more efficient for the group if I just do it again myself.”

The obvious answer here is perhaps to fight against the sunk-cost response that appears in the above example, because roping in an additional person in the know will actually save time down the road. But, really, what I think we need is just a better way of letting people take ownership of certain tasks. Ambiguity in delegation is a recipe for inaction.

I think prior to ESPR, I might have also tried to champion an organizational structure along the lines of Everyone Does Everything.

I no longer endorse this, and, as a result of my experience, lean towards a much more top-down, legible, and explicit approach when it comes to distributing tasks in an organization or project.

Personal Conclusion:

So that’s a lot about how ESPR The Project functioned.

For Owen The ESPR Staff (i.e. me, personally), my experience was largely very positive, though it was paired with, as I’ve perhaps alluded to, several disappointments.

I felt like we missed some of the cohesiveness and “spirit” that was present in ESPR 2016, but this was partially remedied by the inclusion of new bright spots, like seeing last year’s participants step up as counselors or this year’s participants teaching their own classes.

Overall, I took on significantly more responsibility than I originally signed up for, and diving deep into the thick of things was highly instructive. Most of the insights, though, seem to be in the form of gut-level expectations, and this post-mortem has been an attempt to tease out some of those models and intuitions.

For the rationality community at large, I’m unsure what the takeaways are. Student outreach is always tricky, and there’s much more to be said about things like setting camp culture and how different students interact with the entire rationality memeplex.

As a case study of how several people come together to form an organization and then execute on a major project, holding goals roughly aligned with the community at large, I think we can serve as a useful example.

There’s a definite change in the quality of a model when it comes to knowing something vs experiencing it, but it’s my hope that this can contribute to the ongoing discourse on group rationality as well as pedagogy.