My thoughts on direct work (and joining LessWrong)

Epistemic status: mostly a description of my personal timeline, and some of my models (without detailed justification). If you just want the models, skip the Timeline section.

My name is Robert. By trade, I’m a software engineer. For my entire life, I’ve lived in Los Angeles.

Timeline

I’ve been reading LessWrong (and some of the associated blogosphere) since I was in college, having found it through HPMOR in ~2011. When I read Eliezer’s writing on AI risk, it more or less instantly clicked into place for me as obviously true (though my understanding at the time was even more lacking than it is now, in terms of having a good gears-level model). This was before the deep learning revolution had penetrated my bubble. My timelines, as much as I had any, were not that short—maybe 50-100 years, though I don’t think I wrote it down, and won’t swear my recollection is accurate. Nonetheless, I was motivated enough to seek an internship with MIRI after I graduated, near the end of 2013 (though I think the larger part of my motivation was to have something on my resume). That should not update you upwards on my knowledge or competence; I accomplished very little besides some reworking of the program that mailed paperback copies of HPMOR to e.g. math olympiad winners and hedge fund employees.

Between September and November, I read the transcripts of discussions Eliezer had with Richard, Paul, etc. I updated in the direction of shorter timelines. This update did not propagate into my actions.

In February I posted a question asking whether there were any organizations tackling AI alignment that were also hiring remotely[1]. Up until that point, I had been working as a software engineer for a tech company (not in the AI/ML space) and donating a small percentage of my income to MIRI, for lack of better ideas for mitigating x-risk. I did receive a few answers, and some of the listed organizations were new to me. For reasons I’ll discuss shortly, I did not apply to any of them.

On April 1st, Eliezer posted Death with Dignity. My system 1 caught up to my system 2 on what shorter timelines meant. That weekend, I talked to my family about leaving LA.

On April 3rd I pinged Ruby to see if he’d be open to a chat about Lightcone’s interview process.

Over the next month I had a couple more calls with Ruby, found an opportunity to drop by the Lightcone offices to talk to habryka (since I was traveling to the Bay anyway), and went through a couple of tech screens.

In May I flew back up to Berkeley to go through the final stage of the interview—an on-site work trial.

The trial must’ve gone ok; I started at Lightcone on July 5th. My primary focus will be LessWrong and the associated infrastructure[2].

Why LessWrong?

A problem I’ve been thinking about lately is: imagine you want to reduce AI risk, but you aren’t a researcher[3] and are limited in how well you can evaluate the usefulness[4] of the research directions and output of AI alignment organizations. With some exceptions, those organizations describe neither what they consider to be the core problems of AI alignment nor their theory of change. How, then, do you figure out where your efforts are best directed?

You can defer to another’s judgement if you know someone with an opinion on the question and have evidence of their ability to reason about difficult questions in domains you’re more familiar with. Unfortunately, this heuristic encourages information cascades, but purely on a personal level it might be better than throwing darts[5].

Absent that, or another good way to discriminate between options, the temptation is to kick the problem up a level and do “meta” work. This has some problems:

  • The indirection this adds to the theory of change introduces additional degrees of freedom, which can make it easier to believe you’re doing something useful even if you aren’t

  • If everyone in this position follows this heuristic, it deprives helpful-in-expectation projects of manpower[6]

Nonetheless, there are some problems that I think LessWrong can help address.

Concretizing problems in AI alignment

AI alignment has sometimes been described as a pre-paradigmatic field. LessWrong and the Alignment Forum are places where concrete problems and research agendas get discussed. One specific mechanism by which LessWrong can accelerate progress on those subjects is to “improve” LessWrong, such that existing alignment researchers get more value out of writing up their work on it and out of the ensuing discussions. This would have additional benefits, beyond the object-level “research progress == good?”:

  • New alignment researchers would have higher-quality material to engage with

  • Additional progress on formalizing concrete problems, and agendas for attacking those problems, also makes that work easier to distill into something actionable for people who aren’t researchers, which hopefully improves the quality (and quantity) of non-researcher effort on those problems

Improving epistemics

Trying to solve a highly specific problem that you didn’t choose yourself is difficult[7]. Most things are not solutions to any given specific problem.

One way we see this expressed is in Effective Altruism. Effective interventions are multiple orders of magnitude better at solving specific problems (e.g. saving lives) than ineffective interventions, and you can’t even use orders of magnitude to measure the difference between an effective intervention and one that has no effect (or worse, a negative one). It turns out that the epistemic skills of “notice that there is a problem” and “try to measure how effective your solutions are” suffice to get you a few OoM improvements in areas like global health and development[8].
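
To make the orders-of-magnitude framing concrete, here is a minimal sketch in Python with made-up cost-per-life-saved figures (the numbers are illustrative assumptions, not real estimates):

    import math

    # Purely hypothetical cost-per-life-saved figures, for illustration only.
    cost_per_life_saved = {
        "highly effective intervention": 5_000,
        "typical intervention": 500_000,
    }

    baseline = cost_per_life_saved["typical intervention"]
    for name, cost in cost_per_life_saved.items():
        ratio = baseline / cost
        print(f"{name}: ~{ratio:.0f}x the impact per dollar "
              f"({math.log10(ratio):.1f} orders of magnitude better than baseline)")

    # An intervention with zero (or negative) effect has no finite
    # cost-per-life-saved, so the orders-of-magnitude comparison breaks down entirely.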

Interventions in global health and development benefit from having a feedback loop. Under some models of the AI alignment problem, we do not currently have a feedback loop[9]. Much ink has been spilled on the difficulty of trying to solve a problem ahead of time and without any feedback loop; I won’t rehash those arguments at length.

I believe that the presence (or absence) of various epistemic skills can make the difference between a researcher spending their time solving real problems (or searching for real problems to solve), and spending time working on problems that do not turn out to be relevant. Even reducing the amount of time it takes to reject a dead-end strategy is a win.

I also believe that LessWrong is a place where those epistemic skills can be acquired and refined. Historically, one might have read the Sequences, participated in discussion by posting and commenting, or attended a local meetup. Improving LessWrong, such that it better encourages the acquisition and refinement of those skills, seems like a plausibly useful lever. (For example, we recently released the Sequences Highlights, which aim to provide a more streamlined onboarding for those reluctant to tackle the admittedly formidable length of the original Sequences. Nonetheless, I do think there is significant value in reading them in full.)

Still, (probably) do object-level work

Lightcone continues to hire. I think there are compelling arguments for working on both the Campus project and on LessWrong.

However, if you are thinking about doing “direct work”, I think your default should be to do object-level work on AI alignment. There are too few people, trying too few things. We have not yet even solved problems like “organizations in the space have public write-ups of their theories of change, which are sufficiently legible to non-researchers such that they can make a reasonably informed decision about where to direct their efforts”[10].

AMA

I am happy to answer questions[11]. Please also feel free to DM me, if you prefer.

  1. ^

    While writing that post, I ran into a bug and submitted a bug report, which led to a brief conversation with Ruby about my thoughts on remote work, and his perspective on Lightcone’s hiring philosophy.

  2. ^

    LessWrong, the Alignment Forum, the EA Forum, and the Progress Forum all share the same codebase, though there are some differences in features.

  3. ^

    But if you aren’t, you should at least try research before writing it off—as an example, John Wentworth transitioned from being a software engineer/data scientist.

  4. ^

    In terms of attacking what you think are the core problems, insofar as you have your own models there.

  5. ^

    Keep in mind that some efforts may be net negative. Taking into account the difficulties of forecasting future events, it’s not necessarily clear which direction the sign goes, particularly if you buy into e.g. differential development being a useful perspective on timelines.

  6. ^

    Though it also deprives harmful-in-expectation projects of manpower, so whether this is good or bad depends on how you balance various considerations that might influence timelines. Two such considerations:

    • What percentage of projects that claim to be working on AI alignment are actually doing something actively harmful, i.e. speeding up timelines

    • How good people following this heuristic would otherwise be at ruling out harmful-in-expectation projects

  7. ^

    I can’t find the reference, but I’m fairly sure I saw this idea neatly expressed in a LessWrong post some time in the last year. If someone knows what I’m talking about, please send it along!

  8. ^

    This is a bit of an understatement; there are other important epistemic skills involved, along with a lot of actual hard work. But I think these are the core drivers, in this domain.

  9. ^

    Other models disagree.

  10. ^

    This post describes some of my theory of change w.r.t. LessWrong; other members of the LessWrong team may have different views, whether opposed, orthogonal, or complementary. We do have significant internal alignment; I’m just not speaking for the team.

  11. ^

    I do not promise to answer all questions. Please exercise some common sense when deciding what question(s) to ask. Centrally, I expect questions about things like “my model of [x]” or “my decision process for [y]”, for subjects plausibly related to this post. If you have a question or need help with something related to LessWrong (your account, some feature/bug, etc.), please message us on Intercom!