How (not) to choose a research project

Background

(Specific information will be sparse here; this section is meant to give context for the Takeaways section of the post.)

Our group (Garrett, Chu, and Johannes) has been working with John Wentworth in the SERI MATS 2 Electric Boogaloo program for three weeks, which means it’s time for a Review & Takeaways Post!

The first week was Project Selection, and the first day was spent thinking about strategies for coming up with good projects. We chose to find a general method for figuring out the True Names of mathy-feely-concepts-in-your-brain (such as roundness, color decomposition[1], or telling whether a piece of cloth is in a pile), with the goal that such a method would let us figure out True Names for concepts like optimization, corrigibility, agency, modularity, neural network representations, and other alignment-relevant concepts.

Then we read Jaynes, and talked to TurnTrout, and concluded this project sucked. So we went back to Project Selection 2.0!

We came out of Project Selection 2.0 with renewed vigor and a deeper understanding of the problems of alignment. Our new project was finding a better version of information theory by adapting logical induction or infra-Bayesianism.

Then we talked to Eliezer Yudkowsky; he asked for a concrete example of how this would solve alignment, and we didn’t have a good example. So we went on to Project Selection 3.0.

We came out of Project Selection 3.0 with even more vigor, and an even deeper understanding of the problems associated with alignment… and a clever idea.

Finetuning LLMs with RL seems to make them more agentic. We will look at the changes RL finetuning makes to an LLM’s weights: we can see how localized the changes are, gather information about what sorts of computations make a system agentic, and make conjectures about selected systems in general, giving us a better understanding of agency.
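As a minimal sketch of the kind of measurement we have in mind (the model names below are placeholders, not our actual setup), one could compare the weights of a base model against an RL-finetuned checkpoint and see where the changes concentrate:

```python
# Hedged sketch: compare base vs. RL-finetuned weights tensor by tensor.
# "gpt2" and "path/to/rl-finetuned-gpt2" are stand-ins for whichever
# base/finetuned pair is actually being studied; the architectures must match.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
tuned = AutoModelForCausalLM.from_pretrained("path/to/rl-finetuned-gpt2")

rel_change = {}
with torch.no_grad():
    for (name, p0), (_, p1) in zip(base.named_parameters(),
                                   tuned.named_parameters()):
        # Relative Frobenius norm of the weight change for each tensor.
        rel_change[name] = ((p1 - p0).norm() / p0.norm()).item()

# If RL's changes are localized, a handful of tensors should dominate.
for name, d in sorted(rel_change.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{name}\t{d:.4f}")
```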

Nobody has convinced us this is a bad use of our time, though we’d like to see people try.

Takeaways

Big ASS Tree

We learned lots of things over the course of figuring out that all our ideas sucked. During the project selection phase we had a cool idea for a way to generate project ideas: the Alignment Safety Search Tree (ASS Tree for short). The idea comes from Mazes and Duality; the goal was to explore the space of problems and constraints before trying to propose solutions.

You start by writing “Alignment” up at the top of your whiteboard. This is the top-level problem we want to solve. Then you draw arrows down, one for each problem you can think of that makes alignment hard. For each of these problems you repeat the process: e.g., for a problem P, you draw an arrow down from P for each problem you can think of that you would need to solve in order to solve P. Eventually you get a tree like this (except far bigger):
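In code, the result is just a recursively branching tree of problems. A minimal sketch (the problem nodes below are illustrative, not our actual tree):

```python
# Illustrative sketch of the ASS tree's recursive structure; the problem
# names are made up for the example, not taken from our actual whiteboard.
from dataclasses import dataclass, field

@dataclass
class Problem:
    name: str
    subproblems: list["Problem"] = field(default_factory=list)

tree = Problem("Alignment", [
    Problem("Specify what we actually want", [
        Problem("No True Name for 'value'"),
        Problem("Proxies get Goodharted"),
    ]),
    Problem("Verify a system does what we want", [
        Problem("Interpretability is immature"),
    ]),
])

def show(node: Problem, depth: int = 0) -> None:
    """Print each problem indented under the problem it makes hard."""
    print("  " * depth + node.name)
    for sub in node.subproblems:
        show(sub, depth + 1)

show(tree)
```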

This is similar to what you do if you try to make an Alignment Game Tree. However, in my opinion, when we tried to make the game tree during an early phase of the MATS program, it did not yield much insight into what to work on. The criticisms ended up being pretty proposal-specific, and most arguments were over whether a particular problem was actually a problem with the particular proposal.

To create the ASS tree, each of us first made a tree independently. Then we merged our individual ASS trees into one Big ASS Tree and looked for broader problems that many problems in the tree had in common. We then extended the resulting tree individually again and merged the results.

A common node in the tree was that we did not know the True Name for some important concept (e.g. agency, optimization, value), and thus the True Names Project was born: finding a general procedure you can use to find the True Name for some concept.

Big ASS Takeaways

ASS Trees and Big ASS Trees seem promising. However, I think we did not do the idea justice: we wanted to avoid anchoring on existing ideas, so we made the ASS tree without looking into previous alignment proposals and research directions.

In retrospect, though, we probably could’ve made a better tree if we’d looked at existing research, figured out what walls people empirically run into, and put those problems in the tree. We tentatively think the best way to use the ASS tree is to first make one before looking into other people’s alignment work, and then make another one afterwards.

Contact with reality

Our first two project ideas did not lend themselves well to direct contact with reality, and what contact with reality they offered was largely mediated by humans. For instance, during the True Names project, we wanted to test our theories of True Name generation by having people use our methods to generate True Names for toy concepts such as roundness, and seeing how well they performed.

We could theoretically have done this well, but our weak information channels with reality would have made it very difficult to stay on track toward producing something useful. In general, it’s probably a bad sign if your technical alignment research looks a lot like psychology and academic philosophy.

What is the most important problem?

We started the project selection phase by trying to find the single most important problem in alignment and then coming up with good ways to gain information about that particular problem. We had difficulty with this approach and could not readily generate ideas.

Given that our search tools (the ASS tree, combined with our low-certainty prior understanding of alignment) are pretty lossy, we couldn’t be sure what the most important problem is—it’s pretty hard to figure this out without actually doing research. Hence, we think a better approach to project selection would be to find the set of very-important-problems, and then come up with a clever idea to make progress on any of these problems.

Once we have a way to gain a rich stream of information about any important problem, we can begin to do object-level work and get a feedback loop on which problems are most important and which project ideas are useful. This is related to the previous section on keeping close contact with reality: the feedback loop works best when the project leverages information highly correlated with reality.

This advice applies less if you’re already very certain that one problem is the most important by a large margin, or if you already have technical research experience.

Heuristics are useful, especially when you’re first starting out

When you’re first starting to do research, it’s very difficult to figure out what actions are best; heuristics can be really helpful. We made a list of heuristics for project selection based on common wisdom in research and specific things we’ve been told by experienced researchers:

  • Do something that actually helps solve alignment in the hard case; don’t dodge the hard problem.

  • Get a fast feedback loop.

  • Do something that will make other alignment work easier (e.g. getting an algorithm that can find the concepts in a neural network and link them to objects in the real world).

  • Do experiments that will give you a “firehose” of information. Try to avoid experiments that only yield a yes/no answer, or only produce one number.

  • Work on something you find interesting—it’s usually hard to do good work if you find your project boring.

  • Try to exploit market inefficiencies. This is basically the same as the “neglected” criterion in the ITN framework. Of course, it could easily be argued that basically all of alignment is neglected, but we still think this is a worthwhile heuristic to keep in mind.

  • Choose something that makes sense. When you try to explain a technical project to somebody and they say, “that makes no sense,” it’s easy to dismiss their confusion because your work is highly technical. However, oftentimes when people think your project makes no sense it actually makes no sense.

  • Work on a concrete subproblem of your larger problem.

  • Do something that will yield a lot of bits of information even if you fail. If your hypothesis is X, your experiments should ideally be informative even if X turns out not to be true, or if X isn’t even a sensible way to frame the situation.

  • Hold off on proposing solutions. There is a standard reason to hold off, but we think there’s an even more important reason.

  • Explore the degrees of freedom in your project, and make sure you’ve chosen the correct parameter settings before you go forward with the project. As a simple example, maybe you start out with a project proposal that involves working with large language models, but after looking at the degrees of freedom you realize it would be more informative to do a similar project with smaller toy models.

  • Don’t assume an ontology.

  • Know what success looks like—what does it mean for your project to succeed? For instance, if you’re trying to better understand optimization, maybe success is developing a method to determine how much optimization power a given agent is using in a way that matches with our intuitions.

We plan to regularly check in and make sure we’re following all of these heuristics. Given that we’re very inexperienced at research, if we think we have reason to deviate from this list, it seems more likely that we’re mistaken or deluded than that we’ve actually found a robust exception.

Just because John says a project is “the best he’s heard yet” does not mean it’s any good

When we pitched the True Names project to John, he said it was “the best he’s heard yet”, which got us excited. But we later concluded it was a really silly thing to be working on.

Why did John say this? Several hypotheses come to mind.

  • Was his brain not working right? We did ask him right before lunch, so maybe he was too hungry to think straight.

  • Did he misunderstand us?

  • Was he anticipating that our project would suck, but wanted us to learn on our own what makes projects good or bad? He might approve of thinking about and working on the project as a learning experience, but not because it is a good project for making object-level progress.

  • Did he want to instill a lesson against blindly listening to your mentors?

The true solution is left as an exercise for the reader[2].

Read Jaynes’ “Probability Theory: The Logic of Science”

It’s good! And many insights in the history section led us to significantly change our course. The math is fancy, but you don’t have to understand everything in order to gain many of the insights.

Action space is large, bro

After a week of project selection, John dropped our names into his hat (along with some other people’s) and drew names to generate random teams. We quickly found that the resulting teams were quite suboptimal; Garrett was on a team with one guy who was recovering from COVID and another who was a fireman[3]. We were fairly resigned to this, and didn’t realize we could move out of the local optimum until somebody pointed it out to us.

At this point we mutinied against John and asked for the teams to be rearranged into their current configuration; he agreed pretty readily.

John’s hat is magic

Or at least, our emulation of his hat is magic. When Chu wears the hat and pretends to be John, she often comes up with better ideas than she otherwise would. When we were generating our list of heuristics, we wrote down everything we could think of until we ran out of ideas; Chu then put on the hat and added five more to the list within a few minutes.

Our John Simulator.

On an unrelated note, we did a workshop a few weeks prior where people presented their ideas to John and we tried to predict what feedback he would give in advance.

  1. ^

    That is, given the color hex #D6E865, you might say it seems like a mix between yellow and green, and plausibly this notion can be formalized.
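    As a quick illustration (a minimal sketch, not a proposed True Name): convert the hex code to a hue and note where it falls between the yellow and green hues.

    ```python
    # Hex #D6E865 -> hue in degrees; yellow sits near 60 degrees, green near 120.
    import colorsys
    r, g, b = (int("D6E865"[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, _, _ = colorsys.rgb_to_hsv(r, g, b)
    print(f"hue = {h * 360:.0f} degrees")  # ~68: between yellow and green
    ```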

  2. ^

    Or you can just look at this footnote! The answer (rot13): Guvf jnf nzbat gur svefg cebwrpgf va bhe pbubeg ur ybbxrq ng, naq gur bguref jrer rira jbefr.

  3. ^

    Lit. He was too busy putting out fires in other projects to focus on alignment research.