Thus, my current main advice for people hoping to build AI tools for boosting alignment research: go work on the object-level research you’re trying to boost for a while. Once you have a decent amount of domain expertise, once you have made any progress at all (and therefore have any first-hand idea of what kinds of things even produce progress), then you can maybe shift to the meta-level[2].
I mostly agree with this, which is why I personally took 6 months away from this approach and tried to develop my domain expertise during my time at MATS. Unfortunately, I don’t think that was enough time (so I might spend more time on object-level work after my current 3-month grant). However, I plan to continue doing object-level research and to build tools informed by my own bottlenecks and those of others. There are already many things I think I could build that would accelerate my work, and possibly the work of others.
I see my current approach as creating a feedback loop in which the two things that take up my time inform each other (so I at least have N>0 users). I expect to build the things that seem most useful for now, re-evaluate based on feedback (is this accelerating alignment greatly, or not at all?), and then decide whether I should refocus all my time on object-level research. That said, I expect that at this point I could also direct some software engineers to build out the things I have in mind in parallel.
One thing I’ve found valuable for thinking about these tools is to backcast: imagine how I (or others) might eventually arrive at a solution to alignment, and then focus on the tools that would accelerate the research that is actually crucial to solving the problem, rather than optimizing for something else. I think dedicating time to object-level work has been helpful for this.
At a meta level, cognitive tool-building is very much the sort of work where you should pick one or a handful of people to build the prototype for, focus on making those specific people much more productive, and get a fast feedback loop going that way. That’s how wrong initial guesses turn into better later guesses.
Agreed.
If the tracked-information is represented somewhere outside my head, then (a) it frees up a lot of working memory and lets me track more things, and (b) it makes it much easier to communicate what I’m thinking to others.
Yes! That is precisely what I have in mind when thinking about building tools. What can I build that sufficiently frees up working memory / cognitive load so that the researcher can use that extra space for thinking more deeply about other things?
A side problem which I do not think is the main problem for “AI tools for AI alignment” approaches: there is a limit to how much of a research productivity multiplier we can get from google-search-style tools. Google search is helpful, but it’s not a 100x on research productivity (as evidenced by the lack of a 100x jump in research productivity shortly after Google came along). Fundamentally, a key part of what makes such tools “tools” is that most of the key load-bearing cognition still “routes through” a human user; thus the limit on how much of a productivity boost they could yield. But I do find a 2x boost plausible, or maybe 5-10x on the optimistic end. The more-optimistic possibilities in that space would be a pretty big deal.
I aim for a minimum 10x speedup when thinking about this general approach (or at least something that leads to individual, specific breakthroughs in alignment). I’m still grappling with when to drop this direction if it proves not very fruitful, and I’m trying to be conscious of what I think weak AI won’t be able to solve. Either way, I hope to bring on software engineers / builders who can help make progress on some of my ideas (some already have).
Thanks for writing this post, John! I’ll comment since this is one of the directions I am exploring (released an alignment text dataset, published a survey for feedback on tools for alignment research, and have been ruminating on these ideas for a while).