Against Deep Ideas

When discussing impactful research directions, it’s tempting to get excited about ideas that seem deep and profoundly insightful. This is especially true in areas that are theoretical and relatively new, such as AI Alignment Theory. Fascination with a research direction can leak into evaluations of its expected impact, most often by inflating the estimated likelihood of extremely impactful outcomes. As a result, we should be more skeptical a priori of research projects that sound insightful and deep than of those that sound boring and incremental.

This phenomenon can arise naturally from how ideas are generated and spread. If two research projects are roughly equivalent in promise, but one seems deep while the other seems boring, the deep one will garner more attention and interest. The discovery and spread of research ideas is thus biased towards profound ideas: profundity is more memetically fit than its absence. I believe this bias is fairly strong in the AI alignment community, full as it is with researchers who love[1] interesting intellectual challenges and ideas.

Some researchers might think that profound ideas are likely necessary to solve AI Alignment. However, even in that scenario we should expect profound ideas to receive inordinate attention, since they will by default be selected over boring ideas that are just as promising as the average profound approach to the problem. Unless the promising ideas are exclusively the profound ones, we should expect this bias to creep in.

Even in a world where profound ideas are absolutely required for AI Alignment research, we should still expect any given profound idea to be very unlikely to succeed. Profound ideas very rarely yield significant results, and the importance of solving a problem should not raise our expectation that any particular idea will solve it. In such a world I think exploration matters much more than exploitation, since the chance of success in any one direction is low.
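To make this concrete, here is a toy simulation (my own illustration; the model and every number in it are assumptions, not anything established above). It posits many candidate research directions, most of them duds, and compares sinking a fixed budget of effort into one direction chosen up front against spreading the same budget across all of them.

```python
import random

# Toy model: each research direction has an unknown per-unit-of-effort
# chance of yielding a result. Most directions are duds; a few are decent.
# All parameters below are made up purely for illustration.

random.seed(0)

N_DIRECTIONS = 20    # candidate research directions (assumed)
TOTAL_EFFORT = 100   # total units of effort to allocate (assumed)
N_TRIALS = 10_000    # Monte Carlo repetitions

def sample_direction_quality():
    # 10% of directions have a modest per-unit success rate; the rest
    # are near-duds.
    return 0.05 if random.random() < 0.1 else 0.001

def run_trial():
    qualities = [sample_direction_quality() for _ in range(N_DIRECTIONS)]

    # "Exploit": commit all effort to one direction chosen blindly up front.
    q = random.choice(qualities)
    exploit = any(random.random() < q for _ in range(TOTAL_EFFORT))

    # "Explore": spread the same effort evenly across every direction.
    per_direction = TOTAL_EFFORT // N_DIRECTIONS
    explore = any(
        random.random() < qi
        for qi in qualities
        for _ in range(per_direction)
    )
    return exploit, explore

results = [run_trial() for _ in range(N_TRIALS)]
print(f"P(success), commit to one direction:  {sum(e for e, _ in results) / N_TRIALS:.3f}")
print(f"P(success), spread across directions: {sum(x for _, x in results) / N_TRIALS:.3f}")
```

With these made-up parameters the spreading strategy succeeds far more often, because committing usually means committing to a dud. The exact numbers don’t matter; the point is only that when the odds of any single direction panning out are low, breadth tends to beat depth.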

I’m particularly worried about profound research directions like Natural Abstractions or Heuristic Arguments being treated as more promising than they are and consuming a large amount of attention and resources. Both seem to have absorbed a great deal of thought without yet yielding legible successes. Additionally, neither seems to me to be guided by feedback loops grounded in external validation of progress. I think researchers looking to start projects in theoretical alignment should keep these issues in mind, and should not necessarily expect this status quo to change in the near future. It may be more promising to consider other directions.

I don’t think the way to deal with this is to completely stop working on profound ideas in fields like AI Alignment, where we are often motivated by the expected impact of research. Instead, I think it’s important to notice when a research direction seems deep and profound, acknowledge this, and maintain a healthy skepticism about whether expected impact is what actually motivates the excitement and attention the idea receives, from both yourself and others.

  1. ^

    It’s perfectly valid to research things because you enjoy them. I still think it’s useful to be able to notice when this is happening.