Alexandra Bos comments on [missing post]

Alexandra Bos 5 Jun 2023 11:10 UTC
3 points
0
I’d be curious to hear from the people who pressed the disagreement button on Evan’s remark: what part of this do you disagree with or not recognize?
- Thomas Kwa 5 Jun 2023 11:22 UTC
  4 points
  3
  I didn’t hit disagree, but IMO there are way more than “few research directions” that can be accessed without cutting-edge models, especially with all the new open-source LLMs.
  - All conceptual work: agent foundations, mechanistic anomaly detection, etc.
  - Mechanistic interpretability, which when interpreted broadly could be 40% of empirical alignment work
  - Model control like the nascent area of activation additions
  I’ve heard that evals, debate, prosaic work into honesty, and various other schemes need cutting-edge models, but in the past few weeks of transitioning from mostly conceptual work to empirical work, I have had far more questions than I have time to answer using GPT-2- or AlphaStar-sized models. If alignment is hard, we’ll want to understand the small models first.
  - Evan R. Murphy 5 Jun 2023 20:53 UTC
    2 points
    −3
    I wasn’t saying that there were only a few research directions that don’t require frontier models, period, just that there are only a few that don’t require frontier models and still seem relevant/promising, at least assuming short timelines to AGI.
    I am skeptical that agent foundations is still very promising or relevant in the present situation. I wouldn’t want to shut down someone’s research in this area if they were particularly passionate about it or considered themselves on the cusp of an important breakthrough. But I’m not sure it’s wise to be spending scarce incubator resources to funnel new researchers into agent foundations research at this stage.
    Good points about mechanistic anomaly detection and activation additions though! (And mechanistic interpretability, but I mentioned that in my previous comment.) I need to read up more on activation additions.