Question: why is the adjacency between alignment ideas and capabilities only a one-way relationship? More directly, why can’t this mindset be used to pull alignment gains out of capabilities research?
I think it is two-way, which is why many (almost all?) alignment researchers have spent a significant amount of time looking at ML models and capabilities, and have guesses about where those are going.
Not sure exactly what the question is, but research styles from ML capabilities have definitely been useful for alignment, e.g. the idea of publishing benchmarks.