To the degree worries of this general shape are legitimate (we think they very much are), it seems wise for the alignment community to more seriously pursue and evaluate the many neglected approaches that might solve the fundamental underlying alignment problem, rather than investing the vast majority of its resources in things like evals and demos of misalignment failure modes in current LLMs. Those are certainly nice to have, but they almost certainly won’t themselves directly yield scalable solutions for robustly aligning AGI/ASI.