The neural tangent kernel[1] provides an intuitive story for how neural networks generalize: a gradient update on one datapoint shifts the network’s predictions on similar datapoints (similar as measured by the hidden activations of the NN) in a similar way.
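Roughly, in the standard first-order (linearized) picture: with network $f_\theta$, learning rate $\eta$, and loss $L$ evaluated on a training point $x$, a single gradient step changes the prediction on another point $x'$ by approximately

$$f_{\theta - \eta\nabla_\theta L}(x') \;\approx\; f_\theta(x') \;-\; \eta\,\Theta(x', x)\,\frac{\partial L}{\partial f_\theta(x)}, \qquad \Theta(x', x) = \nabla_\theta f_\theta(x')^\top \nabla_\theta f_\theta(x),$$

so the update on $x$ moves the prediction on $x'$ in proportion to the tangent-kernel similarity $\Theta(x', x)$ (whose last-layer contribution is just the inner product of the hidden activations at $x$ and $x'$).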
The vast majority of LLM capabilities still arise from mimicking human choices in particular circumstances. This gives you a substantial amount of alignment “for free” (since you don’t have to worry that the LLMs will grab excess power in circumstances where humans wouldn’t), but it also limits you to ~human-level capabilities.
“Gradualism” can mean that fundamentally novel methods only make incremental progress on outcomes, but I think in most people’s imagination it instead means that people will keep the human-mimicking capabilities generator as the source of progress, focusing mainly on scaling it up rather than on deriving capabilities by other means.
- ^
Maybe I should be cautious about invoking this without linking to a comprehensible explanation of what it means, since most resources on it are kind of involved...
I’m not sure I understand your question. By AI companies “making copying hard enough”, I assume you mean preventing AIs from leaking secrets from their prompt/training (or other conditioning). It seems true to me that this will raise the relevance of AI in society. Whether this increase is hard-alignment-problem-complete seems to depend on other background assumptions not discussed here.