Well, an architecture-specific result is still something: maybe architectures other than LLMs/ANNs are more amenable to alignment, and that’s that. Or it could be a more general result about, e.g., what can be achieved with SGD. Though I expect there may also be a fully general proof, akin to the undecidability of the halting problem.