Models are simulations; if it’s a proof, it’s not just a model. A proof is mathematical truth made word; it is, upon inspection and after sufficient verification, self-evident and as sure as we assume the self-evident axioms it rests on to be. The question is more whether it can ever truly be proved at all, or whether it turns out to be an undecidable problem.
I suppose that is my real concern then. Given that we know intelligences can be aligned to human values, by virtue of our own existence, I can’t imagine such a proof exists unless it is very architecture-specific. In which case, it only tells us not to build atom bombs, while future hydrogen bombs are still on the table.
Well, architecture-specific is something: maybe architectures other than LLMs/ANNs are more amenable to alignment, and that’s that. Or it could be a more general result about, e.g., what can be achieved with SGD. Though I expect there may also be a fully general proof, akin to the undecidability of the halting problem.