I just want to mention that my recent critique of the definition of communication used here does not imply that the definition is any more inadequate for alignment than your remarks here already suggest. To “do what I mean, not what I say,” we actually want to include connotations and implicature, not only the literal meaning.
That said, a theory of meaning that addressed the critique might open a path to a much better definition than the one here. In particular, it might help answer the question of which ontology the beliefs should even be in (in order to represent human values, etc.).