No, I am in fact quite worried about the situation
Fair, sorry. I appear to have been arguing with my model of someone holding your general position, rather than with my model of you.
I think these AGIs won’t be within-forward-pass deceptively aligned, and instead their agency will, e.g., come from scaffolding-like structures
Would you outline your full argument for this and the reasoning/evidence backing that argument?
To restate: My claim is that, no matter how much empirical evidence we have regarding LLMs’ internals, until we have either an AGI we’ve empirically studied or a formal theory of AGI cognition, we cannot say whether shard-theory-like or classical-agent-like views on it will turn out to have been correct. Arguably, both sides of the debate have about the same amount of evidence: generalizations from maybe-valid, maybe-not reference classes (humans vs. LLMs) and ambitious but non-rigorous mechanical theories of cognition (the shard theory vs. coherence theorems and their ilk stitched into something like my model).
Would you disagree? If yes, how so?