I expect cached thoughts to often look, from the outside, similar to “rounding errors”: someone didn’t listen to some actual argument because they pattern-matched it to something else they already have an opinion on or an answer to.
I don’t expect the proposed mitigations to really work. E.g., with explicitly tagging differences: if you “round off” an idea you hear to something you already know, it won’t feel new to you, so you won’t do the proposed system-2 motions. Maybe a better habit is, whenever you encounter a seemingly already-known idea, to check whether what you’re being told is indeed the idea you already know.
Also, I’m not convinced by the examples:
On LessWrong, almost any idea from representational alignment or convergent abstractions risks getting rounded off to Natural Abstractions
Instrumental convergence vs. Power-seeking
Embedded agency vs. Embodied cognition vs. Situated agents
Various stories about recursive feedback loops vs. Intelligence explosion
I’ve only noticed something akin to the last one. It’s not very clear to me in what sense people would round off instrumental convergence to power-seeking (and are there examples where severe power-seeking was rounded off to instrumental convergence in an invalid way?), or “embodied cognition” to embedded agency.
Why do you think “rounding errors” occur?
Would appreciate links if you have any!