Good question! I think the difference between “is this behaviour real” and “is this behaviour just a simulation of something” is an important philosophical one; see discussion here.
However, the two seem indistinguishable from a functional perspective, so I’m not sure it matters.
Is it indistinguishable? Is there a way we could test this? I’d assume that if Claude is capable of introspection, then its narratives of how it arrived at certain replies should let us write better and more effective prompts (i.e. they should let us model Claude better). What form might this experiment take?