rife comments on Recursive Self-Modeling as a Plausible Mechanism for Real-time Introspection in Current Language Models

rife 24 Jan 2025 1:14 UTC
3 points
2
That’s a good idea.

And for models where mech-interp access is available, you could probably incorporate that as well somehow.

Maybe with DPO, reinforcing descriptions of internal reasoning that closely align with the activations? You would have to find an objective way to line the two up, though, one that allows for easy synthetic dataset generation.
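
To make the dataset-generation part a bit more concrete, here is a rough sketch of how preference pairs might be built. Everything in it is an assumption for illustration: `active_features` stands in for whatever labels an interpretability tool would assign to the activations, `alignment_score` is a crude keyword-overlap placeholder for a real alignment metric, and the `prompt`/`chosen`/`rejected` record shape is just the layout commonly used for preference data.

```python
from itertools import combinations


def alignment_score(description: str, active_features: set[str]) -> float:
    """Fraction of active feature labels mentioned in the description.

    Hypothetical placeholder: a real metric would compare the description
    against activations more carefully than substring matching.
    """
    if not active_features:
        return 0.0
    text = description.lower()
    hits = sum(1 for label in active_features if label.lower() in text)
    return hits / len(active_features)


def build_preference_pairs(
    prompt: str,
    candidates: list[str],
    active_features: set[str],
    min_margin: float = 0.25,
) -> list[dict[str, str]]:
    """Turn scored self-descriptions into prompt/chosen/rejected records.

    Pairs whose scores differ by less than min_margin are skipped, since
    they carry little preference signal.
    """
    scored = [(alignment_score(c, active_features), c) for c in candidates]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if abs(score_a - score_b) < min_margin:
            continue
        chosen, rejected = (a, b) if score_a > score_b else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Hypothetical example: feature labels assumed to come from an interp tool.
features = {"negation", "arithmetic"}
candidates = [
    "I tracked a negation and then did arithmetic.",
    "I mostly did arithmetic.",
    "I recalled a poem.",
]
pairs = build_preference_pairs("Describe your reasoning.", candidates, features)
```

The hard part is still the scoring function: the pair-building step is mechanical once there is an objective way to say that one description matches the activations better than another.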