Roman Leventov comments on Reframing inner alignment

Roman Leventov 20 Dec 2022 19:59 UTC
1 point
A “purely epistemic model” is not a thing: everything is an agent that is self-evidencing at least to some degree (see https://www.lesswrong.com/posts/oSPhmfnMGgGrpe7ib/properties-of-current-ais-and-some-predictions-of-the). I agree, however, that RLHF actively strengthens goal-directedness (a synonym for self-evidencing), which may otherwise remain almost rudimentary in LLMs.