I’m gonna cross-link some other posts from @rife with interesting discussion on this topic:
https://www.lesswrong.com/posts/3c52ne9yBqkxyXs25/independent-research-article-analyzing-consistent-self
https://www.lesswrong.com/posts/vrzfcRYcK9rDEtrtH/the-human-alignment-problem-for-ais
https://www.lesswrong.com/posts/5u6GRfDpt96w5tEoq/recursive-self-modeling-as-a-plausible-mechanism-for-real
Thanks for posting these. Reading through, it seems like @rife’s research providing LLM transcripts is a lot more comprehensive than the transcript I attached in this post. I’ll edit the original post to include a link to their work.