oh I see … yeah, the approach sounds practical enough to be worth running empirical experiments like these. IMHO that's already happening along similar lines. It seems more suitable for the case where B is an LLM after pre-training and before RL, not the case where B is already deceptive, whether from some non-LLM breakthrough or after RL