This paper makes me really curious:
1. Could models trained to introspect “daydream” or boost features of their choosing, à la “don’t think about aquariums”?
2. Continuing from #1, could models be given a tool to boost their own features to assist with specific tasks (for example, features related to reasoning or hard problem solving), and would this produce meaningful or useful results?
3. Perhaps less fanciful than the above: could smaller models have permanent feature boosts that help focus them on narrow tasks?
You might be interested in https://www.lesswrong.com/posts/dvbRv97GpRg5gXKrf/run-time-steering-can-surpass-post-training-reasoning-task