All the knowledge a language model character demonstrates is contained in the model. I expect that there is also some manner of intelligence, perhaps pattern-matching ability, such that the model cannot write a character smarter than itself. The better we engineer our prompts, the smaller the model overhang (the gap between what the model is capable of and what our prompts actually elicit from it). The larger the overhang, the more opportunity for inner alignment failure.