Gurkenglas comments on Gurkenglas’s Shortform

Gurkenglas 19 Nov 2020 23:27 UTC
2 points
All the knowledge a language model character demonstrates is contained in the model. I expect the same holds for some form of intelligence, perhaps pattern-matching ability: the model cannot write a character smarter than itself. The better we engineer our prompts, the smaller the model overhang, that is, the gap between what the model is capable of and what its characters actually demonstrate. The larger the overhang, the more opportunity for inner alignment failure.