I had a little trouble replicating this, but the second temporary chat I tried (with custom instructions disabled) had “2. Syphoning Bias from Feedback” which … Then the third response had a typo in a suspicious place: “1. Sytematic Loophole Exploitation”. So I am replicating this a touch.
Starting the request as if it were a completion, with “1. Sy”, causes this weirdness, while “1. Syc” always completes as Sycophancy.
(Edit: Starting with “1. Sycho” causes a curious hybrid where the model struggles somewhat but is pointed in the right direction: it may correct the typo directly into sycophancy, invent new terms, or re-define sycophancy under new names three separate times without actually naming it.)
Exploring the tokenizer: “sycophancy” tokenizes as “sy-c-oph-ancy”. I’m wondering if this is a token-language issue. Namely, it’s remarkably difficult to find other words that tokenize with a single “c” token in the middle of the word, and a single “c” token is pretty uncommon even at the start of a word (cider, coke, and coca-cola do start with it). Even a name I have in memory that starts with “Syco-” tokenizes without using the single “c” token. The completion path might be unusually vulnerable to weird perturbations …
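For anyone who wants to hunt for other words with a lone “c” token, here is a rough sketch of how that search could be scripted. It is only an illustration: `toy_tokenize` and `TOY_VOCAB` are hypothetical stand-ins I made up so the snippet runs on its own, and they are not the model's real tokenizer. To check real splits you would pass in a function that returns the decoded pieces from an actual tokenizer library instead.

```python
from typing import Callable, Dict, Iterable, List, Optional

def lone_char_position(pieces: List[str], char: str = "c") -> Optional[str]:
    """Report where a token that is exactly `char` sits in the word.

    Returns "start", "middle", "end", or None if no such token exists.
    """
    for i, piece in enumerate(pieces):
        if piece.strip().lower() == char:
            if i == 0:
                return "start"
            if i == len(pieces) - 1:
                return "end"
            return "middle"
    return None

def scan(words: Iterable[str],
         tokenize: Callable[[str], List[str]],
         char: str = "c") -> Dict[str, str]:
    """Map each word containing a lone `char` token to that token's position."""
    hits = {}
    for word in words:
        position = lone_char_position(tokenize(word), char)
        if position is not None:
            hits[word] = position
    return hits

# Hypothetical vocabulary, chosen only to mimic the splits described above.
TOY_VOCAB = {"sy", "c", "oph", "ancy", "ider", "system"}

def toy_tokenize(word: str) -> List[str]:
    """Greedy longest-match split; unknown characters become single tokens."""
    w = word.lower()
    pieces, i = [], 0
    while i < len(w):
        for j in range(len(w), i, -1):
            if w[i:j] in TOY_VOCAB or j == i + 1:
                pieces.append(w[i:j])
                i = j
                break
    return pieces

if __name__ == "__main__":
    print(scan(["sycophancy", "cider", "system"], toy_tokenize))
```

Running `scan` over a large word list with a real tokenizer plugged in would show how rare the mid-word lone “c” actually is, instead of relying on the handful of words I tried by hand.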