I’m not sure the deletions are a learnt behavior: base models, or at least Llama 405B, do this too IME (as does the fine-tuned 8B version).