In Qwen2.5-14B-Instruct_full-ft/config.json I see that "max_position_embeddings": 2048, while as far as I know the original Qwen2.5-14B-Instruct context length is >30k. Is there a reason for this?
I am assuming it's because you fine-tuned on shorter sequences, but did you test longer sequences and see significant quality degradation? Anything else I should beware of while experimenting with these models?
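For context, my plan was to just raise the limit when loading, roughly like this (a minimal sketch assuming the fine-tune didn't change the RoPE setup; the path and 32768 value are placeholders for the released checkpoint and the base model's original context window):

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Placeholder path -- substitute wherever the checkpoint actually lives.
model_path = "Qwen2.5-14B-Instruct_full-ft"

config = AutoConfig.from_pretrained(model_path)
print(config.max_position_embeddings)  # reports 2048 in the released config

# Override to the base model's original context window (assumed value).
config.max_position_embeddings = 32768

model = AutoModelForCausalLM.from_pretrained(
    model_path, config=config, torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```

Is overriding it like this safe, or would you expect degradation past 2048 tokens?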
Thanks for the paper, post, and models!