That was my hunch too, and it’s why I switched to Gemma 3 27B. It would be interesting to run this experiment on Llama 3.3 70B. That should only take a simple code change.