[Question] What are some good language models to experiment with?

tailcalled10 Sep 2023 18:31 UTC

16 points

Like if I want to experiment with a steering technique, it would be useful to have a language model that is small, capable, but not so finetuned that it becomes inflexible. (Or maybe ideally, a model which has both a finetuned and a non-finetuned variant.)

I’ve seen some people use GPT-2. Is that recommended? Are there any alternatives?

LawrenceC 10 Sep 2023 23:06 UTC
6 points
If you care about having both the instruction-finetuned variant and the base model, I think I’d go with one of the smaller LLaMAs (7B/13B). Importantly, they fit comfortably on one 40/80 GB A100, which saves a lot of hassle. There are also a bajillion fine-tuned versions of them if you want to experiment.
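As a back-of-the-envelope check of the "fits on one A100" point: weight memory is roughly parameter count times bytes per parameter. The sketch below is a rough estimate, not a measurement. It assumes nominal counts of 7B and 13B parameters, fp16/bf16 weights (2 bytes each), and a hypothetical 80% headroom factor to leave room for activations and framework overhead. It counts weights only, so fine-tuning with optimizer state would need considerably more memory.

```python
# Rough weights-only memory estimate: parameters x bytes per parameter.
# Assumptions: nominal parameter counts; ignores activations, KV cache,
# optimizer state, and framework overhead (approximated by a headroom factor).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str = "fp16") -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(n_params: float, gpu_gb: float, dtype: str = "fp16",
                headroom: float = 0.8) -> bool:
    """True if the weights fit in `headroom` of the GPU memory (hypothetical margin)."""
    return weight_memory_gb(n_params, dtype) <= gpu_gb * headroom


if __name__ == "__main__":
    for name, n in [("LLaMA-7B", 7e9), ("LLaMA-13B", 13e9)]:
        for gpu in (40, 80):
            print(f"{name} fp16: ~{weight_memory_gb(n):.0f} GB of weights; "
                  f"fits on a {gpu} GB A100: {fits_on_gpu(n, gpu)}")
```

Under these assumptions, 7B needs about 14 GB and 13B about 26 GB in fp16, so both fit on a 40 GB card for inference. 13B in fp32 (about 52 GB) would need the 80 GB variant.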
Tao Lin 10 Sep 2023 21:15 UTC
5 points
Pythia is meant for this.
- LawrenceC 10 Sep 2023 23:03 UTC
  2 points
  Aren’t the larger Pythias pretty undertrained?
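One way to make "undertrained" concrete is tokens seen per parameter. The sketch below uses approximate, publicly reported figures, which are assumptions here rather than something stated in this thread: every Pythia size was trained on roughly 300B tokens, LLaMA-1 7B/13B on roughly 1T tokens, and the Chinchilla compute-optimal rule of thumb is about 20 tokens per parameter. Because all Pythia sizes share one token budget, the ratio falls as model size grows.

```python
# Tokens seen per parameter: a crude proxy for how "fully trained" a model is.
# Assumed approximate figures (not from the thread): all Pythia models ~300B
# tokens; LLaMA-1 7B/13B ~1T tokens; Chinchilla rule of thumb ~20 tokens/param.
PYTHIA_TOKENS = 300e9
LLAMA1_SMALL_TOKENS = 1e12
CHINCHILLA_TOKENS_PER_PARAM = 20

MODELS = {
    "pythia-410m": (410e6, PYTHIA_TOKENS),
    "pythia-1.4b": (1.4e9, PYTHIA_TOKENS),
    "pythia-6.9b": (6.9e9, PYTHIA_TOKENS),
    "pythia-12b": (12e9, PYTHIA_TOKENS),
    "llama-7b": (7e9, LLAMA1_SMALL_TOKENS),
    "llama-13b": (13e9, LLAMA1_SMALL_TOKENS),
}


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Training tokens divided by parameter count."""
    return n_tokens / n_params


if __name__ == "__main__":
    for name, (n_params, n_tokens) in MODELS.items():
        ratio = tokens_per_param(n_params, n_tokens)
        print(f"{name:12s} {ratio:7.1f} tokens/param "
              f"({ratio / CHINCHILLA_TOKENS_PER_PARAM:.1f}x the Chinchilla ratio)")
```

With these numbers, Pythia-12B sees about 25 tokens per parameter, close to the Chinchilla point but far below the roughly 140 that LLaMA-7B sees, which is one reading of why the larger Pythias look undertrained by comparison.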
