Something I’ve started doing is building toy models that exhibit certain large-model behaviors. I suspect much of what large models do can be trained into small models if we can figure out which parts of the massive datasets produce the behavior we want.
Thank you for the suggestion! Running experiments with small models has turned out to be a lot easier than I expected.