Can fast feedback loops on small models give important information about training large models? My guess is yes, but we’ll probably only know in retrospect whether this was an important factor in reaching AGI.
Here’s an example: https://x.com/leloykun/status/1885640350368420160