Note that because current LLMs use a particular inductive bias with a particular neural network architecture, and don’t yet have internal task memories but instead work greedily from the prompt, they can be better at a task in some ways and hugely worse in others.
So for “easy” tasks it doesn’t matter whether you use method X or Y; for hard tasks, one method may scale much better than the others.
Sure. This, I think, was the more informed objection to LLM capabilities: they are “just” filling in text and can’t know anything humans don’t. It turns out this likely means that top-0.1-percent human capability in EVERY domain at the same time is doable, but that they lack the architecture to directly learn beyond human ability.
(Which isn’t true: if they embark on tasks on their own and learn from the I/O, such as solving every published coding problem, or randomly generating software requirements, then tests, then code to satisfy the tests, they could easily exceed the ability of all living humans in that domain.)
I mistakenly thought they would be limited to median human performance.
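The self-play loop described in the parenthetical (generate a requirement, derive tests from it, generate candidate code, keep what passes) can be sketched as a toy loop. This is a minimal illustration, not a real training pipeline: every function name here is hypothetical, and the "LLM" proposing candidates is stubbed out with hard-coded lambdas.

```python
import random

# Toy sketch of the self-play loop: sample a "requirement", derive
# executable tests from it, propose candidate implementations, and
# keep only candidates that pass every test. Verified solutions would
# then become new training data. All names here are illustrative.

def generate_requirement(rng):
    """Randomly sample a toy 'software requirement'."""
    return {"op": rng.choice(["add", "mul"])}

def generate_tests(req, rng, n=5):
    """Derive input/output test cases from the requirement."""
    fn = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}[req["op"]]
    cases = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(n)]
    return [((a, b), fn(a, b)) for a, b in cases]

def generate_candidates(req):
    """Stand-in for a model proposing implementations (one wrong, one right)."""
    return [
        lambda a, b: a - b,  # incorrect proposal, should fail the tests
        {"add": lambda a, b: a + b,
         "mul": lambda a, b: a * b}[req["op"]],  # correct proposal
    ]

def self_play_round(rng):
    """Run one round; return the requirement and a verified solution, if any."""
    req = generate_requirement(rng)
    tests = generate_tests(req, rng)
    for cand in generate_candidates(req):
        if all(cand(*args) == out for args, out in tests):
            return req, cand
    return req, None

rng = random.Random(0)
req, solution = self_play_round(rng)
```

The key property is that correctness is checked against executable tests rather than against human-written reference text, which is what would let such a loop generate supervision beyond the human-authored corpus.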
Yep, I broadly agree. But this would also apply to multiplying matrices.
Yep, the problem is that the internet isn’t written by Humans, so much as written by Humans + The Universe. Therefore, GPT-N isn’t bounded by human capabilities.
Thanks. Interestingly, this model explains why:
It can play a few moves of chess from common positions: it’s worth the weights to remember those.
It can replicate the terminal output of many basic Linux commands: it’s worth the weights for that, too.