With current architectures, no, because running inference on 1000 prompts in parallel against the same model is many times less expensive than running inference on 1000 prompts against 1000 models, and serving a few static versions of a large model is simpler than serving many dynamic versions of that model.
It might, in some situations, be more effective, but it's definitely not simpler.
Edit: typo
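A minimal sketch of the cost asymmetry the parent comment points at, using a toy single-layer "model" in NumPy (the sizes and names here are illustrative assumptions, not anything from the thread): one shared weight matrix can serve all 1000 prompts in a single batched matmul, while 1000 per-prompt models need 1000 weight matrices and 1000 separate matmuls.

```python
import numpy as np

d_model = 128          # hidden size of a toy one-layer "model"
n_prompts = 1000

rng = np.random.default_rng(0)
prompts = rng.standard_normal((n_prompts, d_model))

# One shared model: a single weight matrix, one batched matmul.
shared_weights = rng.standard_normal((d_model, d_model))
batched_out = prompts @ shared_weights   # weights loaded once, amortized over the batch

# 1000 per-prompt models: 1000 weight matrices, 1000 unbatched matmuls.
per_model_weights = [rng.standard_normal((d_model, d_model))
                     for _ in range(n_prompts)]
separate_out = np.stack([p @ w for p, w in zip(prompts, per_model_weights)])

# Weight memory alone scales linearly with the number of models,
# before even counting the lost batching efficiency on compute.
shared_bytes = shared_weights.nbytes
separate_bytes = sum(w.nbytes for w in per_model_weights)
print(separate_bytes // shared_bytes)   # -> 1000
```

Real serving stacks make this gap even wider, since keeping one set of static weights resident on an accelerator avoids the swapping and cache misses that many dynamic per-user models would incur.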
Makes sense for current architectures. The question’s only interesting, I think, if we’re thinking ahead to when architectures evolve.
I think at that point it will come down to the particulars of how the architectures evolve. Trying to philosophize in general terms about the optimal compute configuration for artificial intelligence to accomplish its goals is like trying to philosophize in general terms about the optimal method of locomotion for carbon-based life.
That said, I do expect "making a copy of yourself is a very cheap action" to persist as an important dynamic for future AIs (a biological system can't cheaply copy itself along with its learned information, but if such a capability did evolve I would not expect it to be lost), and so I expect our biological intuitions around unique, single-threaded identity to make bad predictions.