loops comments on Vladimir_Nesov’s Shortform

loops 9 Apr 2026 23:12 UTC
12 points
1

> it will only be efficient to serve on TPUv7

FWIW, Mythos Preview is available on Amazon Bedrock and Microsoft Foundry, which don’t use TPUs (presumably at the same price as the first-party API?).
- gwern 10 Apr 2026 6:00 UTC
  9 points
  0
  That’s not a real price. That’s just what they’re giving their partners as part of Glasswing, a charitable endeavour to try to stem the worst of the global damage. Presumably it is set above literally $0 (where people would be lazy and wasteful) mostly to encourage the partners to economize on scarce Mythos tokens. It may or may not have much of anything to do with a ‘real’ price (whatever that means in a situation where hardware is so limited and demand so vast for what is an unpriceable, ephemerally unique capability/possibility, etc.).
  - Vladimir_Nesov 14 Apr 2026 16:52 UTC
    2 points
    0
    > That’s not a real price.
    
    I believe a price that’s 5x the price of Opus 4.6 [1] is likely a real price (unless you simply mean they could’ve asked for more, since other AI companies are lagging behind and there’s a compute shortage). It might even be a bit too high currently, and will very likely be too high in late 2026 (as better systems for inferencing it come online, and OpenAI releases its own Mythos counterpart). Under ideal conditions, when scale-up systems are big enough, the number of total params should barely influence the cost of MoE model tokens, if everything else remains the same (the number of active params, memory per token of KV cache).
    
    [1] “Claude Mythos Preview will be available to participants at $25/$125 per million input/output tokens”. Current API price of Opus 4.6 is $5/$25 for input/output.
- Vladimir_Nesov 9 Apr 2026 23:54 UTC
  3 points
  0
  GB300 NVL72 (but not GB200) would probably also do when serving via clouds; there’s just not a lot of it yet (when compared to everything else put together). But some GB300 might be available in the clouds earlier than TPUv7 is for the first-party API, so that’s a possibility. Also, the smaller rack-scale servers (GB200 NVL72, Trainium 2 Ultra, maybe there’ll be some Trainium 3 NL32x2 soon) won’t be 10x worse, just maybe 2x worse (if it’s a 10T+ param model deployed in FP8).
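
A rough way to play with the scale-up argument made in this thread: a minimal sketch, assuming decoding is purely HBM-bandwidth-bound, every decode step streams all weights plus all KV cache once (a large batch touches every expert), the batch fills whatever HBM the weights leave free, and KV cache per sequence is fixed. Under those assumptions, cost per token relative to an arbitrarily large domain scales as 1 / (1 - weights / HBM). The function names and the HBM sizes are hypothetical placeholders rather than vendor specs, and compute (which depends on active params) is ignored entirely, so this is not a reconstruction of anyone's actual estimate.

```python
import math


def kv_fraction(total_params, hbm_bytes, bytes_per_param=1.0):
    """Share of a scale-up domain's HBM left for KV cache after storing weights.

    bytes_per_param=1.0 corresponds to FP8 weights.
    Returns 0.0 if the weights do not fit at all.
    """
    weight_bytes = total_params * bytes_per_param
    if weight_bytes >= hbm_bytes:
        return 0.0
    return 1.0 - weight_bytes / hbm_bytes


def relative_cost_per_token(total_params, hbm_bytes, bytes_per_param=1.0):
    """Toy cost per token, relative to a domain where weights are a negligible share of HBM.

    Assumption: each decode step reads the whole HBM content once, so step time
    is proportional to HBM size, while tokens produced per step are proportional
    to the HBM left over for KV cache.
    """
    frac = kv_fraction(total_params, hbm_bytes, bytes_per_param)
    return math.inf if frac == 0.0 else 1.0 / frac


if __name__ == "__main__":
    # Hypothetical HBM capacities per scale-up domain, in bytes (placeholders).
    domains = {"small": 16e12, "medium": 32e12, "large": 256e12}
    for total_params in (10e12, 20e12):  # total params, stored in FP8
        for name, hbm in domains.items():
            cost = relative_cost_per_token(total_params, hbm)
            print(f"{total_params:.0e} params, {name}: {cost:.2f}x")
```

With these placeholder sizes, doubling total params stops the model from fitting in the smallest domain and nearly doubles the penalty in the medium one, but moves the largest one by only a few percent, which is the qualitative shape of the claim that total params barely matter once the scale-up domain is big enough.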