Open-source offline search engine for the entire internet is now affordable
OpenAI reduced text-embedding-3-large batch API pricing by 250x! https://platform.openai.com/docs/pricing#embeddings
Assume the entire internet's plaintext is 2 PB, i.e. roughly 400T tokens at ~200 tokens per KB. At $0.13 per 1B tokens you can embed all of it for about $52k.
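Back-of-the-envelope check in Python, under the assumptions above (2 PB of plaintext, ~200 tokens per KB, $0.13 per 1B tokens; all of these are assumptions, not measurements):

# Embedding-cost estimate under the stated assumptions.
PLAINTEXT_BYTES = 2e15      # 2 PB of internet plaintext (assumption)
TOKENS_PER_KB = 200         # ~200 tokens per 1 KB of plaintext (assumption)
USD_PER_1B_TOKENS = 0.13    # claimed batch price for text-embedding-3-large

total_tokens = PLAINTEXT_BYTES / 1e3 * TOKENS_PER_KB        # ≈ 4e14 = 400T tokens
embedding_cost_usd = total_tokens / 1e9 * USD_PER_1B_TOKENS

print(f"{total_tokens:.2e} tokens, ~${embedding_cost_usd:,.0f} to embed")
# prints: 4.00e+14 tokens, ~$52,000 to embed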
see also: https://blog.wilsonl.in/search-engine/#live-demo
https://samuelshadrach.com/raw/text_english_html/my_research/open_source_search_summary.html
Assume 1 KB of plaintext → 200 tokens → one 1536-dim float32 embedding = 6 KB. Total storage required for embeddings = 12 PB.
Conclusion: hosting cost ≈ $24k/mo on Hetzner (about $2 per TB per month).
Cheaper if you build your own servers. More expensive if you use an HNSW index: Qdrant, for instance, uses approximately 9 KB per vector. Cheaper if you quantise the embeddings.
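The storage side in the same sketch style. The $2 per TB-month rate is just what the $24k/mo figure implies for 12 PB, int8 is one example of quantisation, and the ~9 KB/vector figure is the Qdrant number cited above:

# Storage and hosting estimate under the stated assumptions.
PLAINTEXT_BYTES = 2e15      # 2 PB of internet plaintext (assumption)
CHUNK_BYTES = 1e3           # one embedding per 1 KB chunk (assumption)
DIMS = 1536                 # embedding dimension assumed above
USD_PER_TB_MONTH = 2.0      # implied by $24k/mo for 12 PB (assumption)

n_vectors = PLAINTEXT_BYTES / CHUNK_BYTES   # 2e12 vectors

cases = {
    "float32, flat": n_vectors * DIMS * 4,            # raw float32 embeddings
    "int8, flat": n_vectors * DIMS * 1,               # quantised to 1 byte per dim
    "HNSW (Qdrant, ~9 KB/vector)": n_vectors * 9e3,   # figure cited above for Qdrant
}
for name, size_bytes in cases.items():
    monthly_usd = size_bytes / 1e12 * USD_PER_TB_MONTH
    print(f"{name}: {size_bytes / 1e15:.1f} PB, ~${monthly_usd:,.0f}/mo")

# prints: float32, flat: 12.3 PB, ~$24,576/mo
#         int8, flat: 3.1 PB, ~$6,144/mo
#         HNSW (Qdrant, ~9 KB/vector): 18.0 PB, ~$36,000/mo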
Update: This pricing is a scam.
OpenAI discussion on this pricing