[Question] Vector search on a large dataset?

Does anyone have advice, or book recommendations, on how to do vector search over large datasets?

What I’ve found so far is that vector DBs perform poorly on larger datasets, even on the order of tens of thousands of vectors, if the data is similar enough. Some ideas we’ve gone through so far:

  1. Varying chunk size (smaller and larger chunks), splitting with spaCy and other tools

  2. Summarizing each chunk, embedding the summary, and storing the actual chunk as metadata (see the first sketch after this list)

  3. Traditional (keyword) search with vector search layered on top (this seems the best so far; a rough sketch follows the list), but it still runs into issues with chunks being cut off in the wrong places.

  4. And of course all the usual frameworks like LangChain, etc.

Any help, ideas, or pointers to further reading would be very much appreciated!
