On the bright side, smart folks have already thought pretty hard about this. In my work, I ended up picking usearch for large-scale vector storage and ANN search. It's plenty fast and is happy working with vectors on disk - solutions which are /purely/ concerned with latency often don't include support for vectors on disk, which forces you into using a hell of a lot of RAM.
But the problem here is the scale: billions of vectors. I wonder if Redis should add on-disk vector sets, which I started to sketch months ago and never implemented. So my question is: is the "3B" in Vicky's blog post theoretical, or is it a practical need many folks have? I'm asking because at such a scale, the main problem is generating the embeddings for your items, whatever they are.
https://gist.github.com/antirez/b3cc9af4db69b04756606ad91cab...
EDIT: I wonder if it is possible to use in-memory vector sets to index discrete on-disk dense blobs of nearby vectors, which could then be queried with an approach like the one described in the post. It's like an H-HNSW, and indeed resembles certain on-disk approaches for vector similarity.
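To make the idea in the EDIT concrete, here is a rough sketch of the two-level layout, under loud assumptions: it is not how Redis vector sets work, the "in-memory index" is just a brute-force scan over a handful of centroids standing in for an HNSW, the centroids are a random sample rather than a proper clustering, and every name (build, search, blob_path, nprobe) is made up for illustration.

```python
import os
import random
import struct
import tempfile

DIM = 8
# One on-disk record: int64 vector id followed by DIM float32 components.
RECORD = struct.Struct(f"<q{DIM}f")


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def blob_path(directory, c):
    return os.path.join(directory, f"blob_{c}.bin")


def build(vectors, centroids, directory):
    """Append each vector to the blob file of its nearest centroid."""
    files = [open(blob_path(directory, c), "wb") for c in range(len(centroids))]
    try:
        for vid, v in enumerate(vectors):
            c = min(range(len(centroids)), key=lambda i: dist2(v, centroids[i]))
            files[c].write(RECORD.pack(vid, *v))
    finally:
        for f in files:
            f.close()


def search(query, centroids, directory, k=5, nprobe=2):
    """Pick the nprobe nearest centroids in memory, then scan only their blobs."""
    nearest = sorted(range(len(centroids)),
                     key=lambda i: dist2(query, centroids[i]))[:nprobe]
    candidates = []
    for c in nearest:
        with open(blob_path(directory, c), "rb") as f:
            for vid, *vec in RECORD.iter_unpack(f.read()):
                candidates.append((dist2(query, vec), vid))
    return [vid for _, vid in sorted(candidates)[:k]]


random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(1000)]
centroids = random.sample(vectors, 16)  # stand-in for real clustering

with tempfile.TemporaryDirectory() as d:
    build(vectors, centroids, d)
    print(search(vectors[42], centroids, d, k=3))
```

Only the centroids live in RAM; each query touches nprobe blob files, so memory stays small and the recall/latency trade-off is controlled by how many blobs you are willing to read.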
Generation is often decoupled from querying, though. Consider LLMs, where training is a very expensive, slow, hardware intensive process, whereas inference is much faster and much less intensive.
But the performance of inference is in many ways more important than the performance of training, because inference is what users interact with directly.
Classic software engineer pitfall. First gather the requirements!
Second, if their initial interpretation was correct, and it's a one-shot operation, then the initial solution solves it. Done! Why go any further?
I get that it's fun to muse over solutions to these types of problems but the absurdity of it all made me laugh. Jeff's answer was the best, because it describes a solution which makes the assumptions crystal clear while outlining a straightforward implementation. If you wanted something else, it's obvious you need to clarify.
If you can learn to get past this you can unlock a whole universe of problem solving.
https://scour.ing
https://emschwartz.me/binary-vector-embeddings-are-so-cool/