1. Concept
A vector database stores embeddings: high-dimensional vectors produced from text, images, audio, and other data by an embedding model. It provides nearest-neighbor retrieval based on a similarity measure (such as cosine similarity, inner product, or L2 distance). Unlike keyword search, which relies on exact term matching, a vector database is good at finding semantically similar content, which makes it useful for semantic search, recommendation, RAG, and similar tasks.
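The three similarity measures named above can be written out in a few lines. The following is a minimal pure-Python sketch for illustration only: the function names and the small sample vectors are made up here, and real systems use optimized, vectorized implementations.

```python
import math

def inner_product(a, b):
    # Larger value means more similar (sensitive to vector length).
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    # Euclidean distance: smaller value means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Inner product of the two vectors after dividing out their lengths,
    # so only the direction matters. Ranges from -1 to 1.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

# Made-up 2-dimensional vectors; real embeddings have hundreds of dimensions.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # unrelated directions
distance = l2_distance([0.0, 0.0], [3.0, 4.0])
product = inner_product([1.0, 2.0], [3.0, 4.0])
```

Note that the two parallel vectors have different lengths yet score as maximally similar under cosine similarity, while inner product and L2 distance would both be affected by the length difference.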
2. Why it is needed
Traditional search can only match literal words; vector search can capture similar meanings. For example, for the query "healthy snacks", vector search can return semantically related results such as "low-calorie snacks" or "granola bars", rather than only text that contains the word "healthy". When integrated with a large language model, a vector database can also retrieve the external knowledge most relevant to the question and insert it into the prompt, which can substantially reduce fabricated answers (hallucinations).
3. Workflow (simplified version)
1) Use an embedding model to convert the data into vectors and write them to the vector database together with their metadata;
2) Build an index (commonly HNSW, IVF, PQ, or an exact flat index) to balance speed, memory, and recall;
3) At query time, encode the question into a vector as well, run a k-nearest-neighbor search (exact kNN or approximate ANN), and return the results ranked by score and restricted by any filter conditions;
4) In search scenarios, hybrid search is also an option: BM25 keyword scores are fused with vector similarity scores to balance relevance and recall.
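The steps above can be sketched with a toy in-memory store. This is an illustrative sketch under several assumptions, not how any particular product works: `TinyVectorStore`, `fuse_scores`, and the 3-dimensional vectors are hypothetical, the search is an exact scan rather than an ANN index (so step 2 is skipped), and the fusion shown is a simple weighted sum of min-max-normalized scores, which is only one of several ways to combine keyword and vector scores.

```python
import math

class TinyVectorStore:
    """Toy in-memory store using exact ("flat") search, with no ANN index."""

    def __init__(self):
        self.items = []  # list of (id, unit-length vector, metadata)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    def add(self, item_id, vec, metadata):
        # Step 1: store the vector together with its metadata.
        self.items.append((item_id, self._normalize(vec), metadata))

    def search(self, query_vec, k=3, where=None):
        # Step 3: score every stored vector against the query, skip items
        # rejected by the optional metadata filter, and return the top k.
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = self._normalize(query_vec)
        scored = []
        for item_id, vec, meta in self.items:
            if where is not None and not where(meta):
                continue
            scored.append((sum(a * b for a, b in zip(q, vec)), item_id))
        scored.sort(reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]


def min_max(scores):
    # Rescale a {id: score} mapping to the range [0, 1].
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (val - lo) / (hi - lo) for key, val in scores.items()}


def fuse_scores(keyword_scores, vector_scores, alpha=0.5):
    # Step 4: blend normalized keyword and vector scores; alpha weights keywords.
    kw, vs = min_max(keyword_scores), min_max(vector_scores)
    fused = {i: alpha * kw.get(i, 0.0) + (1 - alpha) * vs.get(i, 0.0)
             for i in set(kw) | set(vs)}
    return sorted(fused, key=fused.get, reverse=True)


store = TinyVectorStore()
# Made-up vectors standing in for real embedding model output.
store.add("granola bar", [0.9, 0.1, 0.0], {"category": "food"})
store.add("low-calorie snack", [0.8, 0.2, 0.1], {"category": "food"})
store.add("running shoes", [0.1, 0.9, 0.2], {"category": "sports"})

query = [1.0, 0.1, 0.0]  # stands in for the embedding of "healthy snacks"
results = store.search(query, k=2)
food_only = store.search([0.1, 1.0, 0.2], k=2,
                         where=lambda meta: meta["category"] == "food")

# Made-up BM25 scores and vector similarities for three documents.
ranking = fuse_scores({"a": 2.0, "b": 0.5, "c": 0.0},
                      {"a": 0.2, "b": 0.9, "c": 0.8})
```

In the `food_only` query the closest vector overall is "running shoes", but the metadata filter removes it before ranking. In the fusion example, document "b" ranks first because it scores reasonably on both signals, even though "a" has the best keyword score.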
4. Key points at a glance
- Metrics: the usual ones to watch are recall@k, latency (p95/p99), throughput, and cost.
- Distance metric: cosine similarity and inner product are common for text embeddings; L2 distance is common for some visual embeddings. Check whether the vectors need to be normalized: on unit-length vectors, cosine similarity and inner product give the same values.
- Index selection: at small scale, HNSW or exact search is sufficient; at tens of millions of vectors, clustering and quantization techniques such as IVF and PQ are commonly used to save memory and speed up queries.
- Data and consistency: when the corpus is updated, the index must be rebuilt or updated incrementally; after upgrading the embedding model, plan for re-embedding the corpus and versioning the embeddings, because vectors from different models are not comparable.
- Ecosystem: there are dedicated vector databases (such as Milvus and Weaviate) as well as vector capabilities built on general-purpose databases and search engines (such as PostgreSQL with pgvector, OpenSearch, and Elasticsearch).
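A few of the points above reduce to simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, recall@k is computed here as the overlap between an approximate result list and the exact top-k, and the memory figures assume 10 million vectors of 768 float32 dimensions and 64-byte PQ codes (both assumed values), ignoring index overhead and codebooks.

```python
import math

def recall_at_k(ann_ids, exact_ids, k):
    """Share of the exact top-k neighbors that the approximate search also returned."""
    truth = set(exact_ids[:k])
    found = set(ann_ids[:k])
    return len(truth & found) / len(truth)

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def flat_index_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw float32 storage for an exact (flat) index, ignoring overhead."""
    return num_vectors * dim * bytes_per_value

def pq_index_bytes(num_vectors, code_bytes):
    """Storage for the PQ codes only, ignoring codebooks and any coarse index."""
    return num_vectors * code_bytes

# Metrics: the approximate search found 3 of the 4 true nearest neighbors.
exact = ["a", "b", "c", "d"]
approx = ["a", "c", "e", "b"]
recall = recall_at_k(approx, exact, k=4)

# Distance metric: for unit-length vectors, squared L2 distance equals
# 2 - 2 * cosine, so both metrics rank neighbors identically.
u, v = normalize([3.0, 4.0]), normalize([4.0, 3.0])
cosine = sum(x * y for x, y in zip(u, v))
squared_l2 = sum((x - y) ** 2 for x, y in zip(u, v))

# Index selection: rough memory comparison in decimal gigabytes.
flat_gb = flat_index_bytes(10_000_000, 768) / 1e9
pq_gb = pq_index_bytes(10_000_000, 64) / 1e9
```

Under these assumptions the raw float32 vectors need about 30.7 GB, while the PQ codes need about 0.64 GB, which is why quantization becomes attractive at this scale; the trade-off is lower recall, which is what recall@k measures.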
5. Common applications
Semantic search, retrieval-augmented generation (RAG), near-duplicate detection, candidate recall for recommendation, multimodal retrieval (images, text, audio, and video), anomaly detection, metric learning, and more.