Unlocking the Power of Vector Databases: A Comprehensive Summary of the Embeddings to Applications Short Course by Weaviate and DeepLearning.AI

Ali Issa
2 min read · Feb 19, 2024
Image generated by DALL-E 3

I completed DeepLearning.AI’s Vector Databases from Embeddings to Applications short course by Weaviate. Here’s a summary of the concepts covered:

🔵 How are embeddings generated? Take a denoising variational autoencoder trained on the MNIST dataset (handwritten digits from 0 to 9) as an example. The encoder compresses the input image into a lower-dimensional latent space, and the decoder reconstructs the original image from that compressed representation. The latent vector serves as the image’s embedding, because the original image can be recovered from it.
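As a rough sketch of the idea (not the course’s actual model): an untrained, fully connected encoder/decoder pair in NumPy, just to show how the latent vector sits between compression and reconstruction. The layer sizes and random weights here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Untrained, randomly initialized weights: illustrative only, not a real model.
W_enc = rng.normal(scale=0.01, size=(784, 2))  # flattened 28x28 image -> 2-d latent
W_dec = rng.normal(scale=0.01, size=(2, 784))  # 2-d latent -> reconstructed image

def encode(x):
    # The latent vector produced here is what we treat as the embedding.
    return np.tanh(x @ W_enc)

def decode(z):
    # The decoder maps the embedding back to image space.
    return z @ W_dec

image = rng.random(784)          # stand-in for a flattened MNIST digit
embedding = encode(image)
reconstruction = decode(embedding)
print(embedding.shape)           # (2,)
print(reconstruction.shape)      # (784,)
```

In a trained model, the decoder would actually reproduce the digit; the point here is only that everything needed to reconstruct the image flows through the 2-dimensional embedding.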

🔴 Distance metrics, such as Euclidean distance, Manhattan distance, dot product, and cosine distance. All of these can be used to score the similarity (or dissimilarity) between vectors, but cosine similarity is the most widely used.
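A minimal NumPy sketch of these four metrics:

```python
import numpy as np

def euclidean(a, b):
    # Straight-line distance between the two vectors.
    return np.linalg.norm(a - b)

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return np.sum(np.abs(a - b))

def dot_product(a, b):
    # Unnormalized similarity; larger means more aligned.
    return np.dot(a, b)

def cosine_distance(a, b):
    # 1 - cosine similarity, so 0 means identical direction.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(euclidean(a, b))        # ~1.4142 (sqrt(2))
print(manhattan(a, b))        # 2.0
print(dot_product(a, b))      # 0.0
print(cosine_distance(a, b))  # 1.0 (orthogonal vectors)
```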

🔵 𝐊-𝐧𝐞𝐚𝐫𝐞𝐬𝐭 𝐧𝐞𝐢𝐠𝐡𝐛𝐨𝐫 𝐚𝐥𝐠𝐨𝐫𝐢𝐭𝐡𝐦 (𝐊𝐍𝐍): computes the distance from a query point to every other point and retrieves the K closest ones. This brute-force search is computationally expensive on large collections, and higher-dimensional vectors slow it down even further.
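Brute-force KNN fits in a few lines of NumPy, which also makes the cost visible: every query touches every point.

```python
import numpy as np

def knn(query, points, k):
    # Brute force: distance from the query to every point, then take the k smallest.
    dists = np.linalg.norm(points - query, axis=1)
    return np.argsort(dists)[:k]

points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
query = np.array([0.9, 0.9])
print(knn(query, points, 2))  # [1 0]
```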

πŸ”΄ ππšπ―π’π πšπ›π₯𝐞 𝐬𝐦𝐚π₯π₯ 𝐰𝐨𝐫π₯𝐝 (𝐍𝐒𝐖): is a method where, for a set of points, each point is connected to its nearest neighbors among those already added to the graph. This process creates a graph where points are connected based on their proximity.

πŸ”΅ π€π©π©π«π¨π±π’π¦πšπ­πž 𝐍𝐞𝐚𝐫𝐞𝐬𝐭 ππžπ’π π‘π›π¨π«(𝐀𝐍𝐍): begins by comparing a random entry to the query vector, checking which nearby point is the closest among connected candidates in a NSW graph. However, the accuracy may vary based on the starting node chosen.

🔴 𝐇𝐍𝐒𝐖 (𝐇𝐢𝐞𝐫𝐚𝐫𝐜𝐡𝐢𝐜𝐚𝐥 𝐍𝐚𝐯𝐢𝐠𝐚𝐛𝐥𝐞 𝐒𝐦𝐚𝐥𝐥 𝐖𝐨𝐫𝐥𝐝): the search starts from a node in the top layer and searches greedily within the current layer, descending to the next layer only after finding the nearest point in the current one. This continues until the bottom layer is reached, with each layer having its own set of connections. Nodes are assigned to layers at random, with an exponentially lower probability of landing in higher layers, so the search encounters more nodes as it descends. HNSW offers fast queries, with search time growing only logarithmically in the number of points.
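The random layer assignment can be sketched as an exponential draw. The normalization constant `m_l` here is an illustrative choice; real HNSW implementations typically tie it to the graph’s connectivity parameter M (e.g. 1/ln(M)).

```python
import numpy as np

def assign_layer(rng, m_l=1.0):
    # P(top layer >= k) = exp(-k / m_l): each higher layer holds
    # exponentially fewer nodes.
    return int(-np.log(1.0 - rng.random()) * m_l)

rng = np.random.default_rng(0)
layers = [assign_layer(rng) for _ in range(10_000)]
# Most nodes land on layer 0; only a handful reach the top layers.
```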

🔵 Dense search and sparse search: Dense search relies on embeddings and semantic similarity, while sparse search relies on word occurrence, using techniques like Bag-of-Words, TF-IDF, and BM25. Dense (semantic) search may struggle with unfamiliar words or exact-match queries such as serial numbers, making sparse search beneficial in such cases. A hybrid approach that combines dense and sparse scores can offer a balanced solution.
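A toy illustration of hybrid scoring, assuming precomputed dense similarities and using simple keyword overlap as a stand-in for BM25 (`alpha` weights dense vs. sparse):

```python
import numpy as np

def keyword_score(query, doc):
    # Sparse stand-in: fraction of query terms that appear in the document
    # (a real system would use BM25 or TF-IDF instead).
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def hybrid_scores(dense_sims, sparse_scores, alpha=0.5):
    # alpha = 1.0 -> pure dense search, alpha = 0.0 -> pure sparse search.
    return alpha * np.asarray(dense_sims) + (1 - alpha) * np.asarray(sparse_scores)

docs = ["warranty details for ABC123", "how neural embeddings work"]
query = "serial ABC123"
sparse = [keyword_score(query, d) for d in docs]  # [0.5, 0.0]
dense = [0.1, 0.7]  # assumed embedding similarities, for illustration only
print(hybrid_scores(dense, sparse, alpha=0.3))    # the serial-number doc wins
```

With the dense scores alone, the serial-number document would lose; blending in the sparse signal lets the exact match win, which is exactly the failure mode hybrid search addresses.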