FAISS Integration with Graphbit

Overview

This guide explains how to use Graphbit to generate embeddings and perform similarity search using FAISS (Facebook AI Similarity Search), a library for efficient similarity search and clustering of dense vectors. You can use FAISS to store, index, and search high-dimensional vectors for semantic search and retrieval-augmented generation.


Prerequisites

  • OpenAI API Key (or another supported embedding provider).
  • Graphbit installed and configured (see installation guide).
  • Python environment with faiss-cpu, numpy, and graphbit installed.

Step 1: Initialize Graphbit

Create a Graphbit embedding client configured for OpenAI:

from graphbit import EmbeddingConfig, EmbeddingClient

embedding_client = EmbeddingClient(
    EmbeddingConfig.openai(
        model="text-embedding-3-small",
        api_key="openai_api_key",  # placeholder: replace with your own OpenAI API key
    )
)
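
Rather than hard-coding the key, you may prefer to read it from the environment. The following is a minimal sketch under stated assumptions: `load_api_key` is a hypothetical helper (not part of Graphbit), and `OPENAI_API_KEY` is assumed as the variable name by convention.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key stored in env_var, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of Graphbit.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Demo with an explicit mapping so the sketch runs without a real key.
demo_key = load_api_key(environ={"OPENAI_API_KEY": "sk-demo"})
print(demo_key)
```

In real use, call `load_api_key()` with no arguments and pass the result as the `api_key` argument shown above.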

Step 2: Generate Embeddings

Generate embeddings for your texts:

import numpy as np

texts = [
    "GraphBit is a framework for LLM workflows and agent orchestration.",
    "FAISS is a library for efficient similarity search and clustering of dense vectors.",
    "OpenAI offers tools for LLMs and embeddings."
]
embeddings = embedding_client.embed_many(texts)
embeddings = np.array(embeddings).astype('float32')
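
FAISS expects a two-dimensional float32 array of shape (n, d). The sketch below uses random stand-in vectors instead of real API output to show the conversion, plus an optional L2 normalization step that makes inner-product scores equal to cosine similarity. The helper name `to_faiss_matrix` is hypothetical and not part of Graphbit or FAISS.

```python
import numpy as np

def to_faiss_matrix(vectors, normalize=True):
    """Convert equal-length vectors to a C-contiguous float32 matrix.

    Hypothetical helper for illustration only.
    """
    matrix = np.ascontiguousarray(vectors, dtype=np.float32)
    if matrix.ndim != 2:
        raise ValueError(f"expected shape (n, d), got {matrix.shape}")
    if normalize:
        norms = np.linalg.norm(matrix, axis=1, keepdims=True)
        # Guard against division by zero for all-zero rows.
        matrix = matrix / np.maximum(norms, 1e-12)
        matrix = np.ascontiguousarray(matrix, dtype=np.float32)
    return matrix

# Stand-in for embed_many output: 3 vectors of dimension 8.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(3, 8)).tolist()
matrix = to_faiss_matrix(fake_embeddings)
print(matrix.shape, matrix.dtype)
```

If the embedding provider already returns unit-length vectors, the normalization step leaves them essentially unchanged, so applying it is harmless.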

Step 3: Create FAISS Index

Create a FAISS index for similarity search:

import faiss

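# Vector dimensionality is the second axis of the (n, d) embedding matrix.
dimension = embeddings.shape[1]
# IndexFlatIP performs exact (exhaustive) search ranked by inner product.
index = faiss.IndexFlatIP(dimension)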
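
As a rough mental model only, the NumPy sketch below reproduces the ranking an exhaustive inner-product index produces, without using FAISS. `flat_ip_search` is a hypothetical name and the toy vectors are made up for illustration. Unlike FAISS, which pads missing results with -1, this sketch simply clamps k to the number of stored vectors.

```python
import numpy as np

def flat_ip_search(stored, queries, k):
    """Exhaustive inner-product search: top-k scores and row indices per query."""
    all_scores = queries @ stored.T                # shape (n_queries, n_stored)
    k = min(k, stored.shape[0])
    top = np.argsort(-all_scores, axis=1)[:, :k]   # highest score first
    return np.take_along_axis(all_scores, top, axis=1), top

# Toy unit-length vectors, chosen by hand for illustration.
stored = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype=np.float32)
toy_query = np.array([[0.8, 0.6]], dtype=np.float32)

toy_scores, toy_indices = flat_ip_search(stored, toy_query, k=2)
print(toy_indices[0], toy_scores[0])
```

Because every toy vector has unit length, the scores here are also cosine similarities, which is the usual reason for pairing normalized embeddings with an inner-product index.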

Step 4: Add Embeddings to FAISS Index

Add the generated embeddings to the FAISS index:

index.add(embeddings)

Step 5: Search for Similar Vectors

Embed your query and search the FAISS index for the most similar vectors:

query = "What is GraphBit?"
query_embedding = embedding_client.embed(query)
query_embedding = np.array(query_embedding).astype('float32').reshape(1, -1)

scores, indices = index.search(query_embedding, k=3)
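
The search call returns two arrays of shape (number of queries, k): similarity scores and the positions of the matching vectors in insertion order. FAISS fills unused slots with -1 when fewer than k vectors are available. A minimal sketch of turning one result row into readable hits follows; `format_hits` and the sample values are hypothetical stand-ins, not real search output.

```python
def format_hits(row_indices, row_scores, texts):
    """Pair each valid index with its score and text, skipping -1 padding."""
    hits = []
    for idx, score in zip(row_indices, row_scores):
        idx = int(idx)
        if idx < 0:  # -1 marks an empty result slot
            continue
        hits.append({"id": f"doc_{idx}", "score": float(score), "text": texts[idx]})
    return hits

sample_texts = ["first document", "second document"]
# Made-up values shaped like one result row for k=3 with only two stored vectors;
# the last score is a placeholder for the empty slot.
hits = format_hits([1, 0, -1], [0.91, 0.42, float("-inf")], sample_texts)
for hit in hits:
    print(f"{hit['id']}  {hit['score']:.4f}  {hit['text']}")
```

With real output, pass `indices[0]` and `scores[0]` as the first two arguments.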

Full Example

import faiss
import numpy as np
from graphbit import EmbeddingConfig, EmbeddingClient

embedding_client = EmbeddingClient(
    EmbeddingConfig.openai(
        model="text-embedding-3-small",
        api_key="openai_api_key",  # placeholder: replace with your own OpenAI API key
    )
)

texts = [
    "GraphBit is a framework for LLM workflows and agent orchestration.",
    "FAISS is a library for efficient similarity search and clustering of dense vectors.",
    "OpenAI offers tools for LLMs and embeddings."
]
embeddings = embedding_client.embed_many(texts)
embeddings = np.array(embeddings).astype('float32')

dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(embeddings)

query = "What is GraphBit?"
query_embedding = embedding_client.embed(query)
query_embedding = np.array(query_embedding).astype('float32').reshape(1, -1)

scores, indices = index.search(query_embedding, k=3)

for idx, score in zip(indices[0], scores[0]):
    print(f"ID: doc_{idx}\nScore: {score:.4f}\nText: {texts[idx]}\n---")

This integration enables you to leverage Graphbit's embedding capabilities with FAISS for efficient, scalable semantic search and retrieval workflows.