Milvus Integration with Graphbit
Overview
This guide demonstrates how to connect Milvus, an open-source vector database, with Graphbit. Through this integration, you can store, index, and perform similarity search on high-dimensional embeddings generated by LLMs or embedding models. This enables use cases like semantic search, retrieval-augmented generation (RAG), and other AI-powered vector search workflows.
Prerequisites
- Milvus running locally or remotely (see Milvus documentation).
- OpenAI API Key: For embedding generation (or another supported embedding provider).
- Graphbit installed and configured (see installation guide).
- Python environment with pymilvus and graphbit installed.
Step 1: Configure Graphbit Embedding
Create an embedding client backed by OpenAI. The API key is read from the OPENAI_API_KEY environment variable:
import os
from graphbit import EmbeddingConfig, EmbeddingClient

openai_api_key = os.getenv("OPENAI_API_KEY", "")
embedding_client = EmbeddingClient(
    EmbeddingConfig.openai(
        model="text-embedding-3-small",
        api_key=openai_api_key,
    )
)
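Because `os.getenv("OPENAI_API_KEY", "")` falls back to an empty string, a missing key would presumably only surface later, when the first embedding request is made. A minimal sketch of failing fast instead; `require_env` is a hypothetical helper for illustration, not part of Graphbit:

```python
import os


def require_env(name, env=None):
    """Return a non-blank environment variable value, or raise RuntimeError.

    Hypothetical helper: `env` defaults to os.environ and exists only so
    the function can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before creating the embedding client"
        )
    return value
```

With this in place, `openai_api_key = require_env("OPENAI_API_KEY")` could replace the `os.getenv` call above.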
Step 2: Generate Embeddings
Embed a small set of sample texts in a single batch:
texts = [
    "GraphBit is a framework for LLM workflows and agent orchestration.",
    "Milvus is an open-source vector database for scalable similarity search and AI applications.",
    "OpenAI offers tools for LLMs and embeddings."
]
embeddings = embedding_client.embed_many(texts)
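The following steps rely on `embed_many` returning one vector per input text, with every vector sharing the same dimension. A small sanity check can make that assumption explicit. This is a sketch on plain Python lists; `check_embeddings` is a hypothetical helper and the sample vectors are made up:

```python
def check_embeddings(texts, embeddings):
    """Return the shared vector dimension, or raise ValueError on a mismatch."""
    if not embeddings:
        raise ValueError("no embeddings returned")
    if len(texts) != len(embeddings):
        raise ValueError(f"expected {len(texts)} vectors, got {len(embeddings)}")
    dims = {len(vec) for vec in embeddings}
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    return dims.pop()


# Toy data standing in for real embedding output.
sample_texts = ["first text", "second text"]
sample_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
sample_dimension = check_embeddings(sample_texts, sample_vectors)
print(sample_dimension)
```

In the guide's flow, the call would be `check_embeddings(texts, embeddings)`, and its return value could be used as the collection dimension.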
Step 3: Initialize Milvus Client and Create Collection
Set up the Milvus client and create the collection. Passing a local file path to MilvusClient stores the data in that file through Milvus Lite. Note that an existing collection with the same name is dropped first, so any data in it is lost:
from pymilvus import MilvusClient

vectordb_client = MilvusClient("graphbit_vector.db")
dimension = len(embeddings[0])
collection_name = "graphbit_collection"
if vectordb_client.has_collection(collection_name=collection_name):
    vectordb_client.drop_collection(collection_name=collection_name)
vectordb_client.create_collection(collection_name, dimension=dimension)
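Dropping the collection on every run is convenient for a demo, but it discards previously stored vectors. If the data should persist between runs, the drop can be made opt-in. The sketch below wraps the same three client calls used in this step in a hypothetical `ensure_collection` helper; the `reset` flag is an illustrative addition. It is exercised against a stand-in client so that it runs without Milvus installed:

```python
def ensure_collection(client, name, dimension, reset=False):
    """Create `name` if it is missing; drop and recreate only when reset=True.

    `client` is expected to expose has_collection, drop_collection and
    create_collection as used in Step 3. Returns True if this call
    created a collection, False if an existing one was kept.
    """
    exists = client.has_collection(collection_name=name)
    if exists and not reset:
        return False  # keep the existing collection and its data
    if exists:
        client.drop_collection(collection_name=name)
    client.create_collection(collection_name=name, dimension=dimension)
    return True


class FakeClient:
    """In-memory stand-in for MilvusClient, for demonstration only."""

    def __init__(self):
        self.collections = {}

    def has_collection(self, collection_name):
        return collection_name in self.collections

    def drop_collection(self, collection_name):
        self.collections.pop(collection_name, None)

    def create_collection(self, collection_name, dimension):
        self.collections[collection_name] = dimension


fake = FakeClient()
created_first = ensure_collection(fake, "graphbit_collection", 4)
created_again = ensure_collection(fake, "graphbit_collection", 4)
recreated = ensure_collection(fake, "graphbit_collection", 4, reset=True)
```

One limitation of this sketch: when an existing collection is kept, its dimension is not checked against the new embeddings, so switching embedding models would still call for `reset=True`.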
Step 4: Insert Data into Milvus
Prepare and insert your embeddings with metadata into Milvus:
vectors = [
    {"id": i, "vector": embedding, "text": texts[i], "subject": "graphbit"}
    for i, embedding in enumerate(embeddings)
]
vectordb_client.insert(collection_name=collection_name, data=vectors)
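For larger datasets it can help to build the rows in one place and insert them in batches rather than in a single call. The sketch below assumes rows follow the same `id`/`vector`/`text`/`subject` layout as above; `build_rows`, `batched`, and the batch size are illustrative choices, not Milvus requirements:

```python
def build_rows(texts, embeddings, subject="graphbit", start_id=0):
    """Pair each text with its embedding and assign sequential integer ids."""
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    return [
        {"id": start_id + i, "vector": list(vec), "text": text, "subject": subject}
        for i, (text, vec) in enumerate(zip(texts, embeddings))
    ]


def batched(rows, size):
    """Yield consecutive slices of `rows` containing at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Toy 2-D vectors standing in for real embeddings.
rows = build_rows(["a", "b", "c"], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
batches = list(batched(rows, size=2))
# With a real client, each batch would be passed to
# vectordb_client.insert(collection_name=collection_name, data=batch).
```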
Step 5: Perform Similarity Search
Embed your queries and search for similar vectors in Milvus. Each query returns its own list of hits:
queries = ["What is GraphBit?", "What is Milvus?"]
query_embeddings = embedding_client.embed_many(queries)
search_results = vectordb_client.search(
    collection_name=collection_name,
    data=query_embeddings,
    limit=3,
    output_fields=["text"],
)
for idx, result in enumerate(search_results):
    print(f"Query {idx}: {queries[idx]}")
    for item in result:
        print(f" id: {item['id']}, Text: {item.get('entity', {}).get('text', '')}, Score: {item.get('distance', 0):.4f}")
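How to read the printed score depends on the collection's metric. Assuming the collection uses cosine similarity (this appears to be the default for collections created with only a name and dimension in recent pymilvus releases, but it is worth verifying for your version), a higher `distance` means a closer match. The brute-force sketch below reproduces that ranking in plain Python on made-up 2-D vectors. `cosine_similarity` and `top_k` are hypothetical names, and the code only illustrates the scoring, not how Milvus executes a search:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """Score every row against the query and return the k best hits."""
    scored = [
        {"id": row["id"], "distance": cosine_similarity(query, row["vector"])}
        for row in rows
    ]
    return sorted(scored, key=lambda hit: hit["distance"], reverse=True)[:k]


# Toy vectors: id 0 points almost the same way as the query, id 1 does not.
toy_rows = [
    {"id": 0, "vector": [1.0, 0.0]},
    {"id": 1, "vector": [0.0, 1.0]},
    {"id": 2, "vector": [1.0, 1.0]},
]
hits = top_k([1.0, 0.1], toy_rows, k=2)
print([hit["id"] for hit in hits])
```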
Full Example
import os
from pymilvus import MilvusClient
from graphbit import EmbeddingConfig, EmbeddingClient

# Configure the embedding client
openai_api_key = os.getenv("OPENAI_API_KEY", "")
embedding_client = EmbeddingClient(
    EmbeddingConfig.openai(
        model="text-embedding-3-small",
        api_key=openai_api_key,
    )
)

# Generate embeddings
texts = [
    "GraphBit is a framework for LLM workflows and agent orchestration.",
    "Milvus is an open-source vector database for scalable similarity search and AI applications.",
    "OpenAI offers tools for LLMs and embeddings."
]
embeddings = embedding_client.embed_many(texts)

# Create a fresh collection sized to the embedding dimension
vectordb_client = MilvusClient("graphbit_vector.db")
dimension = len(embeddings[0])
collection_name = "graphbit_collection"
if vectordb_client.has_collection(collection_name=collection_name):
    vectordb_client.drop_collection(collection_name=collection_name)
vectordb_client.create_collection(collection_name, dimension=dimension)

# Insert
vectors = [
    {"id": i, "vector": embedding, "text": texts[i]} for i, embedding in enumerate(embeddings)
]
vectordb_client.insert(collection_name=collection_name, data=vectors)

# Query
queries = ["What is GraphBit?", "What is Milvus?"]
query_embeddings = embedding_client.embed_many(queries)
search_results = vectordb_client.search(
    collection_name=collection_name,
    data=query_embeddings,
    limit=3,
    output_fields=["text"],
)
for idx, result in enumerate(search_results):
    print(f"Query {idx}: {queries[idx]}")
    for item in result:
        print(f" id: {item['id']}, Text: {item.get('entity', {}).get('text', '')}, Score: {item.get('distance', 0):.4f}")
Key Features
- Local Storage: Milvus can run locally with file-based storage (graphbit_vector.db)
- Metadata Support: Store additional fields like text content and subject tags
- Flexible Queries: Search with custom limits and output field selection
- Automatic Dimension Detection: Collection dimension is automatically set based on embedding size
- Clean Slate Option: Drop and recreate collections for fresh starts