# Qdrant Integration with GraphBit

## Overview
This guide explains how to connect Qdrant, an open-source vector database, to GraphBit. With this integration, you can store, index, and search high-dimensional embeddings generated by LLMs for semantic search, retrieval-augmented generation (RAG), and more.
## Prerequisites
- Qdrant running locally or remotely (see the Qdrant documentation).
- OpenAI API key for embedding generation (or another supported embedding provider).
- GraphBit installed and configured (see the installation guide).
- Python environment with `qdrant-client`, `graphbit`, and optionally `python-dotenv` installed.
- A `.env` file in your project root with the following variables:
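The code in this guide reads the OpenAI key from the environment, so a minimal `.env` for these examples would contain just that key (shown here as a placeholder). If you use `python-dotenv`, call `load_dotenv()` before the key is read:

```
OPENAI_API_KEY=your-openai-api-key
```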
## Step 1: Connect to Qdrant and Create a Collection
Set up the Qdrant client and create the collection if it does not already exist:
```python
import os
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

COLLECTION = "graphbit-vector"
DIMENSION = 1536  # text-embedding-3-small produces 1536-dimensional vectors

client = QdrantClient(host="localhost", port=6333)

# Create the collection once; cosine distance suits normalized embeddings.
if not client.collection_exists(COLLECTION):
    client.create_collection(
        collection_name=COLLECTION,
        vectors_config=VectorParams(size=DIMENSION, distance=Distance.COSINE),
    )
```
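If your Qdrant instance runs remotely (for example, a managed deployment) rather than on localhost, the client can be pointed at a URL instead. This is a sketch; the URL and the `QDRANT_API_KEY` environment variable below are placeholders for your own deployment, not values from this guide:

```python
# Sketch for a remote/managed Qdrant deployment; the URL and API key
# are placeholders for your own instance.
remote_client = QdrantClient(
    url="https://your-qdrant-host:6333",
    api_key=os.getenv("QDRANT_API_KEY"),
)
```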
## Step 2: Generate and Upsert Embeddings Using GraphBit

Use GraphBit's embedding client to generate embeddings and upsert them into Qdrant:
```python
from graphbit import EmbeddingClient, EmbeddingConfig

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
EMBEDDING_MODEL = "text-embedding-3-small"

embedding_client = EmbeddingClient(
    EmbeddingConfig.openai(model=EMBEDDING_MODEL, api_key=OPENAI_API_KEY)
)

texts = [
    "GraphBit is a framework for LLM workflows and agent orchestration.",
    "Qdrant is an open-source vector database for similarity search.",
    "OpenAI offers tools for LLMs and embeddings.",
]

# Embed all texts in one call, then wrap each vector in a Qdrant point
# with the source text stored as payload.
embeds = embedding_client.embed_many(texts)

points = [
    PointStruct(id=str(uuid.uuid4()), vector=vec, payload={"text": txt})
    for vec, txt in zip(embeds, texts)
]

client.upsert(collection_name=COLLECTION, points=points, wait=True)
```
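For a handful of texts, a single `upsert` call is fine. With a larger corpus, one common pattern is to upsert in fixed-size batches so each request stays small; the batch size below is an arbitrary choice for illustration, not a GraphBit or Qdrant requirement:

```python
BATCH_SIZE = 64  # arbitrary; tune for your vector and payload sizes

# Upsert the points in chunks instead of one large request.
for start in range(0, len(points), BATCH_SIZE):
    client.upsert(
        collection_name=COLLECTION,
        points=points[start:start + BATCH_SIZE],
        wait=True,
    )
```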
## Step 3: Perform Similarity Search
Embed your query and search for similar vectors in Qdrant:
query = "What is GraphBit?"
query_vec = embedding_client.embed(query)
response = client.query_points(
collection_name=COLLECTION,
query=query_vec,
limit=2,
with_payload=True
)
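Qdrant can also combine vector search with payload filtering. The sketch below assumes your points carry an extra payload field (here a hypothetical `"source"` key, which the points created in this guide do not have) and restricts results to matching points:

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue

# Hypothetical: filter on a payload field named "source". The points
# created above only store "text", so this is illustrative only.
filtered = client.query_points(
    collection_name=COLLECTION,
    query=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="docs"))]
    ),
    limit=2,
    with_payload=True,
)
```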
## Full Example
```python
import os
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
from graphbit import EmbeddingClient, EmbeddingConfig

COLLECTION = "graphbit-vector"
DIMENSION = 1536

# Connect to Qdrant and create the collection if it does not exist.
client = QdrantClient(host="localhost", port=6333)

if not client.collection_exists(COLLECTION):
    client.create_collection(
        collection_name=COLLECTION,
        vectors_config=VectorParams(size=DIMENSION, distance=Distance.COSINE),
    )

# Configure GraphBit's OpenAI embedding client.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
EMBEDDING_MODEL = "text-embedding-3-small"

embedding_client = EmbeddingClient(
    EmbeddingConfig.openai(model=EMBEDDING_MODEL, api_key=OPENAI_API_KEY)
)

texts = [
    "GraphBit is a framework for LLM workflows and agent orchestration.",
    "Qdrant is an open-source vector database for similarity search.",
    "OpenAI offers tools for LLMs and embeddings.",
]

# Generate embeddings and upsert them into Qdrant as points.
embeds = embedding_client.embed_many(texts)

points = [
    PointStruct(id=str(uuid.uuid4()), vector=vec, payload={"text": txt})
    for vec, txt in zip(embeds, texts)
]

client.upsert(collection_name=COLLECTION, points=points, wait=True)

# Retrieve one point to double-check the upsert (optional).
check_id = points[0].id
retrieved = client.retrieve(
    collection_name=COLLECTION,
    ids=[check_id],
    with_payload=True,
)
assert retrieved, "Upsert failed!"

# Embed the query and run a similarity search.
query = "What is GraphBit?"
query_vec = embedding_client.embed(query)

response = client.query_points(
    collection_name=COLLECTION,
    query=query_vec,
    limit=2,
    with_payload=True,
)

# Process the response into a more readable format.
result = []
for point in response.points:
    result.append({
        "id": point.id,
        "text": point.payload.get("text", ""),
        "score": point.score,
    })

print(result)
```
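While experimenting, you may want to reset state between runs. Dropping the collection lets the full example be re-run from scratch:

```python
# Optional cleanup: remove the collection created by this guide.
client.delete_collection(collection_name=COLLECTION)
```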
This integration lets you pair GraphBit's embedding capabilities with Qdrant's vector database for scalable, open-source semantic search and retrieval workflows.