# Weaviate Integration with Graphbit

## Overview

This guide explains how to connect Weaviate, a vector database for AI applications, to Graphbit. With this integration, you can store, index, and search high-dimensional embeddings for semantic search, retrieval-augmented generation (RAG), and similar workflows.
## Prerequisites

- **Weaviate instance**: A Weaviate Cloud Service (WCS) account or a self-hosted Weaviate instance. Obtain the Weaviate URL (e.g., `https://<cluster-id>.weaviate.network`) and the API key (required for WCS).
- **OpenAI API key**: For embedding generation (or another supported embedding provider).
- **Graphbit** installed and configured (see the installation guide).
- **Python environment** with the `weaviate-client` and `graphbit` packages installed.
## Step 1: Connect to Weaviate

Set up the connection to your Weaviate instance:

```python
import os

from weaviate import connect_to_weaviate_cloud
from weaviate.classes.init import Auth

weaviate_url = os.environ["WEAVIATE_URL"]
weaviate_api_key = os.environ["WEAVIATE_API_KEY"]

# Connect to Weaviate Cloud Service
vectordb_config = connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_api_key),
)

# Check that Weaviate is ready
print("Weaviate ready:", vectordb_config.is_ready())
```
## Step 2: Create Collection

Create a collection to store your vectors. Since Graphbit supplies the vectors, the collection is created with no vectorizer module and a cosine-distance index:

```python
import uuid

from weaviate.classes.config import Configure, VectorDistances

COLLECTION_NAME = "Graphbit_VectorDB"

# Create the collection if it does not exist yet, otherwise reuse it
if not vectordb_config.collections.exists(COLLECTION_NAME):
    vectordb_client = vectordb_config.collections.create(
        name=COLLECTION_NAME,
        vectorizer_config=Configure.Vectorizer.none(),
        vector_index_config=Configure.VectorIndex.hnsw(
            distance_metric=VectorDistances.COSINE,
        ),
    )
else:
    vectordb_client = vectordb_config.collections.get(COLLECTION_NAME)
```
## Step 3: Initialize Embedding Client

Set up Graphbit and initialize the embedding client:

```python
from graphbit import EmbeddingClient, EmbeddingConfig

# Initialize embedding client
openai_api_key = os.getenv("OPENAI_API_KEY", "")
embedding_config = EmbeddingConfig.openai(model="text-embedding-3-small", api_key=openai_api_key)
embedding_client = EmbeddingClient(embedding_config)
```
## Step 4: Generate Embeddings

Generate embeddings for your text data:

```python
# Sample texts to embed
texts = [
    "GraphBit is a framework for LLM workflows and agent orchestration.",
    "Weaviate is a vector database for AI applications.",
    "OpenAI provides APIs for embeddings and LLMs.",
]

# Generate embeddings
embeddings = embedding_client.embed_many(texts)
```
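Embedding providers typically cap how many inputs a single request may carry, so for large corpora it can help to batch the calls. The helper below is a generic sketch; the batch size of 100 and the way it is combined with `embed_many` are illustrative assumptions, not Graphbit requirements:

```python
def batched(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Illustrative usage (assumes the embedding_client created above):
# embeddings = []
# for batch in batched(texts, 100):
#     embeddings.extend(embedding_client.embed_many(batch))
```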
## Step 5: Insert Embeddings into Weaviate

Insert the generated embeddings into your Weaviate collection:

```python
for text, vector in zip(texts, embeddings):
    vectordb_client.data.insert(
        properties={"text": text},
        vector=vector,
        uuid=uuid.uuid4(),
    )
```
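Because each run generates a fresh `uuid.uuid4()`, re-running the insert loop creates duplicate objects. One common remedy is to derive a deterministic UUID from the content with `uuid.uuid5`, so the same text always maps to the same object ID; the namespace string below is an arbitrary example, not anything Graphbit or Weaviate prescribes:

```python
import uuid

# Any fixed namespace UUID works; this one is derived from an example name
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "graphbit.weaviate.example")

def stable_id(text: str) -> uuid.UUID:
    """Map identical text to an identical UUID across runs."""
    return uuid.uuid5(NAMESPACE, text)
```

Passing `uuid=stable_id(text)` to `data.insert` should then turn a repeated insert of the same text into an explicit conflict rather than a silent duplicate.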
## Step 6: Perform Similarity Search

Embed your query and search for similar vectors in Weaviate:

```python
query = "What is GraphBit?"
query_embedding = embedding_client.embed(query)

# Perform similarity search
results = vectordb_client.query.near_vector(
    near_vector=query_embedding,
    limit=3,
    return_metadata=["distance"],
    return_properties=["text"],
)
```
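The `distance` metadata returned here is the cosine distance configured on the collection: 0 for vectors pointing in the same direction, up to 2 for opposite ones, which is why a similarity score can be recovered as `1 - distance`. A plain-Python sketch of the metric:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
```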
## Full Example

```python
import os
import uuid

from weaviate import connect_to_weaviate_cloud
from weaviate.classes.config import Configure, VectorDistances
from weaviate.classes.init import Auth

from graphbit import EmbeddingClient, EmbeddingConfig

COLLECTION_NAME = "Graphbit_VectorDB"

# Step 1: Connect to Weaviate Cloud Service
weaviate_url = os.environ["WEAVIATE_URL"]
weaviate_api_key = os.environ["WEAVIATE_API_KEY"]

vectordb_config = connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_api_key),
)
print("Weaviate ready:", vectordb_config.is_ready())

# Step 2: Get or create collection
if not vectordb_config.collections.exists(COLLECTION_NAME):
    vectordb_client = vectordb_config.collections.create(
        name=COLLECTION_NAME,
        vectorizer_config=Configure.Vectorizer.none(),
        vector_index_config=Configure.VectorIndex.hnsw(
            distance_metric=VectorDistances.COSINE,
        ),
    )
else:
    vectordb_client = vectordb_config.collections.get(COLLECTION_NAME)

# Step 3: Initialize Graphbit embedding client
embedding_config = EmbeddingConfig.openai(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY", ""),
)
embedding_client = EmbeddingClient(embedding_config)

# Step 4: Generate embeddings
texts = [
    "GraphBit is a framework for LLM workflows and agent orchestration.",
    "Weaviate is a vector database for AI applications.",
    "OpenAI provides APIs for embeddings and LLMs.",
]
embeddings = embedding_client.embed_many(texts)

# Step 5: Insert embeddings into Weaviate
for text, vector in zip(texts, embeddings):
    vectordb_client.data.insert(
        properties={"text": text},
        vector=vector,
        uuid=uuid.uuid4(),
    )

# Step 6: Perform similarity search
query = "What is GraphBit?"
query_embedding = embedding_client.embed(query)
results = vectordb_client.query.near_vector(
    near_vector=query_embedding,
    limit=3,
    return_metadata=["distance"],
    return_properties=["text"],
)

# Display results
print("Search Results:")
for obj in results.objects:
    print("Text:", obj.properties["text"])
    print("Distance:", obj.metadata.distance)
    print("Score:", 1 - obj.metadata.distance)
    print("---")

# Clean up: close the client connection
vectordb_config.close()
```
This integration pairs Graphbit's embedding clients with Weaviate's vector database, giving you scalable, production-grade semantic search and retrieval workflows with support for metadata filtering.