Elasticsearch Integration with Graphbit¶
Overview¶
This guide explains how to integrate Elasticsearch with Graphbit to run vector similarity search over high-dimensional text embeddings. The integration supports use cases such as semantic search and retrieval-augmented generation using Elasticsearch's vector search capabilities.
Prerequisites¶
- Elasticsearch running locally or remotely (see Elasticsearch documentation).
- OpenAI API Key: For embedding generation (or another supported embedding provider).
- Graphbit installed and configured (see installation guide).
- Python environment with elasticsearch, graphbit, and optionally python-dotenv installed.
- A .env file in your project root with the following variables:
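The variable names below match the ones read with os.getenv later in this guide; the values shown are placeholders to replace with your own credentials:

ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_API_KEY=your-elasticsearch-api-key
OPENAI_API_KEY=your-openai-api-key

If you use python-dotenv, call load_dotenv() before reading these variables so os.getenv can see them.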
Step 1: Initialize Elasticsearch¶
Set up the Elasticsearch client using the URL and API key:
from elasticsearch import Elasticsearch
import os
ELASTICSEARCH_URL = os.getenv("ELASTICSEARCH_URL", "http://localhost:9200")
ELASTICSEARCH_API_KEY = os.getenv("ELASTICSEARCH_API_KEY", "")
client = Elasticsearch(ELASTICSEARCH_URL, api_key=ELASTICSEARCH_API_KEY)
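To fail fast on a wrong URL or API key, you can check connectivity before indexing anything. ping and info are standard elasticsearch-py client calls; the error handling here is just a sketch:

# Optional connectivity check: fails fast if the URL or API key is wrong
if not client.ping():
    raise RuntimeError(f"Could not connect to Elasticsearch at {ELASTICSEARCH_URL}")
print("Connected to Elasticsearch", client.info()["version"]["number"])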
Step 2: Initialize Embedding Client¶
Set up the Graphbit embedding client to generate embeddings:
from graphbit import EmbeddingClient, EmbeddingConfig
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
embedding_config = EmbeddingConfig.openai(model="text-embedding-3-small", api_key=OPENAI_API_KEY)
embedding_client = EmbeddingClient(embedding_config)
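A single test call is a quick way to confirm the API key works before embedding a whole batch. embed is the same method used for the query text later in this guide; the dimension printout is just a sketch:

# Optional: embed one string to verify credentials and inspect the vector size
sample_vector = embedding_client.embed("hello world")
print(f"Embedding dimension: {len(sample_vector)}")  # 1536 for text-embedding-3-small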
Step 3: Generate Embeddings¶
Generate embeddings for a list of texts:
texts = [
"GraphBit is a framework for LLM workflows and agent orchestration.",
"Elasticsearch is a powerful full-text search engine.",
"OpenAI provides APIs for language and embedding models."
]
embeddings = embedding_client.embed_many(texts)
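embed_many returns one vector per input text (the zip in the indexing step below assumes they come back in the same order). A quick check makes that explicit:

# One embedding per input text, all with the same dimensionality
assert len(embeddings) == len(texts)
print(f"Generated {len(embeddings)} embeddings of dimension {len(embeddings[0])}")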
Step 4: Create Index with Vector Field¶
Create an Elasticsearch index with a dense vector field for storing embeddings:
if client.indices.exists(index="graphbit_vectordb"):
    client.indices.delete(index="graphbit_vectordb")
index_body = {
"mappings": {
"properties": {
"text": {"type": "text"},
"embedding": {
"type": "dense_vector",
"dims": len(embeddings[0]),
"index": True,
"similarity": "cosine",
},
}
}
}
client.indices.create(index="graphbit_vectordb", body=index_body)
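To confirm the index was created with the expected vector field, you can read the mapping back; indices.get_mapping is a standard elasticsearch-py call:

# Inspect the stored mapping for the embedding field
mapping = client.indices.get_mapping(index="graphbit_vectordb")
print(mapping["graphbit_vectordb"]["mappings"]["properties"]["embedding"])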
Step 5: Insert Vectors into Elasticsearch¶
Index the generated embeddings along with their corresponding texts:
import uuid
from elasticsearch.helpers import bulk
vectors = [
{
"_op_type": "index",
"_index": "graphbit_vectordb",
"_id": str(uuid.uuid4()),
"_source": {"text": text, "embedding": embedding},
}
for text, embedding in zip(texts, embeddings)
]
bulk(client, vectors)
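Documents indexed with bulk only become searchable after the index refreshes. An explicit refresh makes sure the vectors are visible before the search in the following steps:

# Make the newly indexed vectors searchable immediately
client.indices.refresh(index="graphbit_vectordb")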
Step 6: Embed Query Text¶
Generate an embedding for the query text:
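query_text = "What is GraphBit?"
query_vector = embedding_client.embed(query_text)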
Step 7: Perform Semantic Search¶
Perform a semantic search using Elasticsearch's KNN query:
response = client.search(
index="graphbit_vectordb",
knn={
"field": "embedding",
"k": 3,
"num_candidates": 10,
"query_vector": query_vector,
},
_source=["text"],
)
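Each hit carries a similarity score and the stored source fields, which you can print to inspect the matches:

for hit in response["hits"]["hits"]:
    print(f"Score: {hit['_score']:.4f}, Text: {hit['_source']['text']}")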
Full Example¶
Here is the complete example for integrating Elasticsearch with Graphbit:
import os
import uuid
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from graphbit import EmbeddingClient, EmbeddingConfig
# Initialize Elasticsearch
ELASTICSEARCH_URL = os.getenv("ELASTICSEARCH_URL", "http://localhost:9200")
ELASTICSEARCH_API_KEY = os.getenv("ELASTICSEARCH_API_KEY", "")
client = Elasticsearch(ELASTICSEARCH_URL, api_key=ELASTICSEARCH_API_KEY)
# Initialize Embedding Client
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
embedding_config = EmbeddingConfig.openai(model="text-embedding-3-small", api_key=OPENAI_API_KEY)
embedding_client = EmbeddingClient(embedding_config)
# Generate Embeddings
texts = [
"GraphBit is a framework for LLM workflows and agent orchestration.",
"Elasticsearch is a powerful full-text search engine.",
"OpenAI provides APIs for language and embedding models."
]
embeddings = embedding_client.embed_many(texts)
# Create Index
if client.indices.exists(index="graphbit_vectordb"):
    client.indices.delete(index="graphbit_vectordb")
index_body = {
"mappings": {
"properties": {
"text": {"type": "text"},
"embedding": {
"type": "dense_vector",
"dims": len(embeddings[0]),
"index": True,
"similarity": "cosine",
},
}
}
}
client.indices.create(index="graphbit_vectordb", body=index_body)
# Insert Vectors
vectors = [
{
"_op_type": "index",
"_index": "graphbit_vectordb",
"_id": str(uuid.uuid4()),
"_source": {"text": text, "embedding": embedding},
}
for text, embedding in zip(texts, embeddings)
]
bulk(client, vectors)
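# Refresh the index so the bulk-indexed vectors are searchable immediately
client.indices.refresh(index="graphbit_vectordb")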
# Embed Query Text
query_text = "What is GraphBit?"
query_vector = embedding_client.embed(query_text)
# Perform Semantic Search
response = client.search(
index="graphbit_vectordb",
knn={
"field": "embedding",
"k": 3,
"num_candidates": 10,
"query_vector": query_vector,
},
_source=["text"],
)
# Display Results
for hit in response["hits"]["hits"]:
    print(f"Score: {hit['_score']:.4f}, Text: {hit['_source']['text']}")
Troubleshooting¶
- Index Creation Fails: Ensure Elasticsearch is running and accessible.
- Vector Dimension Mismatch: Verify the embedding dimension matches the index mapping.
- Search Performance: Tune the knn settings (k and num_candidates) for better performance on large datasets; see the sketch after this list.
- API Key Issues: Ensure your OpenAI and Elasticsearch API keys are correctly set in the .env file.
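As a rough illustration of the knn tuning mentioned above, raising num_candidates relative to k trades latency for recall on larger indexes; the numbers below are placeholders rather than recommendations:

# Example: scan more candidate vectors per shard before selecting the top k
response = client.search(
    index="graphbit_vectordb",
    knn={
        "field": "embedding",
        "k": 5,
        "num_candidates": 100,  # higher improves recall at the cost of latency
        "query_vector": query_vector,
    },
    _source=["text"],
)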
This guide demonstrates how to leverage Graphbit's embedding capabilities with Elasticsearch for scalable, production-grade semantic search workflows.