# Embeddings

GraphBit provides vector embedding capabilities for semantic search, similarity analysis, and other AI-powered text operations. This guide covers configuring embedding providers and generating, batching, and comparing embeddings.
## Overview

GraphBit's embedding system supports:

- **Multiple Providers** - OpenAI and HuggingFace embedding models
- **Unified Interface** - Consistent API across all providers
- **Batch Processing** - Efficient processing of multiple texts
- **Similarity Calculations** - Built-in cosine similarity functions
## Configuration

### OpenAI Configuration

Configure the OpenAI embedding provider:
```python
import os

import graphbit

# Initialize GraphBit
graphbit.init()

# Basic OpenAI configuration
embedding_config = graphbit.EmbeddingConfig.openai(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="text-embedding-3-small"  # Optional - defaults to text-embedding-3-small
)

print("Provider: OpenAI")
print(f"Model: {embedding_config.model}")
```
### HuggingFace Configuration

Configure the HuggingFace embedding provider:
```python
# HuggingFace configuration
embedding_config = graphbit.EmbeddingConfig.huggingface(
    api_key=os.getenv("HUGGINGFACE_API_KEY"),
    model="sentence-transformers/all-MiniLM-L6-v2"
)

print("Provider: HuggingFace")
print(f"Model: {embedding_config.model}")
```
## Basic Usage

### Creating Embedding Client
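The examples in the rest of this guide assume an `embedding_client` built from the configuration above. The constructor call below is an assumption based on the `EmbeddingClient` class used throughout this guide; check the API reference for the exact signature.

```python
# Assumption: EmbeddingClient is constructed directly from the config;
# verify against the API reference for your GraphBit version.
embedding_client = graphbit.EmbeddingClient(embedding_config)
```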
### Single Text Embedding

Generate embeddings for individual texts:
```python
# Embed a single text
text = "GraphBit is a powerful framework for AI agent workflows"
vector = embedding_client.embed(text)

print(f"Text: {text}")
print(f"Vector dimension: {len(vector)}")
print(f"First 5 values: {vector[:5]}")
```
### Batch Text Embeddings

Process multiple texts efficiently:
```python
# Embed multiple texts in one call
texts = [
    "Machine learning is transforming industries",
    "Natural language processing enables computers to understand text",
    "Deep learning models require large datasets",
    "AI ethics is becoming increasingly important",
    "Transformer architectures revolutionized NLP"
]

vectors = embedding_client.embed_many(texts)

print(f"Generated {len(vectors)} embeddings")
for i, (text, vector) in enumerate(zip(texts, vectors)):
    print(f"Text {i+1}: {text[:50]}...")
    print(f"Vector dimension: {len(vector)}")
```
## Similarity Calculations

### Cosine Similarity

Calculate the similarity between two vectors:
```python
# Generate embeddings for comparison
text1 = "Artificial intelligence and machine learning"
text2 = "AI and ML technologies"

vector1 = embedding_client.embed(text1)
vector2 = embedding_client.embed(text2)

# Calculate cosine similarity
similarity_1_2 = graphbit.EmbeddingClient.similarity(vector1, vector2)
print(f"Similarity between text1 and text2: {similarity_1_2:.3f}")
```
### Finding Most Similar Texts

Combining `embed`, `embed_many`, and `similarity` gives a simple semantic search over a list of candidates:
```python
def find_most_similar(query_text, candidate_texts, embedding_client, threshold=0.7):
    """Find candidate texts most similar to a query."""
    query_vector = embedding_client.embed(query_text)
    candidate_vectors = embedding_client.embed_many(candidate_texts)

    similarities = []
    for i, candidate_vector in enumerate(candidate_vectors):
        similarity = graphbit.EmbeddingClient.similarity(query_vector, candidate_vector)
        similarities.append((i, candidate_texts[i], similarity))

    # Sort by similarity (highest first)
    similarities.sort(key=lambda x: x[2], reverse=True)

    # Keep only candidates at or above the threshold
    results = [(text, sim) for _, text, sim in similarities if sim >= threshold]
    return results

# Example usage
query = "machine learning algorithms"
candidates = [
    "Deep learning neural networks",
    "Supervised learning models",
    "Recipe for chocolate cake",
    "Natural language processing",
    "Computer vision techniques",
    "Sports news update"
]

similar_texts = find_most_similar(query, candidates, embedding_client, threshold=0.5)

print(f"Query: {query}")
print("Most similar texts:")
for text, similarity in similar_texts:
    print(f"- {text} (similarity: {similarity:.3f})")
```
## What's Next
- Learn about Performance for optimization techniques
- Explore Monitoring for production monitoring
- Check Validation for input validation strategies
- See LLM Providers for language model integration