GitHub Search API Integration with Graphbit

Overview

This guide explains how to connect the GitHub Search API to Graphbit, enabling Graphbit to orchestrate the retrieval, processing, and use of GitHub search results in your AI workflows. This integration lets you automate research, enrich LLM prompts, and build intelligent pipelines that leverage live repository data.


Prerequisites

  • GitHub Token (optional): Obtain from GitHub Personal Access Tokens. Requests work without one, but authenticated requests get substantially higher rate limits.
  • OpenAI API Key: For LLM summarization (or another supported LLM provider).
  • Graphbit installed and configured (see installation guide).
  • Python environment with requests, python-dotenv, and graphbit installed.
  • .env file in your project root with the following variables:
    GITHUB_TOKEN=your_github_token_here
    OPENAI_API_KEY=your_openai_api_key_here
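
Before moving on, you can confirm that the variables actually load. A minimal check (plain Python; assumes the .env file sits in the directory you run from):

from dotenv import load_dotenv
import os

load_dotenv()  # reads .env from the current working directory
for var in ("GITHUB_TOKEN", "OPENAI_API_KEY"):
    # GITHUB_TOKEN may legitimately be missing, since it is optional
    print(f"{var}: {'set' if os.getenv(var) else 'missing'}")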
    

Step 1: Implement the GitHub Search Connector

Define a function to query the Github Search API, loading credentials from environment variables:

import requests
import os
from dotenv import load_dotenv

load_dotenv()
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

def github_search(query, sort="stars", order="desc", per_page=50):
    """
    Search GitHub repositories
    Args:
        query: Search query (e.g., "machine learning", "language:python stars:>1000")
        sort: Sort by 'stars', 'forks', 'help-wanted-issues', 'updated'
        order: 'asc' or 'desc'
        per_page: Number of results per page (max 100)

    Returns:
        dict: JSON response with search results
    """
    url = "https://api.github.com/search/repositories"

    headers = {"Accept": "application/vnd.github.v3+json"}
    if GITHUB_TOKEN:
        headers["Authorization"] = f"token {GITHUB_TOKEN}"

    params = {"q": query, "sort": sort, "order": order, "per_page": per_page}
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()

    return response.json()
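
For example, a quick smoke test (a live network call; total_count and items are standard fields of the GitHub search response):

results = github_search("language:python stars:>1000", per_page=5)
print(f"Matched {results['total_count']} repositories")
for item in results["items"]:
    print(f"- {item['full_name']}: {item['stargazers_count']} stars")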

Step 2: Process the Search Results

Extract the relevant fields (repository name, link, and star count) from the search results for downstream use. By default, the top 10 results are included, but you can override this with the max_snippets parameter:

def process_search_results(results, max_snippets=10):
    """
    Extracts up to max_snippets search results (default: 10) as formatted strings.
    """
    items = results.get("items", [])[:max_snippets]
    snippets = [
        # html_url is the browser-facing repository link
        f"{item['full_name']} ({item['html_url']}): {item['stargazers_count']} stars"
        for item in items
    ]
    return "\n\n".join(snippets)
  • If you call process_search_results(results), it will use the default of 10 results.
  • To use a different number, call process_search_results(results, max_snippets=20) (for example).
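
The Search API caps per_page at 100 and returns at most 1,000 results per query; to gather more than one page, pass the page query parameter. A minimal sketch (github_search_pages is a hypothetical helper layered on the connector above, reusing its imports):

def github_search_pages(query, pages=2, per_page=100):
    """Hypothetical helper: fetch several result pages and merge their items."""
    all_items = []
    for page in range(1, pages + 1):
        params = {"q": query, "per_page": per_page, "page": page}
        response = requests.get(
            "https://api.github.com/search/repositories", params=params
        )
        response.raise_for_status()
        items = response.json().get("items", [])
        all_items.extend(items)
        if len(items) < per_page:  # last page reached
            break
    return all_items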

Step 3: Build the Graphbit Workflow

  1. Run the GitHub search and process the results:
search_results = github_search("python")
snippets_text = process_search_results(search_results, max_snippets=10)
  2. Create a Graphbit agent node for summarization:
from graphbit import Node, Workflow

agent = Node.agent(
    name="Summarizer",
    prompt=f"Summarize these search results: {snippets_text}"
)
workflow = Workflow("Github Search Workflow")
workflow.add_node(agent)
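
Search snippets can get long, so you may want to cap their size before building the agent prompt. A small guard to apply between processing the results and creating the node (plain Python; the 4,000-character budget is an arbitrary assumption, not a Graphbit requirement):

MAX_PROMPT_CHARS = 4000  # assumed budget; tune to your model's context window

if len(snippets_text) > MAX_PROMPT_CHARS:
    # hard truncation keeps the prompt within budget at the cost of dropping tail entries
    snippets_text = snippets_text[:MAX_PROMPT_CHARS]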

Step 4: Orchestrate and Execute with Graphbit

  1. Initialize Graphbit and configure your LLM:
from graphbit import LlmConfig, Executor
from dotenv import load_dotenv
import os
load_dotenv()
llm_config = LlmConfig.openai(os.getenv("OPENAI_API_KEY"))
executor = Executor(llm_config)
  2. Run the workflow and retrieve the summary:
result = executor.execute(workflow)
if result.is_success():
    print("Summary:", result.get_node_output("Summarizer"))
else:
    print("Workflow failed:", result.state())

Full Example

import requests
from graphbit import Node, Workflow, LlmConfig, Executor
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

def github_search(query, sort="stars", order="desc", per_page=50):
    """
    Search GitHub repositories
    Args:
        query: Search query (e.g., "machine learning", "language:python stars:>1000")
        sort: Sort by 'stars', 'forks', 'help-wanted-issues', 'updated'
        order: 'asc' or 'desc'
        per_page: Number of results per page (max 100)

    Returns:
        dict: JSON response with search results
    """
    url = "https://api.github.com/search/repositories"

    headers = {"Accept": "application/vnd.github.v3+json"}
    if GITHUB_TOKEN:
        headers["Authorization"] = f"token {GITHUB_TOKEN}"

    params = {"q": query, "sort": sort, "order": order, "per_page": per_page}
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()

    return response.json()

def process_search_results(results, max_snippets=10):
    """Extracts up to max_snippets search results (default: 10) as formatted strings."""
    items = results.get("items", [])[:max_snippets]
    snippets = [
        f"{item['full_name']} ({item['html_url']}): {item['stargazers_count']} stars"
        for item in items
    ]
    return "\n\n".join(snippets)

search_results = github_search("python")
snippets_text = process_search_results(search_results, max_snippets=10)

agent = Node.agent(
    name="Summarizer",
    prompt=f"Summarize these search results: {snippets_text}"
)
workflow = Workflow("Github Search Workflow")
workflow.add_node(agent)

llm_config = LlmConfig.openai(OPENAI_API_KEY)
executor = Executor(llm_config)

result = executor.execute(workflow)
if result.is_success():
    print("Summary:", result.get_node_output("Summarizer"))
else:
    print("Workflow failed:", result.state())

This connector pattern lets you blend live GitHub data into your AI workflows, orchestrated by Graphbit.