Running the Examples

The framework includes full-featured demos for both Gemini and OpenAI, with multiple agents, tools, image generation, and a web client with audio playback.

Prerequisites

Node.js 22+
A Google API key with Gemini Live API access, or an OpenAI API key
Chrome (recommended for the web client)

Start the Agent Server (Gemini)

bash

# Set your API key
export GEMINI_API_KEY="your-key-here"

# Start the voice agent
pnpm tsx examples/gemini-realtime-tools.ts

You should see:

============================================================
Bodhi Realtime Agent — Gemini Voice Assistant
============================================================

  WebSocket audio server: ws://localhost:9900
  Session ID: session_1234567890

Connect a WebSocket audio client and try saying:
  - 'What time is it?'
  - 'What is 25 times 17?'
  - 'I need help with complex math' (transfers to math expert)
  - 'What's the weather in San Francisco?' (uses Google Search)
  - 'Use slow search for AI news'
  - 'Speak slower please' (changes speech speed)
  - 'Generate an image of a sunset' (creates and displays image)
  - Try speaking in any language — the assistant replies in kind

Press Ctrl+C to stop.
============================================================

Start the Agent Server (OpenAI)

Alternatively, run the same tools and agents with OpenAI's Realtime API:

bash

# Set your API key
export OPENAI_API_KEY="your-key-here"
export GEMINI_API_KEY="your-gemini-key" # background subagents and image tools

# Start the voice agent
pnpm tsx examples/openai-realtime-tools.ts

The OpenAI example has the same tools (calculator, current time, image generation, slow search) and agents (main, math expert) as the Gemini example. The web client works with either server without changes — audio format is negotiated automatically via session.config.

Start the Web Client

In a second terminal:

bash

pnpm tsx examples/web-client.ts

Open http://localhost:8080 in Chrome and click Connect.

Cartesia TTS Demo

Use this demo when you want Gemini to generate text while Cartesia synthesizes the spoken audio:

bash

export GEMINI_API_KEY="your-gemini-key"
export CARTESIA_API_KEY="your-cartesia-key"
pnpm tsx examples/cartesia-tts-demo.ts

Start examples/web-client.ts in another terminal and connect as usual. The session config still advertises framework PCM audio to the browser; VoiceSession handles provider output format negotiation and resampling.

Twilio Demos

Outbound human transfer:

bash

export GEMINI_API_KEY="your-gemini-key"
export TWILIO_ACCOUNT_SID="ACxxxxxxxx"
export TWILIO_AUTH_TOKEN="xxxxxxxx"
export TWILIO_FROM_NUMBER="+1xxxxxxxxxx"
export HUMAN_AGENT_PHONE="+1xxxxxxxxxx"
export TWILIO_WEBHOOK_URL="https://xxxx.ngrok-free.app"
pnpm tsx examples/twilio-demo.ts

Inbound phone caller to an already running agent:

bash

export TWILIO_WEBHOOK_URL="https://xxxx.ngrok-free.app"
export AGENT_WS_URL="ws://localhost:9900"
pnpm tsx examples/twilio-inbound-bridge.ts

See Telephony for Twilio webhook and media-stream details.

Things to Try

Say this	What happens
"What time is it?"	Calls `get_current_time` tool, speaks the result
"What is 25 times 17?"	Calls `calculate` tool with the expression
"I need help with complex math"	Transfers to the `math_expert` agent
"Transfer me back" (with math expert)	Returns to the `main` agent
"What's the weather in Tokyo?"	Uses Google Search grounding for real-time data
"Generate an image of a sunset"	Calls Imagen API, image appears in browser
"Speak slower please"	Adjusts playback rate via `set_speech_speed` tool
"Use slow search for AI news"	Demonstrates 3-second slow tool (agent keeps talking)
Speak in any language	The assistant automatically replies in the same language

Agents in the Demo

Main Assistant

The default agent with access to all tools. Handles general conversation and routes specialized requests to other agents.

Math Expert

A specialist with a professorial tone. Activated when you ask for help with complex math. Has the calculator tool and can transfer back to main.

Multilingual

The main assistant is multilingual by default — speak in any language and it replies in kind. No separate agent needed; Gemini's native audio model handles language detection and response automatically.

Environment Variables

Variable	Default	Description
`GEMINI_API_KEY`	—	Google AI Studio API key (for Gemini example)
`OPENAI_API_KEY`	—	OpenAI API key (for OpenAI example)
`PORT`	`9900`	WebSocket port for the agent server
`CLIENT_PORT`	`8080`	HTTP port for the web client
`WS_URL`	`ws://localhost:9900`	Agent WebSocket URL (used by web client)

Troubleshooting

No audio playback

Make sure you click the Connect button directly — Chrome requires a user gesture to enable audio. Check the debug log for playChunk error messages.

No user transcription

User speech transcription uses a two-layer approach:

Chrome STT (client-side) — the browser's SpeechRecognition API provides real-time interim text during speech. Only works in Chrome with a working microphone.
Server STT (server-side) — GeminiBatchSTTProvider produces the authoritative transcript after each turn, replacing the Chrome STT text.

If you see no transcription at all, check that you're using Chrome and that microphone access is granted. The server STT transcript appears after the model starts responding (not during speech).

Agent doesn't call tools

Check the server terminal for [Hook] Tool called: messages. If no tool calls appear, try being more explicit: "Use the calculator to compute 25 times 17" instead of just "25 times 17".

Transfer not triggering

The model must invoke the transfer_to_agent function call, not just say it verbally. If transfers aren't happening, check the server logs for [Hook] Agent transfer:.

Image generation fails

Image generation uses the Imagen API (imagen-3.0-generate-002). Some API keys may not have access. Check the server logs for [Tool] Imagen failed messages.

Claude Code Demo — Voice-Driven Coding

A voice assistant backed by Claude Code (Anthropic's AI coding agent). This demo uses the relay subagent pattern — an interactive subagent bridges the voice model to Claude Code's stateful SDK, managing session lifecycles and question relay.

Prerequisites

A Google API key (Gemini Live API)
An Anthropic API key (Claude Agent SDK)
macOS (for Apple Mail email integration)

Start the Agent Server

bash

export GEMINI_API_KEY="your-gemini-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export PROJECT_DIR="/path/to/your/project"  # optional, defaults to cwd

pnpm tsx examples/claude_code/claude-demo.ts

You should see:

============================================================
Bodhi + Claude Code — Voice-Driven Coding Assistant
============================================================

  Voice agent:     ws://localhost:9900
  Project dir:     /path/to/your/project
  Session ID:      session_1234567890

Start the web client in another terminal:
  pnpm tsx examples/openclaw/web-client.ts

Then open http://localhost:8080 and try saying:
  - 'Fix the bug in auth.py'                 (Claude Code)
  - 'Add input validation to the login form' (Claude Code)
  - 'Run the tests and fix any failures'     (Claude Code)
  - 'Email me a summary of the README'       (Email via Mail.app)
  - 'What is the weather in San Francisco?'  (Google Search)
  - 'Draw me a picture of a sunset'          (Image generation)
  - 'Goodbye'
============================================================

Start the Web Client

Same as the Gemini demo — the web client is shared:

bash

pnpm tsx examples/openclaw/web-client.ts

Open http://localhost:8080 in Chrome and click Connect.

Things to Try

Say this	What happens
"Fix the bug in auth.py"	Claude reads files, edits code, runs tests
"Add input validation to the login form"	Claude creates or modifies files
"Run the tests and fix any failures"	Claude runs bash commands and iterates
"Summarize the README and email it to user@example.com"	Claude reads files and sends via Apple Mail
"What's the weather in San Francisco?"	Google Search (Gemini native)
"Draw me a picture of a sunset"	Image generation subagent
"Goodbye"	Graceful session end

How It Works

You speak a coding request → Gemini calls the ask_claude background tool
The relay subagent receives the task and calls claude_code_start
Claude Code runs autonomously: reads files, edits code, runs commands
If Claude asks a follow-up question, the relay calls ask_user → you hear the question and answer by voice
When Claude finishes, the relay summarizes the result → Gemini speaks it to you

Troubleshooting

Claude makes zero tool calls: Verify ANTHROPIC_API_KEY is set. Check server logs for [ClaudeCode] SDK init message: to confirm the SDK initialized correctly.

Email not sending: Requires macOS with Mail.app configured. First use triggers a system permission dialog. Check for [MCP:send_email] Tool invoked! in logs.

OpenClaw Demo — Dual Specialist Agents

A voice assistant backed by OpenClaw, a gateway for delegating tasks to Claude. This demo uses the relay subagent pattern with separate work and general-agent routes.

Prerequisites

A Google API key (Gemini Live API)
An OpenClaw gateway running locally (or remotely)
Device identity registered with the gateway (~/.bodhi/device-identity.json)

Start the Agent Server

bash

export GEMINI_API_KEY="your-gemini-key"
export OPENCLAW_TOKEN="your-openclaw-token"
export OPENCLAW_URL="ws://127.0.0.1:18789"  # optional, this is the default

pnpm tsx examples/openclaw/openclaw-demo.ts

Start the Web Client

bash

pnpm tsx examples/openclaw/web-client.ts

Open http://localhost:8080 in Chrome and click Connect.

Things to Try

Say this	What happens
"Reschedule my 3pm meeting to 4pm"	OpenClaw manages your calendar via Claude
"What's on my calendar tomorrow?"	Reads calendar (runs in parallel with other tasks)
"Email John a summary of the meeting notes"	Claude drafts and sends email
"Generate an image of a sunset"	Image generated and stored as artifact
"Email that image to me"	Artifact attached and sent via OpenClaw
"Confirm that reschedule"	Classifier routes to the existing calendar session
Ask two things at once	Tasks run concurrently with session isolation

How Session Routing Works

You speak a request and Gemini calls the ask_openclaw background tool
The session classifier (LLM call, 3s timeout) decides: continue an existing session, or create a new one
The routing mutex serializes concurrent routing decisions to prevent races
The task manager acquires a semaphore slot and optional write lock
The relay subagent sends the message to the correct OpenClaw session
Results are delivered to you via voice when the current turn completes

Key Components

Session Registry — Tracks active sessions with recent conversation context, domain tags, and status
Session Classifier — LLM-driven routing with stale-thread reconciliation (max 1 retry)
Task Manager — Concurrency semaphore (max 10), per-domain write locks, thread TTL (10 min)
Artifact Registry — In-memory image storage for cross-tool data flow (generate then email)

Environment Variables

Variable	Default	Description
`GEMINI_API_KEY`	—	Google Gemini API key
`OPENCLAW_TOKEN`	—	OpenClaw gateway auth token
`OPENCLAW_URL`	`ws://127.0.0.1:18789`	OpenClaw gateway WebSocket URL
`PORT`	`9900`	WebSocket port for the agent server
`HOST`	`0.0.0.0`	Host to bind to

Troubleshooting

Session routing sends to wrong session: Check server logs for [Classifier] messages. The classifier has a 3-second timeout — slow LLM responses fall back to creating a new session. Increase classifierCap if you have many active sessions.

"All background agents are busy": The task manager semaphore (default 10 slots) is full. Wait for tasks to complete or increase maxConcurrent.

Image attachment fails: Artifacts must be under 5 MB (gateway limit). Check that the artifact hasn't expired (30-minute TTL). Use "list artifacts" to see available artifacts.

Device not authorized: Run openclaw gateway devices approve <device-id> on the gateway to authorize your device. Local connections auto-approve.

Running the Examples ​

Prerequisites ​

Start the Agent Server (Gemini) ​

Start the Agent Server (OpenAI) ​

Start the Web Client ​

Cartesia TTS Demo ​

Twilio Demos ​

Things to Try ​

Agents in the Demo ​

Main Assistant ​

Math Expert ​

Multilingual ​

Environment Variables ​

Troubleshooting ​

No audio playback ​

No user transcription ​

Agent doesn't call tools ​

Transfer not triggering ​

Image generation fails ​

Claude Code Demo — Voice-Driven Coding ​

Prerequisites ​

Start the Agent Server ​

Start the Web Client ​

Things to Try ​

How It Works ​

Troubleshooting ​

OpenClaw Demo — Dual Specialist Agents ​

Prerequisites ​

Start the Agent Server ​

Start the Web Client ​

Things to Try ​

How Session Routing Works ​

Key Components ​

Environment Variables ​

Troubleshooting ​

Running the Examples

Prerequisites

Start the Agent Server (Gemini)

Start the Agent Server (OpenAI)

Start the Web Client

Cartesia TTS Demo

Twilio Demos

Things to Try

Agents in the Demo

Main Assistant

Math Expert

Multilingual

Environment Variables

Troubleshooting

No audio playback

No user transcription

Agent doesn't call tools

Transfer not triggering

Image generation fails

Claude Code Demo — Voice-Driven Coding

Prerequisites

Start the Agent Server

Start the Web Client

Things to Try

How It Works

Troubleshooting

OpenClaw Demo — Dual Specialist Agents

Prerequisites

Start the Agent Server

Start the Web Client

Things to Try

How Session Routing Works

Key Components

Environment Variables

Troubleshooting