Introduction
Bodhi Realtime Agent Framework is a TypeScript framework for building real-time voice agent applications. It supports multiple LLM providers through a unified LLMTransport interface, currently the Google Gemini Live API and the OpenAI Realtime API. It handles the hard parts of voice AI (bidirectional audio streaming, turn detection, agent transfers, tool execution, and session management) so you can focus on what your agent actually does.
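A unified transport means agent code talks to one interface and never to a provider SDK directly. The sketch below illustrates only that pattern. `LLMTransportSketch`, `EchoTransport`, and `countEchoedBytes` are hypothetical names, not the framework's actual `LLMTransport` API. The echo transport stands in for a real Gemini or OpenAI connection so the example runs offline.

```typescript
// Hypothetical provider-agnostic transport; not the framework's real LLMTransport.
interface LLMTransportSketch {
  readonly provider: string;
  sendAudio(chunk: Uint8Array): void;
  onAudio(handler: (chunk: Uint8Array) => void): void;
}

// Offline stand-in for a provider connection: every chunk sent is "returned"
// immediately to the registered audio handlers.
class EchoTransport implements LLMTransportSketch {
  readonly provider: string;
  private handlers: Array<(chunk: Uint8Array) => void> = [];

  constructor(provider: string) {
    this.provider = provider;
  }

  sendAudio(chunk: Uint8Array): void {
    for (const handler of this.handlers) handler(chunk);
  }

  onAudio(handler: (chunk: Uint8Array) => void): void {
    this.handlers.push(handler);
  }
}

// Agent-side logic depends only on the interface, never on a provider SDK.
function countEchoedBytes(transport: LLMTransportSketch, chunks: Uint8Array[]): number {
  let received = 0;
  transport.onAudio((chunk) => {
    received += chunk.length;
  });
  for (const chunk of chunks) transport.sendAudio(chunk);
  return received;
}

const chunks = [new Uint8Array(320), new Uint8Array(160)];
const geminiBytes = countEchoedBytes(new EchoTransport("gemini"), chunks);
const openaiBytes = countEchoedBytes(new EchoTransport("openai"), chunks);
console.log(geminiBytes, openaiBytes); // 480 480
```

`countEchoedBytes` only sees the interface. Switching providers therefore changes a constructor argument, not the agent logic.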
What You Can Build
- Voice assistants with tools — An agent that answers questions, checks the weather, does math, and searches the web, all through natural conversation.
- Multi-agent systems — A general assistant that transfers to a booking specialist, a math expert, or a language tutor mid-conversation.
- Multimodal applications — Users can speak, type, upload images, and receive generated images — all on a single WebSocket connection.
- Proactive notification agents — Service subagents that monitor calendars, inboxes, or IoT devices and notify the user when something needs attention.
Architecture Overview
Client App <──WebSocket──> ClientTransport <──audio──> LLMTransport <──WebSocket──> LLM Provider
                                  │                         │                     (Gemini / OpenAI)
                                  └───── VoiceSession ──────┘
                                  │ (audio fast-path relay) │
                                  │                         │
                        AgentRouter    ToolExecutor    ConversationContext
Audio flows on a fast path directly between the client and LLM transports, bypassing the EventBus for minimal latency. Everything else (tool calls, agent transfers, GUI events) goes through the control plane. The LLMTransport interface abstracts provider differences, so your agent code is the same regardless of which LLM provider you use.
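The split between the audio fast path and the control plane can be pictured as a single dispatch point. Audio is pushed straight to the client sink, and everything else is published on a typed event bus. This is a minimal sketch under assumptions. `TypedBus`, `EventMap`, `ProviderMessage`, and `handleFromLLM` are hypothetical and only illustrate the routing decision. They are not the framework's `VoiceSession` or `EventBus` internals.

```typescript
// Hypothetical event map for control-plane traffic (not the framework's EventBus).
type EventMap = {
  tool_call: { name: string; args: Record<string, unknown> };
  agent_transfer: { to: string };
  gui: { payload: string };
};

// Minimal type-safe bus: storage is loosely typed internally, while the public
// on/emit signatures tie each event name to its payload type.
class TypedBus<M> {
  private listeners = new Map<keyof M, Array<(payload: any) => void>>();

  on<K extends keyof M>(type: K, fn: (payload: M[K]) => void): void {
    const list = this.listeners.get(type) ?? [];
    list.push(fn);
    this.listeners.set(type, list);
  }

  emit<K extends keyof M>(type: K, payload: M[K]): void {
    for (const fn of this.listeners.get(type) ?? []) fn(payload);
  }
}

// Messages arriving from the LLM side (hypothetical shapes).
type ProviderMessage =
  | { kind: "audio"; data: Uint8Array }
  | { kind: "tool_call"; name: string; args: Record<string, unknown> }
  | { kind: "agent_transfer"; to: string };

const bus = new TypedBus<EventMap>();
const clientAudioOut: Uint8Array[] = []; // stands in for the client WebSocket
const controlLog: string[] = [];

bus.on("tool_call", (e) => {
  controlLog.push(`tool:${e.name}`);
});
bus.on("agent_transfer", (e) => {
  controlLog.push(`transfer:${e.to}`);
});

function handleFromLLM(msg: ProviderMessage): void {
  switch (msg.kind) {
    case "audio":
      clientAudioOut.push(msg.data); // fast path: relayed directly, bus untouched
      break;
    case "tool_call":
      bus.emit("tool_call", { name: msg.name, args: msg.args }); // control plane
      break;
    case "agent_transfer":
      bus.emit("agent_transfer", { to: msg.to }); // control plane
      break;
  }
}

handleFromLLM({ kind: "audio", data: new Uint8Array(640) });
handleFromLLM({ kind: "tool_call", name: "get_weather", args: { city: "Paris" } });
handleFromLLM({ kind: "agent_transfer", to: "booking" });
console.log(clientAudioOut.length, JSON.stringify(controlLog));
// 1 ["tool:get_weather","transfer:booking"]
```

Keeping audio off the bus skips the listener lookup and any subscriber work for every chunk. This matters because audio frames typically arrive much more often than tool calls or transfers.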
Key Concepts
| Concept | What it does |
|---|---|
| VoiceSession | Top-level orchestrator that wires all components together |
| Agents | Personas with distinct instructions, tools, and lifecycle hooks |
| Tools | Functions the AI model can call during conversation (inline or background) |
| Memory | Automatic extraction of durable user facts across sessions |
| Events & Hooks | Type-safe EventBus and lifecycle callbacks for observability |
| Transport | Provider-agnostic LLM transport and client WebSocket connections |
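The Agents and Tools rows can be illustrated with plain data. Each agent carries its own instructions, tools, allowed transfer targets, and an optional lifecycle hook. Every type and field below is hypothetical, including `AgentSketch`, `ToolSketch`, `mode`, `transfersTo`, `onEnter`, and `RouterSketch`. The framework's real agent definitions and `AgentRouter` may look different.

```typescript
// Hypothetical agent/tool shapes; field names are illustrative only.
type ToolMode = "inline" | "background";

interface ToolSketch {
  name: string;
  // Assumed meaning: "inline" answers within the turn, "background" runs without
  // blocking the conversation. This sketch records the mode but runs every tool synchronously.
  mode: ToolMode;
  run(args: Record<string, number>): string;
}

interface AgentSketch {
  name: string;
  instructions: string;
  tools: ToolSketch[];
  transfersTo: string[]; // agents this one is allowed to hand off to
  onEnter?: () => void; // lifecycle hook fired when the agent becomes active
}

const enterLog: string[] = [];

const add: ToolSketch = {
  name: "add",
  mode: "inline",
  run: (args) => String(args.a + args.b),
};

const agents: Record<string, AgentSketch> = {
  general: {
    name: "general",
    instructions: "Help with anything; hand arithmetic to the math expert.",
    tools: [],
    transfersTo: ["math"],
  },
  math: {
    name: "math",
    instructions: "Solve arithmetic precisely.",
    tools: [add],
    transfersTo: ["general"],
    onEnter: () => {
      enterLog.push("math");
    },
  },
};

// Toy router: only the active agent's tools are callable, and transfers must be allowed.
class RouterSketch {
  private readonly registry: Record<string, AgentSketch>;
  active: AgentSketch;

  constructor(registry: Record<string, AgentSketch>, start: string) {
    this.registry = registry;
    this.active = registry[start];
  }

  transfer(to: string): boolean {
    const target = this.registry[to];
    if (!target || !this.active.transfersTo.includes(to)) return false;
    this.active = target;
    target.onEnter?.();
    return true;
  }

  callTool(name: string, args: Record<string, number>): string | undefined {
    return this.active.tools.find((t) => t.name === name)?.run(args);
  }
}

const router = new RouterSketch(agents, "general");
const before = router.callTool("add", { a: 2, b: 3 }); // undefined: "general" has no tools
const moved = router.transfer("math"); // true: "general" may hand off to "math"
const sum = router.callTool("add", { a: 2, b: 3 }); // "5"
console.log(before, moved, sum, router.active.name); // undefined true 5 math
```

Scoping tools per agent keeps each persona's tool list small. A transfer is then just a change of the active agent plus its hook.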
Prerequisites
- Node.js 22+ — The framework uses modern JavaScript features
- pnpm — Package manager
- LLM API key — A Google API key for Gemini Live, or an OpenAI API key for OpenAI Realtime
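A small preflight check can catch the two most common setup problems before starting a session. This is a hypothetical helper, not part of the framework. In particular, the environment variable names `GEMINI_API_KEY` and `OPENAI_API_KEY` are assumptions, so use whichever names the Quick Start actually specifies.

```typescript
// Hypothetical preflight helper; not part of the framework.
// ASSUMPTION: the API key is read from GEMINI_API_KEY or OPENAI_API_KEY.

function majorVersion(version: string): number {
  // Accepts "22.3.0" as well as process.version-style "v22.3.0".
  return Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
}

function preflight(nodeVersion: string, env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  // An unparseable version yields NaN, which also fails this check.
  if (!(majorVersion(nodeVersion) >= 22)) {
    problems.push(`Node.js 22+ required, found ${nodeVersion}`);
  }
  if (!env.GEMINI_API_KEY && !env.OPENAI_API_KEY) {
    problems.push("No LLM API key found: set GEMINI_API_KEY or OPENAI_API_KEY");
  }
  return problems;
}

// Check the running process when one exists; the `any` cast keeps the sketch
// compiling without @types/node.
type ProcessLike = { versions: { node: string }; env: Record<string, string | undefined> };
const proc: ProcessLike | undefined = (globalThis as any).process;
if (proc) {
  const problems = preflight(proc.versions.node, proc.env);
  console.log(problems.length === 0 ? "ok" : problems.join("\n"));
}
```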
Next Steps
- Quick Start — Build a working voice agent in 5 minutes
- Running Examples — Try the built-in demo with tools, transfers, and image generation