Introduction

Bodhi Realtime Agent Framework is a TypeScript framework for building real-time voice agent applications. It supports multiple LLM providers through a unified LLMTransport interface — currently Google Gemini Live API and OpenAI Realtime API. It handles the hard parts of voice AI — bidirectional audio streaming, turn detection, agent transfers, tool execution, and session management — so you can focus on what your agent actually does.

What You Can Build

  • Voice assistants with tools — An agent that answers questions, checks the weather, does math, and searches the web, all through natural conversation.
  • Multi-agent systems — A general assistant that transfers to a booking specialist, a math expert, or a language tutor mid-conversation.
  • Multimodal applications — Users can speak, type, upload images, and receive generated images — all on a single WebSocket connection.
  • Proactive notification agents — Service subagents that monitor calendars, inboxes, or IoT devices and notify the user when something needs attention.
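The multi-agent pattern above can be pictured as a small transfer graph: each agent declares which other agents it may hand the conversation to. The sketch below is illustrative only; the `AgentConfig` shape, the `transfersTo` field, and the `canTransfer` helper are assumptions, not the framework's real API.

```typescript
// Hypothetical sketch of agent routing — names are illustrative,
// not the framework's actual API.
type AgentName = "general" | "booking" | "math";

interface AgentConfig {
  name: AgentName;
  instructions: string;
  transfersTo: AgentName[]; // which agents this one may hand off to
}

const agents: Record<AgentName, AgentConfig> = {
  general: {
    name: "general",
    instructions: "Answer general questions; transfer specialist requests.",
    transfersTo: ["booking", "math"],
  },
  booking: {
    name: "booking",
    instructions: "Handle reservations.",
    transfersTo: ["general"],
  },
  math: {
    name: "math",
    instructions: "Solve math problems step by step.",
    transfersTo: ["general"],
  },
};

// A router would validate a requested transfer against this graph
// before switching the active persona mid-conversation.
function canTransfer(from: AgentName, to: AgentName): boolean {
  return agents[from].transfersTo.includes(to);
}
```

Modeling transfers as an explicit graph keeps hand-offs predictable: the general assistant can reach the specialists, and each specialist can only return to the general agent.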

Architecture Overview

```
Client App  <──WebSocket──>  ClientTransport  <──audio──>  LLMTransport  <──WebSocket──>  LLM Provider
                                    │                           │            (Gemini / OpenAI)
                                    └──────── VoiceSession ─────┘
                                         (audio fast-path relay)
                                    │                           │
                              AgentRouter    ToolExecutor    ConversationContext
```

Audio flows on a fast-path directly between the client and LLM transports, bypassing the EventBus for minimal latency. Everything else (tool calls, agent transfers, GUI events) goes through the control plane. The LLMTransport interface abstracts provider differences — your agent code is the same regardless of which LLM provider you use.
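To illustrate what such a provider-agnostic abstraction can look like, here is a minimal sketch. The method names (`sendAudioChunk`, `onToolCall`, and so on) and the `EchoTransport` stub are assumptions for illustration, not the framework's actual `LLMTransport` interface.

```typescript
// Hypothetical sketch — method names are illustrative, not the
// framework's real LLMTransport interface.
interface LLMTransport {
  connect(): Promise<void>;
  sendAudioChunk(pcm: Int16Array): void; // fast-path: raw audio in
  onAudioChunk(cb: (pcm: Int16Array) => void): void; // fast-path: audio out
  onToolCall(cb: (name: string, args: unknown) => void): void; // control plane
  close(): Promise<void>;
}

// A stub standing in for a provider adapter (e.g. a Gemini Live or
// OpenAI Realtime implementation): it echoes audio straight back.
class EchoTransport implements LLMTransport {
  private audioCb: ((pcm: Int16Array) => void) | null = null;

  async connect(): Promise<void> {}

  sendAudioChunk(pcm: Int16Array): void {
    // Simulate the provider streaming audio back to us.
    this.audioCb?.(pcm);
  }

  onAudioChunk(cb: (pcm: Int16Array) => void): void {
    this.audioCb = cb;
  }

  onToolCall(_cb: (name: string, args: unknown) => void): void {}

  async close(): Promise<void> {}
}

// Agent code only sees the interface, never the concrete provider.
const transport: LLMTransport = new EchoTransport();
const received: number[] = [];
transport.onAudioChunk((pcm) => received.push(...Array.from(pcm)));
transport.sendAudioChunk(new Int16Array([1, 2, 3]));
```

Because the session only depends on the interface, swapping providers means swapping the concrete transport class, with no change to agent code.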

Key Concepts

| Concept | What it does |
| --- | --- |
| VoiceSession | Top-level orchestrator that wires all components together |
| Agents | Personas with distinct instructions, tools, and lifecycle hooks |
| Tools | Functions the AI model can call during conversation (inline or background) |
| Memory | Automatic extraction of durable user facts across sessions |
| Events & Hooks | Type-safe EventBus and lifecycle callbacks for observability |
| Transport | Provider-agnostic LLM transport and client WebSocket connections |
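To make the Agents and Tools rows concrete: a tool is conceptually a named function plus a description the model sees, and an agent bundles instructions with the tools it may call. The `Tool` and `Agent` shapes below are illustrative assumptions, not the framework's real types.

```typescript
// Hypothetical shapes — field names are illustrative, not the
// framework's actual Agent/Tool types.
interface Tool {
  name: string;
  description: string; // shown to the model so it knows when to call it
  execute: (args: Record<string, unknown>) => Promise<unknown>;
}

interface Agent {
  name: string;
  instructions: string; // the persona's system prompt
  tools: Tool[];
}

const weatherTool: Tool = {
  name: "get_weather",
  description: "Look up current weather for a city",
  execute: async (args) => {
    // Stubbed result; a real tool would call a weather API here.
    return { city: args.city, tempC: 21 };
  },
};

const assistant: Agent = {
  name: "assistant",
  instructions: "You are a helpful voice assistant.",
  tools: [weatherTool],
};
```

When the model emits a tool call mid-conversation, a tool executor would look up the matching `Tool` by name, run `execute` with the model-supplied arguments, and feed the result back into the conversation.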

Prerequisites

  • Node.js 22+ — The framework uses modern JavaScript features
  • pnpm — Package manager
  • LLM API key — A Google API key for Gemini Live, or an OpenAI API key for OpenAI Realtime

Next Steps

  • Quick Start — Build a working voice agent in 5 minutes
  • Running Examples — Try the built-in demo with tools, transfers, and image generation
