Introduction

Bodhi Realtime Agent Framework is a TypeScript framework for building real-time voice agent applications. It supports multiple LLM providers through a unified LLMTransport interface — currently Google Gemini Live API and OpenAI Realtime API. It handles the hard parts of voice AI — bidirectional audio streaming, turn detection, agent transfers, tool execution, and session management — so you can focus on what your agent actually does.

What You Can Build

  • Voice assistants with tools — An agent that answers questions, checks the weather, does math, and searches the web, all through natural conversation.
  • Multi-agent systems — A general assistant that transfers to a booking specialist, a math expert, or a language tutor mid-conversation.
  • Multimodal applications — Users can speak, type, upload images, and receive generated images — all on a single WebSocket connection.
  • Proactive notification agents — Service subagents that monitor calendars, inboxes, or IoT devices and notify the user when something needs attention.
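The multi-agent pattern above can be pictured as a small transfer graph: each agent declares which other agents it may hand the conversation to. The sketch below is illustrative only; the `AgentConfig` shape, the `transfersTo` field, and the `canTransfer` helper are assumptions, not the framework's real API.

```typescript
// Hypothetical sketch of agent routing — names are illustrative,
// not the framework's actual API.
type AgentName = "general" | "booking" | "math";

interface AgentConfig {
  name: AgentName;
  instructions: string;
  transfersTo: AgentName[]; // which agents this one may hand off to
}

const agents: Record<AgentName, AgentConfig> = {
  general: {
    name: "general",
    instructions: "Answer general questions; transfer specialist requests.",
    transfersTo: ["booking", "math"],
  },
  booking: {
    name: "booking",
    instructions: "Handle reservations.",
    transfersTo: ["general"],
  },
  math: {
    name: "math",
    instructions: "Solve math problems step by step.",
    transfersTo: ["general"],
  },
};

// A router would validate a requested transfer against this graph
// before switching the active persona mid-conversation.
function canTransfer(from: AgentName, to: AgentName): boolean {
  return agents[from].transfersTo.includes(to);
}
```

Modeling transfers as an explicit graph keeps hand-offs predictable: the general assistant can reach the specialists, and each specialist can only return to the general agent.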

Architecture Overview

```
Client App  <──WebSocket──>  ClientTransport  <──audio──>  LLMTransport  <──WebSocket──>  LLM Provider
                                    │                           │            (Gemini / OpenAI)
                                    └──────── VoiceSession ─────┘
                                         (audio fast-path relay)
                                    │                           │
                              AgentRouter    ToolExecutor    ConversationContext
```

Audio flows on a fast-path directly between the client and LLM transports, bypassing the EventBus for minimal latency. Everything else (tool calls, agent transfers, GUI events) goes through the control plane. The LLMTransport interface abstracts provider differences — your agent code is the same regardless of which LLM provider you use.
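To illustrate what such a provider-agnostic abstraction can look like, here is a minimal sketch. The method names (`sendAudioChunk`, `onToolCall`, and so on) and the `EchoTransport` stub are assumptions for illustration, not the framework's actual `LLMTransport` interface.

```typescript
// Hypothetical sketch — method names are illustrative, not the
// framework's real LLMTransport interface.
interface LLMTransport {
  connect(): Promise<void>;
  sendAudioChunk(pcm: Int16Array): void; // fast-path: raw audio in
  onAudioChunk(cb: (pcm: Int16Array) => void): void; // fast-path: audio out
  onToolCall(cb: (name: string, args: unknown) => void): void; // control plane
  close(): Promise<void>;
}

// A stub standing in for a provider adapter (e.g. a Gemini Live or
// OpenAI Realtime implementation): it echoes audio straight back.
class EchoTransport implements LLMTransport {
  private audioCb: ((pcm: Int16Array) => void) | null = null;

  async connect(): Promise<void> {}

  sendAudioChunk(pcm: Int16Array): void {
    // Simulate the provider streaming audio back to us.
    this.audioCb?.(pcm);
  }

  onAudioChunk(cb: (pcm: Int16Array) => void): void {
    this.audioCb = cb;
  }

  onToolCall(_cb: (name: string, args: unknown) => void): void {}

  async close(): Promise<void> {}
}

// Agent code only sees the interface, never the concrete provider.
const transport: LLMTransport = new EchoTransport();
const received: number[] = [];
transport.onAudioChunk((pcm) => received.push(...Array.from(pcm)));
transport.sendAudioChunk(new Int16Array([1, 2, 3]));
```

Because the session only depends on the interface, swapping providers means swapping the concrete transport class, with no change to agent code.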

Key Concepts

| Concept | What it does |
| --- | --- |
| VoiceSession | Top-level orchestrator that wires all components together |
| Agents | Personas with distinct instructions, tools, and lifecycle hooks |
| Tools | Functions the AI model can call during conversation (inline or background) |
| Memory | Automatic extraction of durable user facts across sessions |
| Events & Hooks | Type-safe EventBus and lifecycle callbacks for observability |
| Transport | Provider-agnostic LLM transport and client WebSocket connections |
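To make the Agents and Tools rows concrete: a tool is conceptually a named function plus a description the model sees, and an agent bundles instructions with the tools it may call. The `Tool` and `Agent` shapes below are illustrative assumptions, not the framework's real types.

```typescript
// Hypothetical shapes — field names are illustrative, not the
// framework's actual Agent/Tool types.
interface Tool {
  name: string;
  description: string; // shown to the model so it knows when to call it
  execute: (args: Record<string, unknown>) => Promise<unknown>;
}

interface Agent {
  name: string;
  instructions: string; // the persona's system prompt
  tools: Tool[];
}

const weatherTool: Tool = {
  name: "get_weather",
  description: "Look up current weather for a city",
  execute: async (args) => {
    // Stubbed result; a real tool would call a weather API here.
    return { city: args.city, tempC: 21 };
  },
};

const assistant: Agent = {
  name: "assistant",
  instructions: "You are a helpful voice assistant.",
  tools: [weatherTool],
};
```

When the model emits a tool call mid-conversation, a tool executor would look up the matching `Tool` by name, run `execute` with the model-supplied arguments, and feed the result back into the conversation.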

Prerequisites

  • Node.js 22+ — The framework uses modern JavaScript features
  • pnpm — Package manager
  • LLM API key — A Google API key for Gemini Live, or an OpenAI API key for OpenAI Realtime

Next Steps

  • Quick Start — Build a working voice agent in 5 minutes
  • Running Examples — Try the built-in demo with tools, transfers, and image generation
