Overview
Make your avatars talk by sending audio through a WebSocket connection. We strongly recommend using our LiveKit or Pipecat integrations to make implementation easier and more reliable. These plugins handle all the complex WebSocket and audio streaming logic for you. For advanced use cases requiring full control, you can also build a direct integration with our WebSocket API.
Quick Start: How It Works
Three Simple Steps:
- Create/Start a session → Get session_id and session_token
- Connect to WebSocket → wss://wss.agenthuman.com with session.init
- Send audio → Avatar generates video and streams to your room
The avatar automatically joins the video room you specify during session creation. You receive the video through your Daily or LiveKit room.
Step-by-Step Implementation
Step 1: Create and Start a Session
First, create a session with your video room configuration:
Room Configuration: You must provide your own Daily or LiveKit room. The avatar will automatically join this room when you initialize the WebSocket connection.
Step 2: Connect to WebSocket
Connect to wss://wss.agenthuman.com and initialize with your session credentials:
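A minimal sketch of building the session.init payload, assuming messages are JSON objects with a "type" key and that the credentials are sent as top-level session_id and session_token fields. The exact schema is not shown on this page, so treat the key names as illustrative and confirm them against the Client Messages reference.

```python
import json

WS_URL = "wss://wss.agenthuman.com"  # endpoint named in the text above


def build_session_init(session_id: str, session_token: str) -> str:
    """Serialize a session.init message as a JSON string.

    ASSUMPTION: the "type" key and the flat field layout are hypothetical;
    only the names session_id and session_token come from this guide.
    """
    if not session_id or not session_token:
        raise ValueError("session_id and session_token are both required")
    return json.dumps({
        "type": "session.init",
        "session_id": session_id,
        "session_token": session_token,
    })
```

Send the returned string as the first text frame after the socket opens, using whichever WebSocket client you prefer.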
Step 3: Send Audio to Make Avatar Talk
Send audio as base64-encoded, 16-bit mono PCM:
| Parameter | Value | Description |
|---|---|---|
| Format | Raw PCM | Raw audio bytes (NOT WAV file) |
| Bit Depth | 16-bit signed | Audio sample format |
| Channels | Mono | Single channel only |
| Sample Rate | Configurable | Common: 16000, 48000 Hz |
| Encoding | Base64 | String encoding for JSON |
| Max Size | 10 MB | Per WebSocket message |
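The requirements in the table can be sketched as a small message builder. This is an illustration under assumptions: the "type" and "audio" key names are hypothetical (only sample_rate is named in this guide), and the 10 MB limit is interpreted here as 10 × 1024 × 1024 bytes applied to the serialized message.

```python
import base64
import json

# ASSUMPTION: "10 MB" is read as 10 MiB and checked against the full JSON message.
MAX_MESSAGE_BYTES = 10 * 1024 * 1024


def build_agent_speak(pcm: bytes, sample_rate: int) -> str:
    """Wrap raw 16-bit mono PCM bytes in an agent.speak JSON message."""
    if len(pcm) % 2 != 0:
        raise ValueError("16-bit PCM must contain an even number of bytes")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive integer")
    message = json.dumps({
        "type": "agent.speak",  # hypothetical key name
        "audio": base64.b64encode(pcm).decode("ascii"),  # hypothetical key name
        "sample_rate": sample_rate,
    })
    if len(message.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds the per-message size limit")
    return message


# 10 ms of silence at 16 kHz: 160 samples x 2 bytes each
example = build_agent_speak(bytes(320), 16000)
```

Base64 grows the payload by roughly one third, so the size check runs on the encoded message rather than on the raw audio.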
Sample Rate: You can send audio at any sample rate (commonly 16kHz or 48kHz). Just make sure to set the sample_rate field to match your audio. The avatar server will handle any necessary resampling.
Step 4: Receive Confirmations
The server sends confirmation events:
Step 5: View the Avatar Video
The avatar video is streamed to your Daily or LiveKit room. Join the room to see the avatar:
Complete Example: Browser Client
Here’s a complete working example:
WebSocket API Reference
Client → Server Messages
| Message Type | Description |
|---|---|
| session.init | Initialize session (must be first message) |
| agent.speak | Send audio to make avatar talk |
| agent.interrupt | Stop current playback |
Server → Client Messages
| Event Type | Description |
|---|---|
| connection.established | Session initialized successfully |
| agent.speak.confirmed | Audio received and queued |
| agent.interrupt.confirmed | Playback interrupted |
| error | Error occurred |
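One way to handle the events in the table is a small dispatcher that maps each event type to an application-level status. This sketch assumes server messages are JSON objects carrying the event name under a "type" key; the key name, the "message" field on errors, and the status strings are hypothetical.

```python
import json

# Application-level statuses chosen for this sketch; they are not part of the API.
STATUS_BY_EVENT = {
    "connection.established": "ready",
    "agent.speak.confirmed": "audio_queued",
    "agent.interrupt.confirmed": "interrupted",
}


def handle_server_message(raw: str) -> str:
    """Classify one server message, raising on error events."""
    event = json.loads(raw)
    event_type = event.get("type")  # ASSUMPTION: event name lives under "type"
    if event_type == "error":
        # ASSUMPTION: a "message" field may describe the error; fall back to the whole event.
        raise RuntimeError(f"server error: {event.get('message', event)}")
    return STATUS_BY_EVENT.get(event_type, "ignored")
```

Unknown event types return "ignored" so that the client keeps working if the server adds new events.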
Best Practices
Audio Quality
Use Clean Audio
- Remove background noise before sending
- Normalize audio levels to prevent clipping
- Use high-quality recordings for best results
Optimal Chunk Sizes
- Send 5-10 second audio chunks for smooth playback
- Don’t send too many small chunks (< 1 second)
- Don’t send extremely long audio (> 30 seconds) in one message
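The chunk-size guidance above comes down to byte arithmetic: 16-bit mono PCM uses 2 bytes per sample, so one second of audio is sample_rate × 2 bytes. The sketch below splits a buffer into fixed-length chunks and folds a final fragment shorter than one second into the previous chunk. The function name and default values are illustrative choices, not part of the API.

```python
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def split_pcm(pcm: bytes, sample_rate: int,
              chunk_seconds: float = 5.0, min_seconds: float = 1.0) -> list[bytes]:
    """Split raw PCM into chunks of chunk_seconds, avoiding a very short tail."""
    chunk_bytes = int(sample_rate * chunk_seconds) * BYTES_PER_SAMPLE
    min_bytes = int(sample_rate * min_seconds) * BYTES_PER_SAMPLE
    if chunk_bytes <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunks = [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
    # Merge a trailing fragment shorter than min_seconds into the previous chunk.
    if len(chunks) > 1 and len(chunks[-1]) < min_bytes:
        tail = chunks.pop()
        chunks[-1] += tail
    return chunks
```

For scale, 10 seconds at 48 kHz is 960,000 bytes of raw audio, or about 1.28 MB after base64 encoding, well under the 10 MB message limit.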
Format Validation
- Always verify 16-bit mono PCM format
- Set correct sample_rate (e.g., 48000)
- Test with sample audio first
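Because the API expects raw PCM rather than a WAV file, one common way to run these checks is to read a WAV source with Python's standard wave module, verify the sample width and channel count, and pass along only the frame data. This is a sketch of client-side validation; the function name is illustrative.

```python
import io
import wave


def wav_to_raw_pcm(wav_bytes: bytes) -> tuple[bytes, int]:
    """Return (raw_pcm, sample_rate) from WAV data, rejecting non-16-bit or non-mono audio."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        # readframes returns the sample data only, without the WAV header.
        return wav.readframes(wav.getnframes()), wav.getframerate()
```

The returned sample rate can be used directly as the sample_rate value for the audio you send.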
Connection Management
Reconnection Strategy
Implement exponential backoff for reconnections:
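A minimal sketch of an exponential backoff schedule. The base delay, growth factor, cap, attempt count, and optional jitter are illustrative defaults rather than values recommended by the API.

```python
import random


def backoff_delays(base: float = 1.0, factor: float = 2.0, max_delay: float = 30.0,
                   max_attempts: int = 6, jitter: float = 0.0):
    """Yield the number of seconds to wait before each reconnection attempt."""
    for attempt in range(max_attempts):
        delay = min(max_delay, base * factor ** attempt)
        # Optional jitter adds up to jitter * delay so clients do not retry in lockstep.
        yield delay + random.uniform(0, jitter * delay)
```

With the defaults, the delays double from 1 second and are capped at 30 seconds. After a successful reconnect, send session.init again before any other message and start a fresh schedule.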
Keep-Alive
WebSocket connections stay alive automatically. If you need to detect disconnections:
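One library-agnostic way to notice a dead connection is to track when the last server message arrived and flag the connection if nothing is received within a timeout while you are expecting a reply (for example, after sending audio). This is a sketch, not the API's own mechanism; the class name and the 30-second default are assumptions.

```python
import time


class LivenessMonitor:
    """Flags a connection as stale when no message has arrived within the timeout."""

    def __init__(self, timeout_seconds: float = 30.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable for testing
        self._last_seen = clock()

    def touch(self) -> None:
        """Call whenever any message is received from the server."""
        self._last_seen = self._clock()

    def is_stale(self) -> bool:
        """True if the timeout has elapsed since the last received message."""
        return self._clock() - self._last_seen > self._timeout
```

Call touch() from your message handler and check is_stale() on a timer. An idle connection with no pending requests may legitimately be silent, so only act on a stale result while a confirmation is outstanding.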
Session Lifecycle
Session Startup
- Create session via REST API
- Join your video room
- Connect WebSocket
- Send session.init
- Wait for connection.established
- Start sending audio
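The WebSocket portion of this startup order can be enforced on the client with a small guard that rejects out-of-order messages before they are sent. This is a sketch under assumptions: the class and method names are hypothetical, and the guard only knows the message and event types listed in this guide.

```python
class SessionGuard:
    """Tracks startup order: session.init, then connection.established, then audio."""

    def __init__(self) -> None:
        self.init_sent = False
        self.established = False

    def check_outgoing(self, message_type: str) -> None:
        """Raise if sending message_type now would violate the startup order."""
        if not self.init_sent:
            if message_type != "session.init":
                raise RuntimeError("first message must be session.init")
            self.init_sent = True
            return
        if message_type == "agent.speak" and not self.established:
            raise RuntimeError("wait for connection.established before sending audio")

    def on_incoming(self, event_type: str) -> None:
        """Record server events that unlock later steps."""
        if event_type == "connection.established":
            self.established = True
```

Call check_outgoing() before every send and on_incoming() for every received event. Create a new guard for each new WebSocket connection.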
Session Cleanup
When finished:
- Stop sending audio
- Close WebSocket connection
- Leave video room
- End session via REST API
Troubleshooting
Common Issues
Error: First message must be session.init
Cause: You sent a message before initializing the session.
Solution: Always send session.init as the first message after connecting.
Error: Failed to process audio
Causes:
- Wrong audio format (not 16-bit mono PCM)
- Incorrect sample_rate specified
- Invalid base64 encoding
Solutions:
- Verify audio format: 16-bit, mono, PCM
- Set correct sample_rate (e.g., 48000 for 48kHz audio)
- Test base64 encoding/decoding
- Check for audio data corruption
Avatar not appearing in room
Causes:
- Room credentials are incorrect
- Avatar hasn’t finished joining yet
- Network issues
Solutions:
- Verify room URL and token are correct
- Wait 2-3 seconds after connection.established
- Check room participant events
- Ensure room has available capacity
No video after sending audio
Causes:
- Audio format incorrect
- Haven’t received agent.speak.confirmed
- Video generation in progress
Solutions:
- Check for error events from server
- Wait for agent.speak.confirmed
- Allow 1-2 seconds for video generation
- Check video room connection
Next Steps
Pipecat Integration
Plug-and-play integration for Pipecat voice AI pipelines
LiveKit Integration
Native plugin for LiveKit real-time communication
Complete Examples
Full working code for browser, Python, and Node.js
Client Messages
Detailed WebSocket message format reference
Server Messages
Server event types and handling
Best Practices
Optimization tips and production guidelines
Support
Need help? Our team is here to assist:
Email Support
[email protected]
Response within 24 hours
API Status
status.agenthuman.com
Real-time API monitoring