Overview
This reference documents the WebSocket events the avatar server sends to your client. These are primarily confirmations and status updates for your audio commands.
What You’ll Receive
The avatar server sends simple JSON events to confirm:
- ✅ Session initialization
- ✅ Audio received and queued
- ✅ Playback interrupted
- ❌ Errors
Important: Events are Confirmations Only
These WebSocket events are confirmations for your commands. Video appears in your Daily.co room automatically.
Event Flow:
- You send audio via WebSocket → Server confirms receipt
- Server generates video → Streams to Daily.co room
- You receive video from Daily.co (separate from WebSocket)
Event Format
All events are JSON-formatted with a type field.
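As a minimal sketch of what this implies for a client, the helper below parses an incoming text frame and checks that it carries a string type field. The function name parse_event is hypothetical, and the sample payload only uses names that appear in this reference (agent.speak.confirmed, audio_samples).

```python
import json
from typing import Any, Dict


def parse_event(raw: str) -> Dict[str, Any]:
    """Parse one WebSocket text frame into an event dict.

    Raises ValueError if the frame is not valid JSON, is not a JSON
    object, or lacks a string "type" field.
    """
    event = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(event, dict) or not isinstance(event.get("type"), str):
        raise ValueError("event is missing a string 'type' field")
    return event


event = parse_event('{"type": "agent.speak.confirmed", "audio_samples": 16000}')
print(event["type"])
```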
Event Types
Connection Established
Sent after successful session initialization. Confirms that the server has loaded your session configuration and is ready to receive audio commands.
- session_id - Your session ID (echoed back from your session.init message)
Client Action: Ready to send audio commands. The avatar server will join the Daily.co room and begin streaming video there.
Example Handlers:
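One possible handler, sketched in Python. This section does not show the event's type string, so the sketch assumes the event has already been routed to the handler. The SessionState class and its method name are hypothetical. Only the session_id field comes from this reference.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Tracks whether the client may start sending audio commands."""

    expected_session_id: str
    ready: bool = False

    def on_connection_established(self, event: dict) -> None:
        # The server echoes the session ID back; a mismatch suggests the
        # client is talking to the wrong session.
        echoed = event.get("session_id")
        if echoed != self.expected_session_id:
            raise RuntimeError(
                f"session_id mismatch: expected {self.expected_session_id!r}, got {echoed!r}"
            )
        self.ready = True  # safe to send audio commands from here on


state = SessionState(expected_session_id="abc123")
state.on_connection_established({"session_id": "abc123"})
print(state.ready)
```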
Speak Confirmation
Sent after receiving and queuing audio for video generation. Confirms the audio was received and provides the sample count.
- audio_samples - Number of audio samples queued for processing (after any resampling to 16 kHz)
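Because audio_samples is counted after resampling to 16 kHz, it can be converted to a duration by dividing by 16,000. The sketch below shows that arithmetic. The helper names are hypothetical, and the expected-count estimate assumes the server's resampler preserves duration, which this reference does not state explicitly.

```python
SERVER_SAMPLE_RATE_HZ = 16_000  # rate at which audio_samples is counted


def queued_duration_seconds(event: dict) -> float:
    """Duration of the queued audio reported by a speak confirmation."""
    return event["audio_samples"] / SERVER_SAMPLE_RATE_HZ


def approx_expected_samples(sent_samples: int, sent_rate_hz: int) -> int:
    """Rough count to expect back for audio sent at another sample rate.

    Assumption: resampling keeps the duration unchanged; the server's
    exact rounding may differ by a few samples.
    """
    return round(sent_samples * SERVER_SAMPLE_RATE_HZ / sent_rate_hz)


confirmation = {"type": "agent.speak.confirmed", "audio_samples": 48000}
print(queued_duration_seconds(confirmation))  # 48000 / 16000 = 3.0 seconds
```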
Interrupt Confirmation
Sent after processing an interrupt request. Confirms that current video generation and playback have been stopped.
Error Event
Sent when an error occurs during processing. Contains detailed error information.
- error - Human-readable error description
Common Error Messages
| Error Message | Meaning | Solution |
|---|---|---|
| First message must be session.init | Session init was not sent first | Send session.init immediately after connecting |
| Invalid room platform | The room config is missing/invalid | Set room.platform to daily |
| Invalid video dimensions | Non-positive or non-numeric dimensions | Use positive integers for video_width and video_height |
| Invalid aspect ratio | Unsupported width/height ratio | Use one of: 18:9, 16:9, 5:3, 16:10, 3:2, 4:3, 1:1 (or portrait equivalents) |
| Invalid session id - server not assigned or session id mismatch | Session was routed to the wrong server | Re-run Start Session and use the returned ws_uri |
| Failed to fetch video path | Session config could not be fetched | Ensure the session is started and the access_token is correct |
| Session not found | Session ID not recognized | Re-establish session with session.init |
| Invalid JSON format | Malformed JSON received | Validate JSON structure before sending |
| No audio data provided | Empty audio field | Include base64-encoded audio |
| Failed to process audio: ... | Invalid audio format | Verify 16-bit mono PCM and set the correct sample_rate |
| Unknown message type: ... | Unsupported message | Check message type is supported |
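
The table above can be turned into a small lookup so that a client logs a suggested fix next to each error. This is only a sketch: the hint_for helper is hypothetical, and prefix matching is an assumption made because two of the messages end in variable detail ("...").

```python
from typing import Optional

# (message prefix, suggested fix), copied from the table above.
ERROR_HINTS = [
    ("First message must be session.init", "Send session.init immediately after connecting"),
    ("Invalid room platform", "Set room.platform to daily"),
    ("Invalid video dimensions", "Use positive integers for video_width and video_height"),
    ("Invalid aspect ratio", "Use one of: 18:9, 16:9, 5:3, 16:10, 3:2, 4:3, 1:1 (or portrait equivalents)"),
    ("Invalid session id", "Re-run Start Session and use the returned ws_uri"),
    ("Failed to fetch video path", "Ensure the session is started and the access_token is correct"),
    ("Session not found", "Re-establish session with session.init"),
    ("Invalid JSON format", "Validate JSON structure before sending"),
    ("No audio data provided", "Include base64-encoded audio"),
    ("Failed to process audio:", "Verify 16-bit mono PCM and set the correct sample_rate"),
    ("Unknown message type:", "Check message type is supported"),
]


def hint_for(error_message: str) -> Optional[str]:
    """Return the suggested fix for a known error message, else None."""
    for prefix, hint in ERROR_HINTS:
        if error_message.startswith(prefix):
            return hint
    return None


print(hint_for("No audio data provided"))
```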
Event Handling Best Practices
1. Always Check Event Type
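A minimal sketch of routing on the type field. The EventDispatcher class is hypothetical. The only type string used is agent.speak.confirmed, which this reference names; register the other event types with the strings your server actually sends.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventDispatcher:
    """Routes each event to the handler registered for its type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: List[dict] = []  # kept for logging or debugging

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, event: dict) -> bool:
        """Return True if a handler ran, False if the type was unknown."""
        event_type = event.get("type")
        handler = self._handlers.get(event_type) if isinstance(event_type, str) else None
        if handler is None:
            # Unknown or missing type: record it instead of crashing.
            self.unhandled.append(event)
            return False
        handler(event)
        return True


dispatcher = EventDispatcher()
dispatcher.on("agent.speak.confirmed", lambda e: print("queued", e["audio_samples"]))
dispatcher.dispatch({"type": "agent.speak.confirmed", "audio_samples": 16000})
```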
2. Serialize Audio Sends
agent.speak.confirmed does not include a correlation ID. For the simplest integration, send one agent.speak at a time and wait for the confirmation before sending the next chunk.
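One way to enforce this, sketched with asyncio. The SerializedSpeaker class is hypothetical, and the outgoing message shape (type, audio, sample_rate) is an assumption inferred from the error table rather than a confirmed schema. The 5-second timeout is likewise an arbitrary placeholder.

```python
import asyncio
import base64
import json
from typing import Awaitable, Callable


class SerializedSpeaker:
    """Sends one agent.speak at a time and waits for its confirmation."""

    def __init__(self, send: Callable[[str], Awaitable[None]], timeout: float = 5.0):
        self._send = send                    # e.g. your WebSocket client's send coroutine
        self._timeout = timeout              # assumed value; tune for your setup
        self._lock = asyncio.Lock()          # allows only one chunk in flight
        self._confirmed = asyncio.Event()

    def on_event(self, event: dict) -> None:
        """Call from the receive loop for every parsed event."""
        if event.get("type") == "agent.speak.confirmed":
            self._confirmed.set()

    async def speak(self, pcm: bytes, sample_rate: int = 16000) -> None:
        async with self._lock:
            self._confirmed.clear()
            message = {
                "type": "agent.speak",
                "audio": base64.b64encode(pcm).decode("ascii"),  # assumed field name
                "sample_rate": sample_rate,                      # assumed field name
            }
            await self._send(json.dumps(message))
            # Raises asyncio.TimeoutError if no confirmation arrives in time.
            await asyncio.wait_for(self._confirmed.wait(), self._timeout)
```

In a real client, pass your WebSocket library's send coroutine as send and call on_event from the loop that reads incoming frames. Because confirmations carry no correlation ID, the lock is what ties each confirmation to the chunk that was just sent.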
3. Handle Errors Gracefully
4. Wait for Confirmations
Don’t assume operations succeeded. Wait for the matching confirmation event before treating a command as complete.
Next Steps
- Learn about Client → Server Messages
- See Complete Examples
- Review Best Practices