Overview
This reference documents the WebSocket messages you send to control the avatar server. These messages are simple JSON commands for audio playback, interruption, and session management.
WebSocket Connection
WebSocket URL: wss://wss.agenthuman.com
Connect to this URL after creating a session via the REST API. The WebSocket server will route your connection to the appropriate backend server.
Important: WebSocket is for Commands Only
The WebSocket connection is used exclusively for sending audio commands to the avatar server. It is NOT used for WebRTC signaling or video streaming. Video is delivered through your Daily or LiveKit room.
Architecture:
- Daily or LiveKit room → Where you receive avatar video (WebRTC is handled for you)
- WebSocket → Where you send audio commands (this reference)
Message Format
All messages must be valid JSON objects with a type field that names the command.
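As a minimal illustration in Python (standard library only), every command can be produced by one helper that puts the command name in type and any parameters alongside it. make_message and parse_message are hypothetical helper names, not part of any Agent Human SDK.

```python
import json


def make_message(msg_type: str, **fields) -> str:
    """Serialize a command as a JSON object whose "type" field names the command."""
    if not msg_type:
        raise ValueError("every message needs a non-empty type")
    if "type" in fields:
        raise ValueError("pass the command name as msg_type, not as a field")
    # "type" goes first so the command is easy to spot when logging raw frames.
    return json.dumps({"type": msg_type, **fields})


def parse_message(raw: str) -> dict:
    """Decode an incoming text frame and check that it is an object with a type."""
    msg = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(msg, dict) or "type" not in msg:
        raise ValueError("message is not a JSON object with a type field")
    return msg
```

For example, make_message("agent.interrupt") returns the string {"type": "agent.interrupt"}.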
Message Types
1. Session Initialization
Send a session.init message to initialize the session with your credentials. It must be the first message sent after connecting.
Fields:
- session_id: Your Agent Human session ID (from session creation)
- session_token: Your session access token (from session creation)
How it works:
- You send the session.init message with your session ID and token
- The server validates your credentials and fetches the session configuration from the API
- The session configuration (room details, avatar video, video dimensions) is retrieved server-side
- The avatar joins the Daily/LiveKit room automatically
- The server responds with connection.established on success
Note: All session configuration (room platform, room URL, avatar video, video dimensions, etc.) is set when you create the session via the REST API. You don’t need to send these details in the WebSocket message; the server fetches them automatically using your session credentials.
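A minimal Python sketch of the session.init frame follows. The field names come from this reference; the empty-value checks are a client-side convenience, and session_init is a hypothetical helper name. Pass the returned string to your WebSocket client's send method as the first frame after connecting.

```python
import json


def session_init(session_id: str, session_token: str) -> str:
    """Build the session.init command; send it as the first frame after connecting."""
    # Both values come from the REST session-creation response. The server
    # rejects a session.init without session_id ("Missing session id").
    if not session_id:
        raise ValueError("session_id is required")
    if not session_token:
        raise ValueError("session_token is required")
    # No room, avatar, or video settings here: the server fetches them itself.
    return json.dumps({
        "type": "session.init",
        "session_id": session_id,
        "session_token": session_token,
    })
```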
2. Send Audio for Video Generation
Send an agent.speak message with audio data to generate talking head video. Audio must be 16-bit mono PCM raw bytes, encoded in base64.
Fields:
- audio: Base64-encoded raw PCM audio bytes
- sample_rate: The sample rate of the PCM audio you’re sending. Defaults to 16000. If you send 48 kHz audio, set sample_rate: 48000.
Audio requirements:
- Sample Rate: 16 kHz (16000 Hz) by default, or provide sample_rate (e.g. 48000)
- Format: Raw PCM, 16-bit signed integer (not a WAV file)
- Channels: Mono (single channel)
- Encoding: Base64 string
- Max Size: 10 MB per message
The server responds with agent.speak.confirmed, including the number of audio samples received.
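A minimal Python sketch of building the agent.speak frame. It assumes the PCM is already 16-bit little-endian mono (this reference does not state byte order; little-endian is what WAV files use), always sends sample_rate explicitly, and treats the 10 MB limit as applying to the serialized JSON frame with 1 MB = 1024 × 1024 bytes. All three are assumptions, and agent_speak is a hypothetical helper name.

```python
import base64
import json

# Assumption: the 10 MB limit applies to the whole JSON frame, in binary megabytes.
MAX_MESSAGE_BYTES = 10 * 1024 * 1024


def agent_speak(pcm: bytes, sample_rate: int = 16000) -> str:
    """Build an agent.speak command from raw 16-bit mono PCM bytes."""
    if not pcm:
        raise ValueError("no audio data provided")
    if len(pcm) % 2:
        # Each 16-bit sample is 2 bytes, so an odd length means truncated audio.
        raise ValueError("PCM length must be a whole number of 16-bit samples")
    encoded = json.dumps({
        "type": "agent.speak",
        "audio": base64.b64encode(pcm).decode("ascii"),
        "sample_rate": sample_rate,
    })
    if len(encoded.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds 10 MB; split the audio into smaller chunks")
    return encoded
```

Because base64 inflates data by about 4/3, a 10 MB frame under these assumptions carries roughly 7.5 MB of PCM, or about four minutes of 16 kHz audio. For mono 16-bit audio the sample count reported in agent.speak.confirmed should correspond to len(pcm) // 2.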
Examples:
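Because agent.speak expects raw PCM rather than a WAV file, one way to prepare audio is to strip the WAV header with Python's standard wave module and, for long recordings, split the PCM across several agent.speak messages. wav_to_pcm and chunk_pcm are hypothetical helpers; the one-second chunk length is an arbitrary illustrative choice, and this reference does not specify whether back-to-back chunks play seamlessly.

```python
import io
import wave


def wav_to_pcm(wav_bytes: bytes) -> tuple[bytes, int]:
    """Strip the WAV header and return (raw PCM frames, sample rate)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit")
        # readframes returns the sample data only, without the RIFF header.
        return wav.readframes(wav.getnframes()), wav.getframerate()


def chunk_pcm(pcm: bytes, sample_rate: int, seconds: float = 1.0) -> list[bytes]:
    """Split PCM into pieces of about `seconds` each without cutting a sample in half."""
    step = max(1, int(sample_rate * seconds)) * 2  # 2 bytes per mono 16-bit sample
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]
```

Each chunk can then be base64-encoded into its own agent.speak message, with sample_rate set to the rate returned by wav_to_pcm.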
3. Interrupt Playback
Send an agent.interrupt message to stop current video generation and playback immediately.
The server responds with agent.interrupt.confirmed.
Use Cases:
- User wants to skip current speech
- New urgent message needs to be displayed
- Cancel ongoing generation
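This reference lists no parameters for the interrupt command, so the Python sketch below sends only the type field. It also assumes the confirmation arrives as a JSON frame whose type is agent.interrupt.confirmed; agent_interrupt and is_interrupt_confirmed are hypothetical names.

```python
import json


def agent_interrupt() -> str:
    """Build the agent.interrupt command (assumed to need no fields beyond type)."""
    return json.dumps({"type": "agent.interrupt"})


def is_interrupt_confirmed(raw_reply: str) -> bool:
    """Return True if a server frame looks like the interrupt acknowledgement."""
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError:
        return False
    # Assumption: the acknowledgement name is carried in the reply's type field.
    return isinstance(reply, dict) and reply.get("type") == "agent.interrupt.confirmed"
```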
4. Cleanup (End Session)
There is no WebSocket session.stop message. When you’re done:
- Close the WebSocket connection
- Leave your video room
- Call the REST endpoint POST /v1/sessions/{session_id}/end to end the session
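The final REST call can be prepared with Python's urllib. api_base_url is a placeholder because this section does not give the REST host, and authentication headers are omitted because they are not described here; add whatever your REST API requires before sending the request with urllib.request.urlopen. end_session_request is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def end_session_request(api_base_url: str, session_id: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/sessions/{session_id}/end."""
    # Percent-encode the ID so characters such as "/" cannot alter the path.
    path = "/v1/sessions/" + urllib.parse.quote(session_id, safe="") + "/end"
    # Placeholder host; authentication headers are intentionally not set here.
    return urllib.request.Request(api_base_url.rstrip("/") + path, data=b"", method="POST")
```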
Message Flow
Typical message sequence for an avatar session:
- Create session → Call REST API POST /v1/sessions to create a session
- Join your video room → Connect to Daily or LiveKit room to receive avatar video
- Connect WebSocket → Connect to wss://wss.agenthuman.com
- Send session.init → Server responds with connection.established
- Avatar joins video room → Avatar appears as participant
- Send agent.speak (one or more times) → Server generates video and streams to your video room
- Receive video from your room → Avatar video appears in the room
- Optional: Send agent.interrupt → Stop current playback
- Close WebSocket and leave video room
- Call POST /v1/sessions/{session_id}/end → Release resources and mark the session ended
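To catch ordering mistakes, such as the First message must be session.init error, before they reach the server, a client can track the sequence above. The CommandSequencer class below is a hypothetical Python sketch; holding agent.speak and agent.interrupt until connection.established arrives is a conservative reading of the flow, not a rule stated in this reference.

```python
import json


class CommandSequencer:
    """Client-side guard mirroring the documented order of WebSocket commands."""

    def __init__(self) -> None:
        self.init_sent = False
        self.established = False

    def check_outgoing(self, msg_type: str) -> None:
        """Raise RuntimeError if sending msg_type now would break the sequence."""
        if not self.init_sent:
            if msg_type != "session.init":
                raise RuntimeError("First message must be session.init")
            self.init_sent = True
            return
        if msg_type == "session.init":
            raise RuntimeError("session.init was already sent on this connection")
        if msg_type not in ("agent.speak", "agent.interrupt"):
            raise RuntimeError(f"Unknown message type: {msg_type}")
        if not self.established:
            raise RuntimeError("wait for connection.established before sending commands")

    def on_incoming(self, raw: str) -> None:
        """Record the server's connection.established reply."""
        msg = json.loads(raw)
        if isinstance(msg, dict) and msg.get("type") == "connection.established":
            self.established = True
```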
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| First message must be session.init | session.init was not the first message sent after connecting | Send session.init as the first message after connecting |
| Missing session id | The session_id field is missing | Include session_id in the session.init message |
| Invalid session id - server not assigned or session id mismatch | Session was routed to the wrong server or no server was assigned | Use the server_ws_uri provided when creating/starting the session |
| Failed to fetch video path | Session configuration could not be retrieved from the API | Ensure the session exists and the session_token is valid |
| Session not found | Invalid or expired session | Create a new session and re-establish the connection with session.init |
| Invalid JSON format | Malformed JSON | Validate the JSON structure before sending |
| No audio data provided | Empty audio field | Include base64-encoded audio in the agent.speak message |
| Failed to process audio | Invalid audio format | Verify the audio is 16-bit mono PCM and set the correct sample_rate |
| Unknown message type | Unsupported message type | Check that the message type is one of the documented types (session.init, agent.speak, agent.interrupt) |
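Several of these errors can be detected before a frame is sent. The precheck function below is a hypothetical client-side Python sketch that returns the table's error text for problems it can see locally. It does not reproduce the server's validation, and errors such as Session not found or Failed to fetch video path can only be reported by the server.

```python
import base64
import json

KNOWN_TYPES = {"session.init", "agent.speak", "agent.interrupt"}


def precheck(raw: str) -> list[str]:
    """Return the documented errors this frame would likely trigger (empty if none)."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return ["Invalid JSON format"]
    if not isinstance(msg, dict):
        return ["Invalid JSON format"]  # assumption: non-object JSON is rejected too
    msg_type = msg.get("type")
    if msg_type not in KNOWN_TYPES:
        return ["Unknown message type"]
    problems = []
    if msg_type == "session.init" and not msg.get("session_id"):
        problems.append("Missing session id")
    if msg_type == "agent.speak":
        audio = msg.get("audio")
        if not audio:
            problems.append("No audio data provided")
        else:
            try:
                # validate=True rejects characters outside the base64 alphabet.
                pcm = base64.b64decode(audio, validate=True)
            except (ValueError, TypeError):
                problems.append("Failed to process audio")
            else:
                if len(pcm) % 2:  # not a whole number of 16-bit samples
                    problems.append("Failed to process audio")
    return problems
```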