Overview

This reference documents the WebSocket events the avatar server sends to your client. These are primarily confirmations and status updates for your audio commands.

What You’ll Receive

The avatar server sends simple JSON events to confirm:
  • ✅ Session initialization
  • ✅ Audio received and queued
  • ✅ Playback interrupted
  • ❌ Errors

Important: Events are Confirmations Only

These WebSocket events are confirmations for your commands. Video appears in your Daily.co room automatically.
Event Flow:
  1. You send audio via WebSocket → Server confirms receipt
  2. Server generates video → Streams to Daily.co room
  3. You receive video from Daily.co (separate from WebSocket)

Event Format

All events are JSON-formatted with a type field:
{
  "type": "event_type",
  "field1": "value1",
  "field2": "value2"
}
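
A small guard can make the format above explicit on the client. The sketch below parses an incoming frame and checks for a string type field. The name parseServerEvent and the choice to return null for malformed input are assumptions made for illustration, not part of the avatar server API.

```javascript
// Hypothetical helper: parse a raw WebSocket frame into an event object.
// Returns null when the payload is not JSON or lacks a string `type` field.
function parseServerEvent(raw) {
    let message;
    try {
        message = JSON.parse(raw);
    } catch (err) {
        return null; // not valid JSON
    }
    if (message === null || typeof message !== 'object' || typeof message.type !== 'string') {
        return null; // valid JSON, but not shaped like an event
    }
    return message;
}

// Usage inside a message handler:
// const message = parseServerEvent(event.data);
// if (message === null) { console.warn('Ignoring malformed frame'); return; }
```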

Event Types

Connection Established

Sent after successful session initialization. Confirms that the server has loaded your session configuration and is ready to receive audio commands.
{
  "type": "connection.established",
  "session_id": "your-session-id"
}
Fields:
  • session_id - Your session ID (echoed back)
When Sent: Immediately after receiving a valid session.init message.
Client Action: Ready to send audio commands. The avatar server will join the Daily.co room and begin streaming video there.
Example Handlers:
ws.onmessage = async (event) => {
    const message = JSON.parse(event.data);
    
    if (message.type === 'connection.established') {
        console.log('Session established:', message.session_id);
        // Avatar server will join Daily.co room
        // Ready to send audio commands
    }
};
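
When startup code needs to pause until the session is ready, the confirmation can be wrapped in a promise. The following is a minimal sketch, assuming a browser-style WebSocket with addEventListener; waitForEvent is a hypothetical helper name, and rejecting on an error event is a design choice of this sketch rather than documented server behavior.

```javascript
// Hypothetical helper: resolve with the first event of `expectedType`,
// reject if the server sends an `error` event or nothing arrives in time.
function waitForEvent(ws, expectedType, timeoutMs = 5000) {
    return new Promise((resolve, reject) => {
        const timer = setTimeout(() => {
            ws.removeEventListener('message', onMessage);
            reject(new Error(`Timeout waiting for ${expectedType}`));
        }, timeoutMs);

        function onMessage(event) {
            let msg;
            try {
                msg = JSON.parse(event.data);
            } catch (err) {
                return; // ignore frames that are not JSON
            }
            if (!msg || (msg.type !== expectedType && msg.type !== 'error')) {
                return; // unrelated event, keep waiting
            }
            clearTimeout(timer);
            ws.removeEventListener('message', onMessage);
            if (msg.type === 'error') {
                reject(new Error(msg.error));
            } else {
                resolve(msg);
            }
        }

        ws.addEventListener('message', onMessage);
    });
}

// Usage (sessionInitMessage stands for the session.init payload you already send):
// ws.send(JSON.stringify(sessionInitMessage));
// const established = await waitForEvent(ws, 'connection.established');
```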

Speak Confirmation

Sent after receiving and queuing audio for video generation. Confirms the audio was received and provides the sample count.
{
  "type": "agent.speak.confirmed",
  "audio_samples": 48000
}
Fields:
  • audio_samples - Number of audio samples queued for processing (after any resampling to 16kHz)
When Sent: After successfully receiving and queuing your audio data.
Client Action: Optional - you can use this to track which audio chunks have been received and are being processed.
Example Handlers:
if (message.type === 'agent.speak.confirmed') {
    console.log(`Audio confirmed: ${message.audio_samples} samples`);
    console.log(`Duration: ${message.audio_samples / 16000} seconds (approx)`);
}
Note: Video generation happens in the background and is streamed to the Daily.co room. This event only confirms audio receipt. Watch for the avatar video to appear in the Daily.co room.
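
Because the reported count is measured after resampling to 16 kHz, it can be compared against what was sent. The sketch below estimates that count from the byte length of 16-bit mono PCM input. Both helper names are hypothetical, and the server's exact rounding during resampling is not documented here, so treat the estimate as approximate.

```javascript
const SERVER_SAMPLE_RATE = 16000; // rate the server resamples to, per the field description above
const BYTES_PER_SAMPLE = 2;       // 16-bit mono PCM

// Hypothetical helper: estimate the audio_samples value the server should report.
function estimateConfirmedSamples(pcmByteLength, inputSampleRate) {
    const inputSamples = pcmByteLength / BYTES_PER_SAMPLE;
    return Math.round(inputSamples * SERVER_SAMPLE_RATE / inputSampleRate);
}

// Hypothetical helper: convert a confirmed sample count to seconds of speech.
function confirmedDurationSeconds(audioSamples) {
    return audioSamples / SERVER_SAMPLE_RATE;
}

// 3 seconds of 48 kHz audio is 288000 bytes of 16-bit mono PCM.
console.log(estimateConfirmedSamples(288000, 48000)); // 48000
console.log(confirmedDurationSeconds(48000));         // 3
```

A large gap between the estimate and the confirmed value may indicate that the sample_rate sent with the audio does not match the actual recording rate.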

Interrupt Confirmation

Sent after processing an interrupt request. Confirms that current video generation and playback have been stopped.
{
  "type": "agent.interrupt.confirmed"
}
Fields: None
When Sent: After stopping current video generation/playback.
Client Action: Optional - you can track that the interrupt was successful.
Example Handlers:
if (message.type === 'agent.interrupt.confirmed') {
    console.log('Playback interrupted successfully');
    // Can now send new audio
}

Error Event

Sent when an error occurs during processing. Contains detailed error information.
{
  "type": "error",
  "error": "Detailed error message"
}
Fields:
  • error - Human-readable error description
When Sent: When any operation fails or invalid data is received.
Client Action: Log the error, notify user, and take corrective action based on the error message.
Example Handlers:
if (message.type === 'error') {
    console.error('Server error:', message.error);
    alert(`Error: ${message.error}`);
}

Common Error Messages

| Error Message | Meaning | Solution |
| --- | --- | --- |
| First message must be session.init | Session init was not sent first | Send session.init immediately after connecting |
| Invalid room platform | The room config is missing/invalid | Set room.platform to daily |
| Invalid video dimensions | Non-positive or non-numeric dimensions | Use positive integers for video_width and video_height |
| Invalid aspect ratio | Unsupported width/height ratio | Use one of: 18:9, 16:9, 5:3, 16:10, 3:2, 4:3, 1:1 (or portrait equivalents) |
| Invalid session id - server not assigned or session id mismatch | Session was routed to the wrong server | Re-run Start Session and use the returned ws_uri |
| Failed to fetch video path | Session config could not be fetched | Ensure the session is started and the access_token is correct |
| Session not found | Session ID not recognized | Re-establish session with session.init |
| Invalid JSON format | Malformed JSON received | Validate JSON structure before sending |
| No audio data provided | Empty audio field | Include base64-encoded audio |
| Failed to process audio: ... | Invalid audio format | Verify 16-bit mono PCM and set the correct sample_rate |
| Unknown message type: ... | Unsupported message | Check message type is supported |
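
The messages above can be grouped by the kind of recovery they call for. The sketch below maps an error string to a coarse action label. The labels and the classifyError name are hypothetical, and the prefix matching assumes the server sends the messages with the wording listed in the table.

```javascript
// Illustrative groupings of the "Solution" column; the action labels are assumptions.
const ERROR_RULES = [
    { prefix: 'First message must be session.init', action: 'send-session-init' },
    { prefix: 'Session not found', action: 'send-session-init' },
    { prefix: 'Invalid session id', action: 'restart-session' },
    { prefix: 'Failed to fetch video path', action: 'check-session' },
    { prefix: 'Invalid room platform', action: 'fix-config' },
    { prefix: 'Invalid video dimensions', action: 'fix-config' },
    { prefix: 'Invalid aspect ratio', action: 'fix-config' },
    { prefix: 'Invalid JSON format', action: 'fix-message' },
    { prefix: 'No audio data provided', action: 'fix-message' },
    { prefix: 'Failed to process audio', action: 'fix-message' },
    { prefix: 'Unknown message type', action: 'fix-message' },
];

// Hypothetical helper: return the action label for a server error string,
// or 'unknown' when the text matches none of the documented messages.
function classifyError(errorText) {
    const text = String(errorText);
    const rule = ERROR_RULES.find((r) => text.startsWith(r.prefix));
    return rule ? rule.action : 'unknown';
}

// Usage inside an error handler:
// const action = classifyError(message.error);
```

Prefix matching is used so that messages with a variable tail, such as "Failed to process audio: ...", still resolve to a rule.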

Event Handling Best Practices

1. Always Check Event Type

ws.onmessage = async (event) => {
    const message = JSON.parse(event.data);
    
    switch (message.type) {
        case 'connection.established':
            handleConnectionEstablished(message);
            break;
        case 'agent.speak.confirmed':
            handleSpeakConfirmed(message);
            break;
        case 'agent.interrupt.confirmed':
            handleInterruptConfirmed(message);
            break;
        case 'error':
            handleError(message);
            break;
        default:
            console.warn('Unknown message type:', message.type);
    }
};

2. Serialize Audio Sends

agent.speak.confirmed does not include a correlation ID. For the simplest integration, send one agent.speak at a time and wait for the confirmation before sending the next chunk.
function waitForSpeakConfirmed(ws, timeoutMs = 5000) {
    return new Promise((resolve, reject) => {
        const timeout = setTimeout(() => {
            ws.removeEventListener('message', onMessage);
            reject(new Error('Timeout waiting for agent.speak.confirmed'));
        }, timeoutMs);

        function onMessage(event) {
            const msg = JSON.parse(event.data);
            if (msg.type === 'agent.speak.confirmed') {
                clearTimeout(timeout);
                ws.removeEventListener('message', onMessage);
                resolve(msg.audio_samples);
            }
        }

        ws.addEventListener('message', onMessage);
    });
}

3. Handle Errors Gracefully

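// Runs inside an async ws.onmessage handler.
// reconnect() and showErrorToUser() are functions your application provides.
if (message.type === 'error') {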
    console.error('Server error:', message.error);
    
    // Handle specific errors
    if (message.error.includes('Session not found')) {
        // Re-establish session
        await reconnect();
    } else {
        // Show user-friendly error
        showErrorToUser(message.error);
    }
}

4. Wait for Confirmations

Don’t assume operations succeeded. Wait for confirmation events:
async function sendAudioWithConfirmation(ws, audioBase64, sampleRate = 48000) {
    // Send audio
    ws.send(JSON.stringify({
        type: 'agent.speak',
        audio: audioBase64,
        sample_rate: sampleRate
    }));

    // Wait for the next confirmation
    const audioSamples = await waitForSpeakConfirmed(ws);
    return audioSamples;
}
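
If several parts of an application may call sendAudioWithConfirmation at the same time, the one-at-a-time rule can be enforced by chaining the calls. The sketch below is a generic promise queue, not something the avatar server requires; createSerialQueue is a hypothetical name.

```javascript
// Hypothetical helper: run async tasks strictly one after another.
function createSerialQueue() {
    let tail = Promise.resolve();
    return function enqueue(task) {
        // Start this task only after the previous one has settled.
        const result = tail.then(() => task());
        // Swallow failures on the internal chain so later tasks still run;
        // the caller still sees the rejection through `result`.
        tail = result.catch(() => {});
        return result;
    };
}

// Usage:
// const enqueue = createSerialQueue();
// enqueue(() => sendAudioWithConfirmation(ws, chunkA));
// enqueue(() => sendAudioWithConfirmation(ws, chunkB)); // starts after chunkA is confirmed or fails
```

Letting the queue continue after a failed task keeps one timed-out chunk from blocking all later audio; an application that prefers to stop on the first failure could clear its pending work instead.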