Overview

Make your avatars talk by sending audio through a WebSocket connection. We strongly recommend using our LiveKit or Pipecat integrations to make implementation easier and more reliable. These plugins handle all the complex WebSocket and audio streaming logic for you. For advanced use cases requiring full control, you can also build a direct integration with our WebSocket API.

Quick Start: How It Works

Three Simple Steps:
  1. Create/Start a session → Get session_id and session_token
  2. Connect to the WebSocket at wss://wss.agenthuman.com and send session.init
  3. Send audio → Avatar generates video and streams to your room
The avatar automatically joins the video room you specify during session creation. You receive the video through your Daily or LiveKit room.

Step-by-Step Implementation

Step 1: Create and Start a Session

First, create a session with your video room configuration:
const response = await fetch('https://api.agenthuman.com/v1/sessions', {
  method: 'POST',
  headers: {
    'x-api-key': 'ah_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    avatar_id: 'avat_01H3Z8G9YR3K2N5M6P7Q8W4T',
    aspect_ratio: '16:9',
    room: {
      platform: 'daily',  // or 'livekit'
      url: 'https://your-domain.daily.co/your-room',
      token: 'your-daily-token',
      display_name: 'AI Avatar (AH)'
    }
  })
});

const { session } = await response.json();
const { session_id, session_token } = session;

// Session is automatically started with 'started' status
console.log('Session ready:', session_id);
Room Configuration: You must provide your own Daily or LiveKit room. The avatar will automatically join this room when you initialize the WebSocket connection.
→ Full Create Session API Reference

Step 2: Connect to WebSocket

Connect to wss://wss.agenthuman.com and initialize with your session credentials:
const ws = new WebSocket('wss://wss.agenthuman.com');

ws.onopen = () => {
  console.log('WebSocket connected');
  
  // Initialize session (MUST be first message)
  ws.send(JSON.stringify({
    type: 'session.init',
    config: {
      session_id: session_id,
      session_token: session_token,
      room: {
        platform: 'daily',  // or 'livekit'
        url: 'https://your-domain.daily.co/your-room',
        token: 'your-daily-token',
        display_name: 'AI Avatar (AH)'
      },
      video_width: 1280,
      video_height: 720
    }
  }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  if (message.type === 'connection.established') {
    console.log('Session initialized! Ready to send audio');
    // Avatar will join your room automatically
  }
};
Important: The session.init message must be the first message sent after connecting. The WebSocket will reject any other message sent first.
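One way to enforce this ordering in client code is to route every outgoing message through a small gate that sends session.init first and holds everything else until connection.established arrives. The sketch below is illustrative only: createSessionGate and its methods are hypothetical names, not part of the API, and it works with any object that exposes send().

```javascript
// Hypothetical helper (not part of the API): wraps any socket-like object
// that has a send() method.
function createSessionGate(socket, initConfig) {
  let ready = false;   // becomes true once connection.established arrives
  const pending = [];  // serialized messages held back until the session is ready

  return {
    // Call from ws.onopen: session.init always goes out first.
    open() {
      socket.send(JSON.stringify({ type: 'session.init', config: initConfig }));
    },
    // Call from ws.onmessage with the parsed server message.
    handle(message) {
      if (message.type === 'connection.established' && !ready) {
        ready = true;
        // Flush queued messages in the order they were submitted.
        for (const queued of pending.splice(0)) socket.send(queued);
      }
    },
    // Use in place of ws.send for agent.speak and agent.interrupt.
    send(payload) {
      const text = JSON.stringify(payload);
      if (ready) socket.send(text);
      else pending.push(text);
    },
  };
}
```

In the example above, you would call gate.open() inside ws.onopen, pass each parsed message to gate.handle() inside ws.onmessage, and send audio with gate.send().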

Step 3: Send Audio to Make Avatar Talk

Send audio as base64-encoded, 16-bit mono PCM:
// Convert audio to required format
const audioContext = new AudioContext({ sampleRate: 48000 });
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const channelData = audioBuffer.getChannelData(0);  // Mono
const samples = new Int16Array(channelData.length);

// Convert float32 to int16 PCM
for (let i = 0; i < channelData.length; i++) {
  const s = Math.max(-1, Math.min(1, channelData[i]));
  samples[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
}

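// Encode to base64 in slices: spreading a large array into
// String.fromCharCode can exceed the engine's argument limit
const bytes = new Uint8Array(samples.buffer);
let binary = '';
for (let i = 0; i < bytes.length; i += 0x8000) {
  binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
}
const base64Audio = btoa(binary);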

// Send to avatar
ws.send(JSON.stringify({
  type: 'agent.speak',
  audio: base64Audio,
  sample_rate: 48000  // Must match your audio sample rate
}));

// Avatar will generate video and stream to your room
Audio Requirements:
| Parameter   | Value         | Description                    |
|-------------|---------------|--------------------------------|
| Format      | Raw PCM       | Raw audio bytes (NOT WAV file) |
| Bit Depth   | 16-bit signed | Audio sample format            |
| Channels    | Mono          | Single channel only            |
| Sample Rate | Configurable  | Common: 16000, 48000 Hz        |
| Encoding    | Base64        | String encoding for JSON       |
| Max Size    | 10 MB         | Per WebSocket message          |
Sample Rate: You can send audio at any sample rate (commonly 16kHz or 48kHz). Just make sure to set the sample_rate field to match your audio. The avatar server will handle any necessary resampling.
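Because base64 inflates the payload by roughly a third, it can help to check the serialized message size before sending. The sketch below builds an agent.speak message from an Int16Array and rejects it if the JSON text exceeds the limit. buildSpeakMessage and bytesToBase64 are hypothetical helper names, and treating "10 MB" as 10 × 1024 × 1024 bytes of message text is an assumption; the server may measure the limit differently.

```javascript
// Assumption: the 10 MB limit applies to the serialized message text.
const MAX_MESSAGE_BYTES = 10 * 1024 * 1024;

// Base64-encode raw bytes in slices so String.fromCharCode never
// receives more arguments than the engine allows.
function bytesToBase64(bytes) {
  let binary = '';
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  return btoa(binary);
}

// Hypothetical helper: returns the JSON text for an agent.speak message.
function buildSpeakMessage(samples, sampleRate) {
  // Respect byteOffset/byteLength so typed-array views encode correctly.
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  const text = JSON.stringify({
    type: 'agent.speak',
    audio: bytesToBase64(bytes),
    sample_rate: sampleRate,
  });
  // The message is pure ASCII, so its length equals its size in bytes.
  if (text.length > MAX_MESSAGE_BYTES) {
    throw new RangeError(`Message is ${text.length} bytes; split the audio into smaller chunks`);
  }
  return text;
}
```

Usage: ws.send(buildSpeakMessage(samples, 48000)). At 48 kHz, 16-bit mono audio is 96,000 bytes per second before encoding, so a few seconds of audio stays well under the limit.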

Step 4: Receive Confirmations

The server sends confirmation events:
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  switch (message.type) {
    case 'connection.established':
      console.log('Session ready:', message.session_id);
      break;
      
    case 'agent.speak.confirmed':
      console.log(`Audio received: ${message.audio_samples} samples`);
      // Video is being generated and streamed to your room
      break;
      
    case 'agent.interrupt.confirmed':
      console.log('Playback interrupted');
      break;
      
    case 'error':
      console.error('Error:', message.error);
      break;
  }
};
→ Full Server Message Reference
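If you prefer to await a specific confirmation instead of branching inside one onmessage handler, a promise wrapper is one option. This is a minimal sketch: waitForMessage is a hypothetical helper, the 5-second default timeout is an arbitrary choice, and it assumes every server message is JSON with a type field, as in the handler above.

```javascript
// Hypothetical helper: resolves with the first server message of the wanted
// type, rejects on an 'error' message or when timeoutMs elapses.
function waitForMessage(ws, wantedType, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      clearTimeout(timer);
      ws.removeEventListener('message', onMessage);
    };
    const onMessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === wantedType) {
        cleanup();
        resolve(message);
      } else if (message.type === 'error') {
        cleanup();
        reject(new Error(String(message.error)));
      }
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Timed out waiting for ${wantedType}`));
    }, timeoutMs);
    // addEventListener coexists with an existing ws.onmessage handler.
    ws.addEventListener('message', onMessage);
  });
}
```

For example, `await waitForMessage(ws, 'connection.established')` before sending the first agent.speak message, then `await waitForMessage(ws, 'agent.speak.confirmed')` after it.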

Step 5: View the Avatar Video

The avatar video is streamed to your Daily or LiveKit room. Join the room to see the avatar:
import DailyIframe from '@daily-co/daily-js';

// Create video frame
const callFrame = DailyIframe.createFrame({
  iframeStyle: {
    width: '100%',
    height: '100%'
  }
});

// Join the room
await callFrame.join({
  url: 'https://your-domain.daily.co/your-room',
  token: 'your-daily-token'
});

// Avatar will appear as a participant
callFrame.on('participant-joined', (event) => {
  console.log('Avatar joined:', event.participant.user_name);
});

Complete Example: Browser Client

Here’s a complete working example:
<!DOCTYPE html>
<html>
<head>
  <title>Avatar Talking Demo</title>
  <script src="https://unpkg.com/@daily-co/daily-js"></script>
</head>
<body>
  <h1>AI Avatar</h1>
  <div id="video-container" style="width: 640px; height: 480px;"></div>
  <button onclick="makeAvatarTalk()">Make Avatar Talk</button>
  
  <script>
    let ws = null;
    let callFrame = null;
    
    // 1. Initialize (after creating session via API)
    const SESSION_ID = 'your-session-id';
    const SESSION_TOKEN = 'your-session-token';
    const ROOM_URL = 'https://your-domain.daily.co/your-room';
    const ROOM_TOKEN = 'your-room-token';
    
    async function init() {
      // Join video room (Daily example)
      callFrame = DailyIframe.createFrame(
        document.getElementById('video-container')
      );
      
      await callFrame.join({
        url: ROOM_URL,
        token: ROOM_TOKEN
      });
      
      // Connect WebSocket
      ws = new WebSocket('wss://wss.agenthuman.com');
      
      ws.onopen = () => {
        ws.send(JSON.stringify({
          type: 'session.init',
          config: {
            session_id: SESSION_ID,
            session_token: SESSION_TOKEN,
            room: {
              platform: 'daily',
              url: ROOM_URL,
              token: ROOM_TOKEN,
              display_name: 'AI Avatar (AH)'
            }
          }
        }));
      };
      
      ws.onmessage = (event) => {
        const msg = JSON.parse(event.data);
        console.log('Received:', msg.type);
      };
    }
    
    async function makeAvatarTalk() {
      // Load audio file
      const response = await fetch('speech.wav');
      const arrayBuffer = await response.arrayBuffer();
      
      // Convert to 48kHz mono PCM
      const audioContext = new AudioContext({ sampleRate: 48000 });
      const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
      const channelData = audioBuffer.getChannelData(0);
      const samples = new Int16Array(channelData.length);
      
      for (let i = 0; i < channelData.length; i++) {
        const s = Math.max(-1, Math.min(1, channelData[i]));
        samples[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
      }
      
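      // Encode in slices: spreading a large array into
      // String.fromCharCode can exceed the engine's argument limit
      const bytes = new Uint8Array(samples.buffer);
      let binary = '';
      for (let i = 0; i < bytes.length; i += 0x8000) {
        binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
      }
      const base64Audio = btoa(binary);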
      
      // Send to avatar
      ws.send(JSON.stringify({
        type: 'agent.speak',
        audio: base64Audio,
        sample_rate: 48000
      }));
      
      console.log('Audio sent! Watch the video...');
    }
    
    // Initialize on load
    init();
  </script>
</body>
</html>
→ More Complete Examples

WebSocket API Reference

Client → Server Messages

| Message Type    | Description                                |
|-----------------|--------------------------------------------|
| session.init    | Initialize session (must be first message) |
| agent.speak     | Send audio to make avatar talk             |
| agent.interrupt | Stop current playback                      |
→ Full Client Messages Reference

Server → Client Messages

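| Event Type                | Description                      |
|---------------------------|----------------------------------|
| connection.established    | Session initialized successfully |
| agent.speak.confirmed     | Audio received and queued        |
| agent.interrupt.confirmed | Playback interrupted             |
| error                     | Error occurred                   |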
→ Full Server Messages Reference

Best Practices

Audio Quality

  • Remove background noise before sending
  • Normalize audio levels to prevent clipping
  • Use high-quality recordings for best results
  • Send 5-10 second audio chunks for smooth playback
  • Don’t send too many small chunks (< 1 second)
  • Don’t send extremely long audio (> 30 seconds) in one message
  • Always verify 16-bit mono PCM format
  • Set correct sample_rate (e.g., 48000)
  • Test with sample audio first
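The chunking guidance above can be sketched as a helper that splits a long Int16Array into roughly 5-second pieces and folds a too-short tail into the previous chunk. chunkPcm is a hypothetical name, and the defaults (5-second chunks, 1-second minimum) are taken from the bullets above rather than from any server requirement.

```javascript
// Hypothetical helper: split 16-bit PCM samples into chunks of about
// chunkSeconds, never leaving a final chunk shorter than minSeconds
// (unless the whole input is that short).
function chunkPcm(samples, sampleRate, chunkSeconds = 5, minSeconds = 1) {
  const chunkLength = Math.floor(sampleRate * chunkSeconds);
  const minLength = Math.floor(sampleRate * minSeconds);
  if (chunkLength <= 0) throw new RangeError('chunk length must be positive');

  const chunks = [];
  let start = 0;
  while (start < samples.length) {
    let end = Math.min(start + chunkLength, samples.length);
    // If what would remain after this chunk is shorter than minSeconds,
    // absorb it into this chunk instead of sending a tiny final message.
    if (samples.length - end < minLength) end = samples.length;
    // slice() copies, so each chunk owns its buffer and can be encoded
    // with new Uint8Array(chunk.buffer) as in the examples above.
    chunks.push(samples.slice(start, end));
    start = end;
  }
  return chunks;
}
```

Each returned chunk can then be base64-encoded and sent as its own agent.speak message, in order. With these defaults the last chunk can be up to 6 seconds long.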

Connection Management

Implement exponential backoff for reconnections:
async function connectWithRetry(maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      await connect();
      return;
    } catch (error) {
      const delay = Math.pow(2, i) * 1000;
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error('Failed to connect after retries');
}
WebSocket connections stay alive automatically. If you need to detect disconnections:
ws.onclose = () => {
  console.log('Disconnected, reconnecting...');
  // Reuse the retry helper defined above
  connectWithRetry().catch(console.error);
};

Session Lifecycle

  1. Create session via REST API
  2. Join your video room
  3. Connect WebSocket
  4. Send session.init
  5. Wait for connection.established
  6. Start sending audio
When finished:
  1. Stop sending audio
  2. Close WebSocket connection
  3. Leave video room
  4. End session via REST API
// Clean shutdown
ws.close();
await callFrame.leave();
await fetch(`https://api.agenthuman.com/v1/sessions/${session_id}/end`, {
  method: 'POST',
  headers: { 'x-api-key': API_KEY }
});

Troubleshooting

Common Issues

Cause: You sent a message before initializing the session.
Solution: Always send session.init as the first message after connecting.
ws.onopen = () => {
  // This MUST be first
  ws.send(JSON.stringify({
    type: 'session.init',
    config: { /* ... */ }
  }));
};
Causes:
  • Wrong audio format (not 16-bit mono PCM)
  • Incorrect sample_rate specified
  • Invalid base64 encoding
Solutions:
  1. Verify audio format: 16-bit, mono, PCM
  2. Set correct sample_rate (e.g., 48000 for 48kHz audio)
  3. Test base64 encoding/decoding
  4. Check for audio data corruption
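The checks above can be partly automated before sending. The following sketch decodes a base64 payload and reports obvious problems: invalid base64, an accidental WAV container (WAV files begin with the ASCII bytes "RIFF"), or an odd byte count that cannot be 16-bit samples. inspectPcmPayload is a hypothetical helper; it cannot detect a wrong sample_rate or stereo data, so treat a passing result as a sanity check only.

```javascript
// Hypothetical helper: sanity-check a base64 audio payload before sending.
function inspectPcmPayload(base64Audio, sampleRate) {
  let binary;
  try {
    binary = atob(base64Audio); // throws on invalid base64
  } catch {
    return { ok: false, reason: 'invalid base64' };
  }
  if (binary.length === 0) {
    return { ok: false, reason: 'empty payload' };
  }
  if (binary.startsWith('RIFF')) {
    return { ok: false, reason: 'looks like a WAV file, not raw PCM' };
  }
  if (binary.length % 2 !== 0) {
    return { ok: false, reason: 'odd byte count, not 16-bit samples' };
  }
  const sampleCount = binary.length / 2; // 2 bytes per 16-bit mono sample
  return { ok: true, sampleCount, durationSeconds: sampleCount / sampleRate };
}
```

A durationSeconds value that is far from the real length of the clip usually points to a mismatched sample_rate or to stereo data being treated as mono.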
Causes:
  • Room credentials are incorrect
  • Avatar hasn’t finished joining yet
  • Network issues
Solutions:
  1. Verify room URL and token are correct
  2. Wait 2-3 seconds after connection.established
  3. Check room participant events
  4. Ensure room has available capacity
Causes:
  • Audio format incorrect
  • Haven’t received agent.speak.confirmed
  • Video generation in progress
Solutions:
  1. Check for error events from server
  2. Wait for agent.speak.confirmed
  3. Allow 1-2 seconds for video generation
  4. Check video room connection
→ Full Best Practices Guide

Support

Need help? Our team is here to assist:

Email Support

[email protected] (response within 24 hours)

API Status

status.agenthuman.com (real-time API monitoring)