Making Avatars Talk

Overview

Make your avatars talk by sending audio through a WebSocket connection. We strongly recommend using our LiveKit or Pipecat integrations to make implementation easier and more reliable. These plugins handle all the complex WebSocket and audio streaming logic for you. For advanced use cases requiring full control, you can also build a direct integration with our WebSocket API.

Quick Start: How It Works

Three Simple Steps:

Create/Start a session → Get session_id and session_token
Connect to WebSocket → wss://wss.agenthuman.com with session.init
Send audio → Avatar generates video and streams to your room

The avatar automatically joins the video room you specify, once the WebSocket session is initialized. You receive its video by joining that same Daily or LiveKit room.

Step-by-Step Implementation

Step 1: Create and Start a Session

First, create a session with your video room configuration:

const response = await fetch('https://api.agenthuman.com/v1/sessions', {
  method: 'POST',
  headers: {
    'x-api-key': 'ah_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    avatar_id: 'avat_01H3Z8G9YR3K2N5M6P7Q8W4T',
    aspect_ratio: '16:9',
    room: {
      platform: 'daily',  // or 'livekit'
      url: 'https://your-domain.daily.co/your-room',
      token: 'your-daily-token',
      display_name: 'AI Avatar (AH)'
    }
  })
});

if (!response.ok) {
  throw new Error(`Session creation failed: ${response.status}`);
}

const { session } = await response.json();
const { session_id, session_token } = session;

// The session starts automatically (its status is 'started')
console.log('Session ready:', session_id);

Room Configuration: You must provide your own Daily or LiveKit room. The avatar will automatically join this room when you initialize the WebSocket connection.

→ Full Create Session API Reference

Step 2: Connect to WebSocket

Connect to wss://wss.agenthuman.com and initialize with your session credentials:

const ws = new WebSocket('wss://wss.agenthuman.com');

ws.onopen = () => {
  console.log('WebSocket connected');
  
  // Initialize session (MUST be first message)
  ws.send(JSON.stringify({
    type: 'session.init',
    config: {
      session_id: session_id,
      session_token: session_token,
      room: {
        platform: 'daily',  // or 'livekit'
        url: 'https://your-domain.daily.co/your-room',
        token: 'your-daily-token',
        display_name: 'AI Avatar (AH)'
      },
      video_width: 1280,
      video_height: 720
    }
  }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  if (message.type === 'connection.established') {
    console.log('Session initialized! Ready to send audio');
    // Avatar will join your room automatically
  }
};

Important: The session.init message must be the first message sent after connecting. The WebSocket will reject any other message sent first.
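If your audio source can produce output before the session is confirmed (for example, a TTS stream that starts early), a small wrapper can enforce this ordering for you. The sketch below is a hypothetical helper, not part of the AgentHuman API. It assumes the standard browser WebSocket interface (`readyState`, `addEventListener`, `send`), sends `session.init` as soon as the socket opens, and holds every other message until `connection.established` arrives, matching the startup order in Session Lifecycle.

```javascript
// Hypothetical helper (not part of the AgentHuman API): guarantees that
// session.init is the first message and queues all other messages until the
// server replies with connection.established.
function createGatedSender(ws, initMessage) {
  const pending = [];
  let ready = false;

  // Send session.init as soon as the socket is open.
  const sendInit = () => ws.send(JSON.stringify(initMessage));
  if (ws.readyState === 1) sendInit();  // 1 === WebSocket.OPEN
  else ws.addEventListener('open', sendInit, { once: true });

  ws.addEventListener('message', (event) => {
    const message = JSON.parse(event.data);
    if (message.type === 'connection.established' && !ready) {
      ready = true;
      // Flush everything that was queued before the session was confirmed.
      for (const queued of pending.splice(0)) ws.send(JSON.stringify(queued));
    }
  });

  return {
    send(message) {
      if (ready) ws.send(JSON.stringify(message));
      else pending.push(message);
    },
    get ready() {
      return ready;
    }
  };
}
```

Usage: create it right after `new WebSocket(...)` with your `session.init` payload, then call `sender.send({ type: 'agent.speak', ... })` whenever audio is available.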

Step 3: Send Audio to Make Avatar Talk

Send audio as base64-encoded, 16-bit mono PCM:

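// Load your audio ('speech.wav' is a placeholder; any format the browser can decode works)
const arrayBuffer = await (await fetch('speech.wav')).arrayBuffer();

// Decode and resample to 48 kHz (decodeAudioData resamples to the context's rate)
const audioContext = new AudioContext({ sampleRate: 48000 });
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const channelData = audioBuffer.getChannelData(0);  // First channel only (mono)
const samples = new Int16Array(channelData.length);

// Convert float32 [-1, 1] to int16 PCM
for (let i = 0; i < channelData.length; i++) {
  const s = Math.max(-1, Math.min(1, channelData[i]));
  samples[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
}

// Encode to base64 in chunks: spreading a large array into
// String.fromCharCode can exceed the engine's argument limit
const bytes = new Uint8Array(samples.buffer);
let binary = '';
for (let i = 0; i < bytes.length; i += 0x8000) {
  binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
}
const base64Audio = btoa(binary);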

// Send to avatar
ws.send(JSON.stringify({
  type: 'agent.speak',
  audio: base64Audio,
  sample_rate: 48000  // Must match your audio sample rate
}));

// Avatar will generate video and stream to your room

Audio Requirements:

Parameter	Value	Description
Format	Raw PCM	Raw audio bytes (NOT WAV file)
Bit Depth	16-bit signed	Audio sample format
Channels	Mono	Single channel only
Sample Rate	Configurable	Common: 16000, 48000 Hz
Encoding	Base64	String encoding for JSON
Max Size	10 MB	Per WebSocket message
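To check a message against the 10 MB limit before sending it, you can measure the serialized JSON. This is a sketch under two assumptions: that the limit applies to the whole JSON text as UTF-8 bytes, and that 10 MB means 10 × 1024 × 1024 bytes. The constant and function names are hypothetical.

```javascript
// Hypothetical helpers: measure an outgoing message against the 10 MB limit.
// Assumes the limit covers the full JSON text (UTF-8) and 10 MB = 10 * 1024 * 1024.
const MAX_MESSAGE_BYTES = 10 * 1024 * 1024;

function messageSizeBytes(message) {
  // JSON.stringify gives exactly the text passed to ws.send().
  return new TextEncoder().encode(JSON.stringify(message)).length;
}

function fitsInOneMessage(message) {
  return messageSizeBytes(message) <= MAX_MESSAGE_BYTES;
}
```

For scale: base64 turns every 3 bytes into 4 characters, so 48 kHz 16-bit mono audio (96,000 bytes per second) becomes 128,000 characters per second. Under the assumptions above, one message holds roughly 80 seconds at 48 kHz or about 245 seconds at 16 kHz, well beyond the 30-second guideline in Best Practices.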

Sample Rate: You can send audio at any sample rate (commonly 16kHz or 48kHz). Just make sure to set the sample_rate field to match your audio. The avatar server will handle any necessary resampling.
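Raw PCM has no header, so sample_rate is the only thing that tells the server how long the audio lasts. A quick sanity check is to compute the duration your payload implies at the rate you declare: if 48 kHz audio is labelled 16000, it will appear three times longer than it really is. The helper below is a hypothetical sketch for 16-bit mono audio.

```javascript
// Hypothetical check: duration in seconds implied by a base64-encoded
// 16-bit mono PCM payload at the declared sample rate.
function pcmDurationSeconds(base64Audio, sampleRate) {
  const byteLength = atob(base64Audio).length;  // decoded byte count
  if (byteLength % 2 !== 0) {
    throw new Error('16-bit PCM must have an even byte length');
  }
  // 2 bytes per sample, sampleRate samples per second
  return byteLength / 2 / sampleRate;
}
```

Compare the result with the clip length you expect (for example, `audioBuffer.duration` from the Web Audio API) before sending.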

Step 4: Receive Confirmations

The server sends confirmation events:

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  switch (message.type) {
    case 'connection.established':
      console.log('Session ready:', message.session_id);
      break;
      
    case 'agent.speak.confirmed':
      console.log(`Audio received: ${message.audio_samples} samples`);
      // Video is being generated and streamed to your room
      break;
      
    case 'agent.interrupt.confirmed':
      console.log('Playback interrupted');
      break;
      
    case 'error':
      console.error('Error:', message.error);
      break;
  }
};
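The switch statement above reacts to events as they arrive. When you want to await a specific confirmation instead (for example, `connection.established` before sending audio, or `agent.speak.confirmed` before sending the next chunk), you can wrap a listener in a promise. This is a hypothetical helper; the name and the 10-second default timeout are assumptions, and it assumes error events carry an `error` field as in the handler above. Because it uses `addEventListener`, it works alongside an existing `ws.onmessage` handler.

```javascript
// Hypothetical helper: resolves with the first server message of `type`,
// rejects on an 'error' message or after timeoutMs.
function waitForMessage(ws, type, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      clearTimeout(timer);
      ws.removeEventListener('message', onMessage);
    };
    const onMessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === type) {
        cleanup();
        resolve(message);
      } else if (message.type === 'error') {
        cleanup();
        reject(new Error(`Server error: ${JSON.stringify(message.error)}`));
      }
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Timed out waiting for ${type}`));
    }, timeoutMs);
    ws.addEventListener('message', onMessage);
  });
}
```

Usage: `const confirmed = await waitForMessage(ws, 'agent.speak.confirmed');` right after sending an `agent.speak` message.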

→ Full Server Message Reference

Step 5: View the Avatar Video

The avatar video is streamed to your Daily or LiveKit room. Join the room to see the avatar:

import DailyIframe from '@daily-co/daily-js';

// Create video frame
const callFrame = DailyIframe.createFrame({
  iframeStyle: {
    width: '100%',
    height: '100%'
  }
});

// Join the room
await callFrame.join({
  url: 'https://your-domain.daily.co/your-room',
  token: 'your-daily-token'
});

// Avatar will appear as a participant
callFrame.on('participant-joined', (event) => {
  console.log('Avatar joined:', event.participant.user_name);
});
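The participant-joined event fires for every participant, not only the avatar. One way to pick out the avatar is to match on the display_name you passed in the room config. This sketch assumes the avatar's Daily `user_name` equals that display_name, and that `callFrame.participants()` returns an object of participants (keyed by session ID plus `local`) with `local` and `user_name` fields; the helper itself is hypothetical.

```javascript
// Hypothetical helper: finds the remote participant whose user_name matches
// the avatar's display_name. `participants` is the object returned by
// callFrame.participants().
function findAvatarParticipant(participants, displayName = 'AI Avatar (AH)') {
  return (
    Object.values(participants).find(
      (p) => !p.local && p.user_name === displayName
    ) || null
  );
}

// Usage (assumes callFrame from the example above):
// callFrame.on('participant-joined', () => {
//   const avatar = findAvatarParticipant(callFrame.participants());
//   if (avatar) console.log('Avatar joined:', avatar.session_id);
// });
```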

Complete Example: Browser Client

Here’s a complete working example:

<!DOCTYPE html>
<html>
<head>
  <title>Avatar Talking Demo</title>
  <script src="https://unpkg.com/@daily-co/daily-js"></script>
</head>
<body>
  <h1>AI Avatar</h1>
  <div id="video-container" style="width: 640px; height: 480px;"></div>
  <button onclick="makeAvatarTalk()">Make Avatar Talk</button>
  
  <script>
    let ws = null;
    let callFrame = null;
    let sessionReady = false;  // set once the server sends connection.established
    
    // 1. Initialize (after creating session via API)
    const SESSION_ID = 'your-session-id';
    const SESSION_TOKEN = 'your-session-token';
    const ROOM_URL = 'https://your-domain.daily.co/your-room';
    const ROOM_TOKEN = 'your-room-token';
    
    async function init() {
      // Join video room (Daily example)
      callFrame = DailyIframe.createFrame(
        document.getElementById('video-container')
      );
      
      await callFrame.join({
        url: ROOM_URL,
        token: ROOM_TOKEN
      });
      
      // Connect WebSocket
      ws = new WebSocket('wss://wss.agenthuman.com');
      
      ws.onopen = () => {
        ws.send(JSON.stringify({
          type: 'session.init',
          config: {
            session_id: SESSION_ID,
            session_token: SESSION_TOKEN,
            room: {
              platform: 'daily',
              url: ROOM_URL,
              token: ROOM_TOKEN,
              display_name: 'AI Avatar (AH)'
            }
          }
        }));
      };
      
      ws.onmessage = (event) => {
        const msg = JSON.parse(event.data);
        console.log('Received:', msg.type);
        if (msg.type === 'connection.established') sessionReady = true;
        if (msg.type === 'error') console.error('Error:', msg.error);
      };
    }
    
    async function makeAvatarTalk() {
      // Only send audio after connection.established (see Session Lifecycle)
      if (!sessionReady) {
        console.warn('Session not ready yet: wait for connection.established');
        return;
      }
      
      // Load audio file
      const response = await fetch('speech.wav');
      const arrayBuffer = await response.arrayBuffer();
      
      // Convert to 48kHz mono PCM
      const audioContext = new AudioContext({ sampleRate: 48000 });
      const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
      const channelData = audioBuffer.getChannelData(0);
      const samples = new Int16Array(channelData.length);
      
      for (let i = 0; i < channelData.length; i++) {
        const s = Math.max(-1, Math.min(1, channelData[i]));
        samples[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
      }
      
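      // Encode in chunks to stay under the argument limit of String.fromCharCode
      const bytes = new Uint8Array(samples.buffer);
      let binary = '';
      for (let i = 0; i < bytes.length; i += 0x8000) {
        binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
      }
      const base64Audio = btoa(binary);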
      
      // Send to avatar
      ws.send(JSON.stringify({
        type: 'agent.speak',
        audio: base64Audio,
        sample_rate: 48000
      }));
      
      console.log('Audio sent! Watch the video...');
    }
    
    // Initialize on load
    init();
  </script>
</body>
</html>

→ More Complete Examples

WebSocket API Reference

Client → Server Messages

Message Type	Description
`session.init`	Initialize session (must be first message)
`agent.speak`	Send audio to make avatar talk
`agent.interrupt`	Stop current playback
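Small builder functions keep the three client messages consistent across your code. The field names for `session.init` and `agent.speak` follow the examples on this page; the body of `agent.interrupt` is an assumption (type only), so check the Client Messages reference before relying on it. All function names are hypothetical.

```javascript
// Hypothetical builders for client -> server messages. Each returns the JSON
// string to pass to ws.send().
function sessionInitMessage({ sessionId, sessionToken, room, videoWidth, videoHeight }) {
  const config = { session_id: sessionId, session_token: sessionToken, room };
  // video_width / video_height are optional in this sketch
  if (videoWidth && videoHeight) {
    config.video_width = videoWidth;
    config.video_height = videoHeight;
  }
  return JSON.stringify({ type: 'session.init', config });
}

function agentSpeakMessage(base64Audio, sampleRate) {
  return JSON.stringify({ type: 'agent.speak', audio: base64Audio, sample_rate: sampleRate });
}

// Assumption: agent.interrupt needs no fields besides its type.
function agentInterruptMessage() {
  return JSON.stringify({ type: 'agent.interrupt' });
}
```

For example, `ws.send(agentInterruptMessage())` stops the current playback when the user starts talking over the avatar; wait for `agent.interrupt.confirmed` before sending new audio.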

→ Full Client Messages Reference

Server → Client Messages

Event Type	Description
`connection.established`	Session initialized successfully
`agent.speak.confirmed`	Audio received and queued
`agent.interrupt.confirmed`	Playback interrupted
`error`	Error occurred

→ Full Server Messages Reference

Best Practices

Audio Quality

Use Clean Audio

Remove background noise before sending
Normalize audio levels to prevent clipping
Use high-quality recordings for best results
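One common way to normalize levels is peak normalization: scale the float samples so the loudest one sits just below full scale before the int16 conversion. This is a minimal sketch of that standard technique, not something the API requires; the function name and the 0.9 target (about 1 dB of headroom) are assumptions.

```javascript
// Peak normalization: scale samples so the largest absolute value equals
// targetPeak. Returns a new Float32Array and leaves the input untouched.
function normalizePeak(channelData, targetPeak = 0.9) {
  let peak = 0;
  for (const s of channelData) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return new Float32Array(channelData);  // silence: nothing to scale

  const gain = targetPeak / peak;
  const out = new Float32Array(channelData.length);
  for (let i = 0; i < channelData.length; i++) out[i] = channelData[i] * gain;
  return out;
}
```

Apply it to `audioBuffer.getChannelData(0)` before converting to int16; the clamp in the conversion loop then never has to cut off samples.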

Optimal Chunk Sizes

Send 5-10 second audio chunks for smooth playback
Don’t send too many small chunks (< 1 second)
Don’t send extremely long audio (> 30 seconds) in one message
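To follow these guidelines with longer audio, you can split the int16 samples into fixed-length chunks and send each one as its own `agent.speak` message with the same sample_rate. This sketch is hypothetical: the 8-second default sits inside the recommended 5-10 second range, and a trailing remainder shorter than one second is folded into the previous chunk so no tiny final message is sent.

```javascript
// Hypothetical helper: split int16 PCM into ~chunkSeconds chunks.
// subarray() returns views over the same buffer, so no samples are copied.
function splitPcm(samples, sampleRate, chunkSeconds = 8) {
  const perChunk = Math.round(sampleRate * chunkSeconds);
  const chunks = [];
  for (let start = 0; start < samples.length; start += perChunk) {
    chunks.push(samples.subarray(start, start + perChunk));
  }
  // Fold a sub-second remainder into the previous chunk.
  const last = chunks[chunks.length - 1];
  if (chunks.length > 1 && last.length < sampleRate) {
    chunks.pop();
    const prevStart = (chunks.length - 1) * perChunk;
    chunks[chunks.length - 1] = samples.subarray(prevStart);
  }
  return chunks;
}

// Bytes of a single chunk. Note: chunk.buffer alone is the WHOLE clip,
// so use byteOffset/byteLength when base64-encoding a view.
function chunkBytes(chunk) {
  return new Uint8Array(chunk.buffer, chunk.byteOffset, chunk.byteLength);
}
```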

Format Validation

Always verify 16-bit mono PCM format
Set correct sample_rate (e.g., 48000)
Test with sample audio first
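A pre-flight check can catch the most common format mistakes before a message ever reaches the server. The sketch below is hypothetical and mirrors the Audio Requirements table: it checks the type, the sample_rate, the base64 encoding, the byte count, and whether the payload starts with a WAV "RIFF" header. It cannot detect stereo or wrong-bit-depth audio that happens to have an even byte length.

```javascript
// Hypothetical validator for an agent.speak message object (before JSON.stringify).
// Returns a list of problems; an empty list means the basic checks passed.
function validateSpeakMessage(message) {
  const problems = [];
  if (message.type !== 'agent.speak') problems.push('type must be "agent.speak"');
  if (!Number.isInteger(message.sample_rate) || message.sample_rate <= 0) {
    problems.push('sample_rate must be a positive integer (e.g. 16000 or 48000)');
  }

  let bytes = null;
  if (typeof message.audio !== 'string') {
    problems.push('audio must be a base64 string');
  } else {
    try {
      bytes = atob(message.audio);  // throws on invalid base64
    } catch {
      problems.push('audio is not valid base64');
    }
  }

  if (bytes !== null) {
    if (bytes.length === 0) problems.push('audio is empty');
    if (bytes.length % 2 !== 0) problems.push('16-bit PCM needs an even number of bytes');
    if (bytes.startsWith('RIFF')) {
      problems.push('audio looks like a WAV file; send raw PCM without the header');
    }
  }
  return problems;
}
```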

Connection Management

Reconnection Strategy

Implement exponential backoff for reconnections:

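// connect: your function that opens the WebSocket and resolves once it is open
async function connectWithRetry(connect, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await connect();
    } catch (error) {
      if (attempt === maxRetries - 1) break;  // don't wait after the last attempt
      const delay = Math.pow(2, attempt) * 1000;  // 1 s, 2 s, 4 s, ...
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error(`Failed to connect after ${maxRetries} attempts`);
}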
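The retry loop needs a `connect` function that returns a promise, but a WebSocket reports success and failure through events. One way to bridge the two is to wrap the socket in a promise that resolves on `open` and rejects if it errors or closes first. This is a hypothetical sketch; the `WebSocketImpl` parameter is an assumption that defaults to the browser's `WebSocket` and can be swapped for another implementation with the same `onopen`/`onerror`/`onclose` handlers (such as the `ws` package in Node, or a fake in tests).

```javascript
// Hypothetical connect(): resolves with an open socket, rejects if the
// connection errors or closes before opening.
function openSocket(url, WebSocketImpl = globalThis.WebSocket) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocketImpl(url);
    const fail = () => reject(new Error(`Could not connect to ${url}`));
    ws.onopen = () => {
      // Remove the connect-time handlers so the caller can install its own.
      ws.onerror = null;
      ws.onclose = null;
      resolve(ws);
    };
    ws.onerror = fail;
    ws.onclose = fail;
  });
}
```

After it resolves, a new connection still has to start with `session.init`, exactly like the first one.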

Keep-Alive

WebSocket connections stay alive automatically. If you need to detect disconnections:

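let closingIntentionally = false;  // set to true before you call ws.close() yourself

ws.onclose = (event) => {
  if (closingIntentionally) return;  // normal shutdown: don't reconnect
  console.log(`Disconnected (code ${event.code}), reconnecting...`);
  reconnect();  // your own logic, e.g. connectWithRetry(); then send session.init again
};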

Session Lifecycle

Session Startup

Create session via REST API
Join your video room
Connect WebSocket
Send session.init
Wait for connection.established
Start sending audio

Session Cleanup

When finished:

Stop sending audio
Close WebSocket connection
Leave video room
End session via REST API

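// Clean shutdown (API_KEY is your AgentHuman API key)
ws.close(1000, 'Session finished');  // 1000 = normal closure
await callFrame.leave();
await fetch(`https://api.agenthuman.com/v1/sessions/${session_id}/end`, {
  method: 'POST',
  headers: { 'x-api-key': API_KEY }
});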

Troubleshooting

Common Issues

Error: First message must be session.init

Cause: You sent a message before initializing the session.

Solution: Always send session.init as the first message after connecting.

ws.onopen = () => {
  // This MUST be first
  ws.send(JSON.stringify({
    type: 'session.init',
    config: { /* ... */ }
  }));
};

Error: Failed to process audio

Causes:

Wrong audio format (not 16-bit mono PCM)
Incorrect sample_rate specified
Invalid base64 encoding

Solutions:

Verify audio format: 16-bit, mono, PCM
Set correct sample_rate (e.g., 48000 for 48kHz audio)
Test base64 encoding/decoding
Check for audio data corruption
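To test the encoding step, decode the base64 you are about to send and compare it with the int16 samples you started from. A mismatch points at the encoding code; a classic cause is encoding `samples.buffer` of a subarray view, which sends the whole underlying clip. These helpers are hypothetical and assume encode and decode run on the same machine (so byte order matches).

```javascript
// Hypothetical round-trip check for base64-encoded 16-bit PCM.
function base64ToInt16(base64Audio) {
  const binary = atob(base64Audio);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  // Ignore a stray odd byte; validate byte length separately if needed.
  return new Int16Array(bytes.buffer, 0, Math.floor(bytes.length / 2));
}

function roundTripMatches(samples, base64Audio) {
  const decoded = base64ToInt16(base64Audio);
  if (decoded.length !== samples.length) return false;
  return decoded.every((value, i) => value === samples[i]);
}
```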

Avatar not appearing in room

Causes:

Room credentials are incorrect
Avatar hasn’t finished joining yet
Network issues

Solutions:

Verify room URL and token are correct
Wait 2-3 seconds after connection.established
Check room participant events
Ensure room has available capacity
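Rather than a fixed 2-3 second wait, you can poll until the avatar actually shows up and fail with a clear error if it never does. The helper below is a hypothetical sketch; the default timeout and polling interval are arbitrary choices, and the usage comment assumes the avatar's `user_name` matches your display_name.

```javascript
// Hypothetical polling helper: re-runs check() every intervalMs until it
// returns a truthy value (resolved) or timeoutMs elapses (rejected).
function waitUntil(check, timeoutMs = 10000, intervalMs = 250) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const tick = () => {
      let result;
      try {
        result = check();
      } catch (err) {
        return reject(err);  // surface errors from check() instead of throwing in a timer
      }
      if (result) return resolve(result);
      if (Date.now() >= deadline) {
        return reject(new Error('Timed out waiting for condition'));
      }
      setTimeout(tick, intervalMs);
    };
    tick();
  });
}

// Usage (Daily, after connection.established):
// const avatar = await waitUntil(() =>
//   Object.values(callFrame.participants()).find((p) => p.user_name === 'AI Avatar (AH)'));
```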

No video after sending audio

Causes:

Audio format incorrect
Haven’t received agent.speak.confirmed
Video generation in progress

Solutions:

Check for error events from server
Wait for agent.speak.confirmed
Allow 1-2 seconds for video generation
Check video room connection

→ Full Best Practices Guide

Next Steps

Pipecat Integration

Plug-and-play integration for Pipecat voice AI pipelines

LiveKit Integration

Native plugin for LiveKit real-time communication

Complete Examples

Full working code for browser, Python, and Node.js

Client Messages

Detailed WebSocket message format reference

Server Messages

Server event types and handling

Best Practices

Optimization tips and production guidelines

Support

Need help? Our team is here to assist:

Email Support

[email protected] (response within 24 hours)

API Status

status.agenthuman.com: real-time API monitoring
