AgentHumanVideoService

from agenthuman.video import AgentHumanVideoService
from agenthuman.api import NewSessionRequest

AgentHumanVideoService(
    api_key="ah_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    session_request=NewSessionRequest(avatar="avat_01KMDFWB9SW1QX4TVP0RT1RFYQ"),
    transport=transport,  # a Pipecat transport with video output enabled
)

Constructor Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| api_key | str | Required. Your AgentHuman API key |
| session_request | NewSessionRequest | Avatar and aspect ratio configuration. Defaults to a built-in avatar with aspect_ratio="auto" |
| transport | BaseTransport | Required. Your Pipecat output transport. Must have video_out_enabled=True |

NewSessionRequest

from agenthuman.api import NewSessionRequest

NewSessionRequest(
    avatar="avat_01KMDFWB9SW1QX4TVP0RT1RFYQ",
    aspect_ratio="auto",  # optional
)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| avatar | str | Required. Avatar ID from your AgentHuman dashboard |
| aspect_ratio | str | Video aspect ratio: "4:3", "3:4", "1:1", or "auto". Defaults to "auto" |

Aspect Ratios

| Value | Typical resolution | Best for |
| --- | --- | --- |
| "4:3" | 1200×900 | Landscape / desktop |
| "3:4" | 900×1200 | Portrait / mobile |
| "1:1" | 1200×1200 | Square layouts |
| "auto" | Derived from transport | Automatic — recommended |
When aspect_ratio is "auto" (the default), the service reads video_out_width and video_out_height from your transport and selects the closest supported ratio. A warning is logged if the dimensions don’t closely match any standard ratio.
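The "closest supported ratio" selection can be sketched roughly as follows. This is illustrative only: the function name, the 5% tolerance, and the warning mechanism are assumptions, not the service's actual internals.

```python
# Illustrative sketch of "auto" aspect-ratio selection. The function name,
# tolerance, and warning style are assumptions, not the service's internals.
SUPPORTED_RATIOS = {"4:3": 4 / 3, "3:4": 3 / 4, "1:1": 1.0}

def choose_aspect_ratio(width: int, height: int, tolerance: float = 0.05) -> str:
    """Pick the supported ratio closest to width/height; warn if none is close."""
    actual = width / height
    best = min(SUPPORTED_RATIOS, key=lambda k: abs(SUPPORTED_RATIOS[k] - actual))
    if abs(SUPPORTED_RATIOS[best] - actual) / actual > tolerance:
        print(f"warning: {width}x{height} does not closely match any standard ratio")
    return best
```

With the typical transport dimensions from the table above, 1200×900 maps to "4:3", 900×1200 to "3:4", and 1200×1200 to "1:1".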

Transport Requirements

AgentHumanVideoService requires a transport with video output enabled. Passing a transport without video_out_enabled=True raises a ValueError immediately.
# Daily.co
from pipecat.transports.daily.transport import DailyParams

DailyParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    video_out_enabled=True,    # required
    video_out_is_live=True,
    video_out_width=1200,
    video_out_height=1200,
    video_out_bitrate=2_000_000,
)
# Generic WebRTC
from pipecat.transports.base_transport import TransportParams

TransportParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    video_out_enabled=True,    # required
    video_out_is_live=True,
    video_out_width=1200,
    video_out_height=1200,
)
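The constructor-time guard described above can be sketched like this. The helper name is hypothetical; the real service performs an equivalent check itself and raises ValueError.

```python
# Hypothetical guard mirroring the constructor's check; the actual service
# raises ValueError itself when the transport lacks video output.
def validate_transport_params(video_out_enabled: bool) -> None:
    """Reject transports that cannot carry the avatar's video stream."""
    if not video_out_enabled:
        raise ValueError(
            "AgentHumanVideoService requires a transport with video_out_enabled=True"
        )
```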

Session Lifecycle

| Phase | What happens |
| --- | --- |
| Setup | POST /sessions API call — AgentHuman avatar session created, internal LiveKit credentials returned |
| Start | LiveKit room connection established; audio chunk size calculated from transport dimensions |
| Running | TTS frames resampled to 16 kHz mono → sent to avatar via DataStreamAudioOutput; avatar video/audio frames forwarded downstream |
| Stop / Cancel | LiveKit room disconnected; POST /sessions/{id}/end called to terminate the session |
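The phases above can be sketched as a simple state machine. This is illustrative only: the class, method, and state names are assumptions, not the service's public API.

```python
# Illustrative lifecycle state machine; all names here are assumptions.
class AvatarSessionLifecycle:
    def __init__(self):
        self.state = "idle"

    def setup(self):
        # POST /sessions -> session created, LiveKit credentials returned
        assert self.state == "idle"
        self.state = "setup"

    def start(self):
        # Connect LiveKit room; compute audio chunk size from transport dims
        assert self.state == "setup"
        self.state = "running"

    def stop(self):
        # Disconnect LiveKit room; POST /sessions/{id}/end
        assert self.state == "running"
        self.state = "ended"
```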

Audio Processing

The service automatically resamples all incoming TTS audio to 16 kHz mono PCM before forwarding to the avatar. No manual audio format configuration is needed regardless of which TTS service you use. Audio is buffered and sent in chunks. A new chunk is dispatched when:
  • The buffer reaches the target chunk size and silence is detected, or
  • A TTSStoppedFrame is received (end of utterance)
A 2-second trailing silence is appended to each final chunk to ensure the avatar finishes animating cleanly.
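The dispatch rules above can be sketched as follows. The 16 kHz rate and the 2-second trailing silence come from this section; the target chunk size, function names, and silence-detection signal are assumptions made for illustration.

```python
# Illustrative chunk-dispatch logic. SAMPLE_RATE and the 2 s trailing silence
# are stated in the docs; everything else here is an assumption.
SAMPLE_RATE = 16_000           # Hz, mono PCM (post-resampling)
TRAILING_SILENCE_S = 2.0       # appended to each final chunk

def should_dispatch(buffer_len: int, target_chunk: int,
                    silence_detected: bool, tts_stopped: bool) -> bool:
    """Dispatch when the buffer is full during silence, or at end of utterance."""
    return (buffer_len >= target_chunk and silence_detected) or tts_stopped

def finalize_chunk(samples: list[int]) -> list[int]:
    """Append 2 s of silence (zeros) so the avatar finishes animating cleanly."""
    return samples + [0] * int(SAMPLE_RATE * TRAILING_SILENCE_S)
```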