
From recording to real-time conversation.

Four stages turn raw recordings into a personality-driven synthesis system built for full-duplex conversation. Here's what happens at each step.

Step 01 — Capture

Record the source material.

We analyze recordings of the target speaker across varied conversational settings to capture stable patterns. Each context exposes different behavioral dimensions that together form a complete picture of how someone converses.

audio_input.wav — waveform preview (Contexts A, B, C)
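One concrete example of a "stable pattern" the capture stage might measure is pause behavior. The sketch below is illustrative only: `find_pauses` and its thresholds are assumed names, not the pipeline's actual API, and the idea is simply that each context's recording yields timing statistics that can later be compared and pooled.

```python
# Minimal sketch: derive pause statistics from one recording context.
# SILENCE_THRESHOLD and MIN_PAUSE_SAMPLES are illustrative constants.

SILENCE_THRESHOLD = 0.02   # amplitude below this counts as silence
MIN_PAUSE_SAMPLES = 3      # ignore gaps shorter than this

def find_pauses(samples):
    """Return (start, length) pairs for silent stretches in a sample list."""
    pauses, start = [], None
    for i, s in enumerate(samples):
        if abs(s) < SILENCE_THRESHOLD:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= MIN_PAUSE_SAMPLES:
                pauses.append((start, i - start))
            start = None
    if start is not None and len(samples) - start >= MIN_PAUSE_SAMPLES:
        pauses.append((start, len(samples) - start))
    return pauses

# Toy waveform: speech, a four-sample pause, more speech.
wave = [0.5, 0.4, 0.3, 0.0, 0.0, 0.01, 0.0, 0.6, 0.5]
print(find_pauses(wave))  # → [(3, 4)]
```

Running the same measurement over Contexts A, B, and C is what exposes which timing habits are stable across settings and which are situational.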
Step 02 — Profile

Extract the behavioral signature.

The system analyzes multiple behavioral signals — from low-level timing and prosody up through emotional dynamics and social style — and encodes them into a single, portable profile.

host_profile — signal extraction
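A hedged sketch of what "a single, portable profile" could look like, assuming the per-context signals from Step 01 are folded into one serializable record. Every field name here (`timing`, `prosody`, `social`, and so on) is an assumption for illustration, not the real schema.

```python
import json
import statistics

# Illustrative only: fold per-context measurements into one portable profile.
def build_profile(name, pause_lengths_s, pitch_hz, backchannel_rate):
    return {
        "speaker": name,
        "timing": {
            # Low-level timing signal: typical pause length in seconds.
            "median_pause_s": statistics.median(pause_lengths_s),
        },
        "prosody": {
            # Low-level prosodic signal: average fundamental frequency.
            "mean_pitch_hz": statistics.mean(pitch_hz),
        },
        "social": {
            # Higher-level social-style signal.
            "backchannels_per_min": backchannel_rate,
        },
    }

profile = build_profile("HOST", [0.3, 0.5, 0.4], [180, 210, 195], 6.5)
print(json.dumps(profile))  # one self-contained artifact, easy to ship around
```

Serializing to JSON is one simple way to make the profile portable between the profiling and enhancement stages.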
Step 03 — Enhance

Inject personality into the script.

The raw script is annotated with behavioral data from the profile — emotional tags, reaction cues, overlap markers, and linguistic patterns — all timed naturally for that specific speaker's style.

script_enhanced.txt — before → after
Input
HOST: That's a really interesting take on automation.
↓ HDP Enhancement ↓
Output
HOST: (engaged) That's a really, you know, interesting take on automation. (Listener gives a warm "mmhmm" while the host continues)
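The before → after transformation above could be sketched as a function that consumes the profile and decorates each line. Everything here is hypothetical — the emotion vocabulary, the `filler_rate` field, and `enhance_line` itself are stand-ins for whatever the enhancement stage actually does.

```python
import random

# Illustrative sketch: annotate a raw script line with an emotion tag and a
# speaker-typical filler, at a rate taken from the (hypothetical) profile.
def enhance_line(speaker, text, profile, rng):
    emotion = profile["default_emotion"]
    words = text.split()
    # Drop a filler a few words in, at the speaker's profiled rate.
    if rng.random() < profile["filler_rate"]:
        filler = rng.choice(profile["fillers"])
        words.insert(3, filler + ",")
    return f"{speaker}: ({emotion}) " + " ".join(words)

host = {"default_emotion": "engaged", "filler_rate": 1.0,
        "fillers": ["you know"]}
print(enhance_line("HOST",
                   "That's a really interesting take on automation.",
                   host, random.Random(0)))
```

Seeding the random generator keeps the enhancement reproducible, which matters when the same script must render identically across runs.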
Step 04 — Synthesize

Synthesize full-duplex interaction.

The enhanced script drives parametric speech synthesis in real time. The system responds, backchannels, overlaps, and yields — dynamically — matching the profiled speaker's conversational style.

output_stream — real-time synthesis (speakers A and B alternating turns)
backchannels active · overlap handling · emotional arc · floor management
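The "responds, backchannels, overlaps, and yields" behavior implies a per-frame control decision. Below is a minimal sketch of such a loop under stated assumptions: the state flags, thresholds, and the `duplex_action` function are invented for illustration, not the system's real control logic.

```python
# Hedged sketch of a full-duplex control step: each audio frame, choose
# whether to stay quiet, backchannel, or take the floor, based on the other
# speaker's activity and a profiled gap threshold (all names are assumptions).

def duplex_action(other_is_speaking, silence_frames, profile):
    """Pick a full-duplex action for the current audio frame."""
    if other_is_speaking:
        # Mid-utterance: at most a short backchannel, never a floor grab.
        if profile["eager"] and silence_frames == 0:
            return "backchannel"
        return "listen"
    if silence_frames >= profile["gap_frames"]:
        return "take_floor"  # the pause is long enough: start speaking
    return "wait"            # brief gap: might just be a breath

calm = {"eager": False, "gap_frames": 5}
print(duplex_action(True, 0, calm))   # → listen
print(duplex_action(False, 2, calm))  # → wait
print(duplex_action(False, 6, calm))  # → take_floor
```

Tuning `gap_frames` per speaker is one way the profiled conversational style (quick interjector vs. patient listener) could shape floor management.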

Hear it in action.

The pipeline above produces real audio. Listen to before-and-after demos — same script, same voices, different dynamics.