Amazon Nova Sonic

Lira uses Amazon Nova Sonic (amazon.nova-sonic-v1:0) — a speech-to-speech model that handles STT, reasoning, and TTS in a single bidirectional stream.

What is Nova Sonic?

Unlike traditional pipelines that chain separate STT → LLM → TTS services, Nova Sonic processes everything end-to-end:

Speech-to-Text — Transcribes what participants are saying
LLM Reasoning — Decides what to respond based on context
Text-to-Speech — Generates natural voice output

All in one streaming API call, with sub-second latency.

Bidirectional Streaming Protocol

Lira uses InvokeModelWithBidirectionalStreamCommand from AWS Bedrock:

const response = await bedrockClient.send(
  new InvokeModelWithBidirectionalStreamCommand({
    modelId: 'amazon.nova-sonic-v1:0',
    body: inputStream, // Continuous PCM audio input
  })
);

for await (const event of response.body) {
  // Process output events (transcripts, AI audio, etc.)
}

Event Flow

The Nova Sonic protocol (lira-sonic.service.ts) follows a strict event sequence:

Session Initialization

sessionStart → promptStart → systemPrompt → audioInput (streaming)

sessionStart — Opens the bidirectional stream with configuration
promptStart — Begins a new conversation turn
System prompt is sent with organization context, personality, and knowledge base results
Audio input begins streaming continuously

Input Events (Lira → Nova Sonic)

Event	Content	Frequency
`sessionStart`	Configuration, voice settings	Once per session
`promptStart`	Begin conversation turn	Per turn
`audioInput`	Base64-encoded PCM (16 kHz, mono)	Continuous
`keepAlive`	Heartbeat	Every 10 seconds
`sessionEnd`	Graceful session termination	Once

Output Events (Nova Sonic → Lira)

Event	Content
`textOutput` (role=user)	Transcription of what the participant said
`textOutput` (role=assistant)	AI's response text
`audioOutput`	PCM voice data (24 kHz) to inject into meeting
`completionEnd`	AI finished speaking — ready for next input

System Prompt & Personality Engine

Each Nova Sonic session receives a system prompt that defines Lira's behavior:

You are Lira, an AI meeting participant.

Organization context:
- Company: {org.name}
- Industry: {org.industry}
- Description: {org.description}

Relevant knowledge base context:
{vectorSearchResults}

Personality: {selectedMode} — {modeDescription}

Meeting participants: {participantList}
Current speaker: {activeSpeaker}

Rules:
- Respond in 1-3 sentences unless asked for detail
- Use natural conversational speech
- Respect turn-taking

Keepalive Mechanism

Nova Sonic sessions timeout after ~30 seconds of no input. The lira-sonic.service.ts sends a keepalive event on a 10-second interval to maintain the connection:

this.keepAliveTimer = setInterval(() => {
  this.sendEvent({ type: 'keepAlive' });
}, 10_000);

Bot Manager Integration

The bot-manager.service.ts orchestrates the full bot lifecycle:

Pre-warm — Nova Sonic session and Chromium browser launch in parallel (~16s total)
Navigate — Browser goes to the Google Meet URL
Join — Bot enters the meeting (may wait in lobby)
Active — Audio bridge connects Nova Sonic ↔ Google Meet
Polling — Periodic timers check bot health and meeting state
Post-meeting — Summary generation, task extraction, transcript save
Stale bot reaper — Background process terminates bots that lose connectivity

Barge-In Detection

If someone starts speaking while Lira is responding, Nova Sonic detects the barge-in and Lira:

Immediately stops her audio output
Clears the output buffer
Listens to what the person is saying
Responds to the new input

Output Gating & Wake Word

Nova Sonic processes all audio input and generates responses continuously. However, Lira only sends the audio output to the meeting when the Wake Word system detects that Lira was addressed.

This means Nova Sonic does useful work even in "silent" mode — it maintains context and has ready responses.

What is Nova Sonic?​

Bidirectional Streaming Protocol​

Event Flow​

Session Initialization​

Input Events (Lira → Nova Sonic)​

Output Events (Nova Sonic → Lira)​

System Prompt & Personality Engine​

Keepalive Mechanism​

Bot Manager Integration​

Barge-In Detection​

Output Gating & Wake Word​