Audio Pipeline
The bidirectional audio pipeline is the core engineering challenge in Lira: it captures audio from Google Meet, processes it through the AI model, and injects the response back into the meeting.
The Challenge
Google Meet has no API for audio access. Lira runs a headless Chromium browser that joins the meeting as a participant. Audio must be captured from the browser's WebRTC streams and injected back as Lira's "microphone."
Audio Bridge Architecture
The audio bridge (the `AudioBridge` class) is a two-layer system: browser-side JavaScript injected into Chromium, and a Node.js orchestration layer.
```text
Google Meet (WebRTC)
    │
    ▼  [Capture Path]
RTCPeerConnection.ontrack
  → AudioContext (16 kHz)
  → ScriptProcessorNode (buffer: 1024 samples)
  → Energy gate (RMS < 0.001 = silence → drop)
  → Float32 → Int16 PCM → base64
  → page.exposeFunction('__liraOnAudioCapture')
  → Node.js AudioBridge
  → Echo gate check
  → Nova Sonic input stream

Nova Sonic output
  → PCM audio (24 kHz)
  → base64 → Int16 → Float32 → AudioBuffer
  → AudioBufferSourceNode (gapless scheduling)
  → MediaStreamDestination (48 kHz, auto-resampled)
  → getUserMedia override
  → Google Meet receives as Lira's "mic"
    │
    ▼  [Injection Path]
All participants hear Lira speak
```
Capture Path (Meeting → Lira)
- RTCPeerConnection interception: Before the page loads, Lira hooks `RTCPeerConnection` so that `track` events (`ontrack`) carrying audio from other participants are intercepted.
- AudioContext processing: The incoming audio is routed into an `AudioContext` running at 16 kHz (the rate Nova Sonic requires), which resamples it; a `ScriptProcessorNode` with a 1024-sample buffer (64 ms at 16 kHz) delivers the samples to JavaScript.
- Energy gate: Frames whose RMS energy is below 0.001 are classified as silence and dropped, so silent audio is never encoded or sent to Node.js.
- Format conversion: Float32 samples → Int16 PCM → base64 encoding.
- Browser-to-Node bridge: `page.exposeFunction('__liraOnAudioCapture')` and `page.exposeFunction('__liraLog')` carry data from the browser context to Node.js.
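The energy gate and format conversion steps above can be sketched in plain JavaScript. This is a minimal sketch, not Lira's actual code: the function names `rms` and `encodeCaptureFrame` are hypothetical, the clamp-and-scale Float32 to Int16 mapping is a common convention assumed here, and Node's `Buffer` stands in for base64 encoding (inside the page you would use `btoa` over the bytes instead, since `Buffer` does not exist in the browser).

```javascript
// Hypothetical helper: gate one ScriptProcessorNode frame and encode it for Node.
const SILENCE_RMS = 0.001; // threshold from the capture path description

// Root-mean-square energy of a Float32 frame (samples nominally in [-1, 1]).
function rms(frame) {
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  return frame.length === 0 ? 0 : Math.sqrt(sumSquares / frame.length);
}

// Returns base64-encoded Int16 PCM, or null if the frame is classified as silence.
function encodeCaptureFrame(frame) {
  if (rms(frame) < SILENCE_RMS) return null; // energy gate: drop silent frames

  const pcm = new Int16Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    // Clamp to [-1, 1], then scale so -1 maps to -32768 and 1 maps to 32767.
    const s = Math.max(-1, Math.min(1, frame[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // Int16Array truncates toward zero
  }
  // Int16Array uses host byte order, which is little-endian on x86 and ARM.
  return Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength).toString('base64');
}
```

At 16 kHz, each 1024-sample frame covers 64 ms, so the gate makes one keep-or-drop decision per 64 ms of audio.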
Injection Path (Lira → Meeting)
- PCM decode: Nova Sonic's output (24 kHz PCM) is decoded from base64 to Int16 to Float32.
- Chunk batching: Audio chunks are batched (50 ms by default) to reduce the number of CDP (Chrome DevTools Protocol) calls.
- AudioBuffer scheduling: Chunks are scheduled for gapless playback via `AudioBufferSourceNode`, each starting where the previous one ends.
- MediaStreamDestination: The audio context automatically resamples the 24 kHz audio to 48 kHz for WebRTC compatibility.
- getUserMedia override: Lira overrides `navigator.mediaDevices.getUserMedia` to return this custom stream as the "microphone".
- WebRTC delivery: Google Meet sends this stream to all participants.
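The decode, batching, and scheduling steps above can be sketched as three small pieces. All names here (`decodePcm16`, `batchPcmChunks`, `GaplessScheduler`) are hypothetical, and the batcher interprets the 50 ms default as a minimum amount of audio per batch; the real bridge may instead flush on a 50 ms timer. The scheduler only computes start times, so it runs outside a browser.

```javascript
// Hypothetical helpers for the injection path (names are illustrative).
const OUTPUT_RATE = 24000; // Nova Sonic output sample rate, per the pipeline above

// base64 -> Int16 (little-endian) -> Float32 in [-1, 1).
function decodePcm16(base64) {
  const bytes = Buffer.from(base64, 'base64');
  const out = new Float32Array(bytes.length >> 1); // 2 bytes per sample
  for (let i = 0; i < out.length; i++) out[i] = bytes.readInt16LE(i * 2) / 0x8000;
  return out;
}

// Node side (assumed size-based): group raw PCM chunks until at least batchMs
// of audio is buffered, so the browser receives fewer, larger CDP messages.
function batchPcmChunks(chunks, batchMs = 50, sampleRate = OUTPUT_RATE) {
  const minBytes = Math.ceil((sampleRate * batchMs) / 1000) * 2;
  const batches = [];
  let pending = [];
  let pendingBytes = 0;
  for (const chunk of chunks) {
    pending.push(chunk);
    pendingBytes += chunk.length;
    if (pendingBytes >= minBytes) {
      batches.push(Buffer.concat(pending));
      pending = [];
      pendingBytes = 0;
    }
  }
  if (pendingBytes > 0) batches.push(Buffer.concat(pending)); // flush the tail
  return batches;
}

// Browser side: gapless scheduling. Each buffer starts exactly where the
// previous one ends, or at currentTime if playback has already caught up.
class GaplessScheduler {
  constructor() {
    this.nextStartTime = 0;
  }

  schedule(currentTime, durationSec) {
    const startAt = Math.max(currentTime, this.nextStartTime);
    this.nextStartTime = startAt + durationSec;
    return startAt; // value to pass to AudioBufferSourceNode.start()
  }
}
```

In the page, the decoded samples would go into `ctx.createBuffer(1, samples.length, 24000)` via `copyToChannel`, and the source node would be connected to the `MediaStreamAudioDestinationNode` and started at `scheduler.schedule(ctx.currentTime, buffer.duration)`.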
Echo Gate
The echo gate is a critical component that prevents feedback loops in which Lira hears her own output. It operates at two levels:
Browser-side: An `outputting` flag suppresses audio capture while Lira is speaking.
Node.js side: `AudioBridge.endOutput()` uses a 250 ms debounce so that a response delivered as several audio blocks is treated as one. Once the debounce confirms the last block has arrived, the browser waits for all scheduled audio to finish playing (drain) plus a 200 ms reverb margin. The Node-side echo gate then clears after an additional 400 ms safety buffer.
```text
AI starts speaking → outputting = true → capture suppressed
AI stops speaking  → 250 ms debounce → drain audio → 200 ms reverb → 400 ms safety → capture resumes
```
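The timeline above can be expressed as a small timing model. This is a simplified, hypothetical sketch (`EchoGateModel` and its methods are not Lira's API): it assumes the stages run strictly one after another, that each `endOutput()` call restarts the 250 ms debounce, and that the caller supplies how much scheduled audio the browser still has to drain when the debounce fires.

```javascript
// Simplified, hypothetical model of the echo gate timing (times in ms).
class EchoGateModel {
  constructor({ debounceMs = 250, reverbMs = 200, safetyMs = 400 } = {}) {
    this.debounceMs = debounceMs;
    this.reverbMs = reverbMs;
    this.safetyMs = safetyMs;
    this.outputting = false; // browser-side flag: true suppresses capture
    this.debounceUntil = null; // when the pending endOutput() debounce fires
  }

  startOutput() {
    // A new audio block cancels any pending end-of-output.
    this.outputting = true;
    this.debounceUntil = null;
  }

  endOutput(nowMs) {
    // Each call restarts the debounce, so multi-block responses count as one.
    this.debounceUntil = nowMs + this.debounceMs;
  }

  // Time at which capture resumes; Infinity while no end-of-output is pending.
  resumeAt(drainMs) {
    if (this.debounceUntil === null) return Infinity;
    return this.debounceUntil + drainMs + this.reverbMs + this.safetyMs;
  }

  captureAllowed(nowMs, drainMs) {
    if (!this.outputting) return true;
    return nowMs >= this.resumeAt(drainMs);
  }
}
```

Under this model, capture stays suppressed for at least 850 ms (250 + 200 + 400) after the final `endOutput()` call, plus however long the browser still needs to drain scheduled audio.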
RTCPeerConnection Interception
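```javascript
// Lira injects this script before Google Meet loads, so Meet only ever
// constructs the wrapped class. Subclassing (rather than a plain wrapper
// function) keeps the prototype, static methods, and instanceof checks intact.
const OriginalRTCPeerConnection = window.RTCPeerConnection;

window.RTCPeerConnection = class extends OriginalRTCPeerConnection {
  constructor(...args) {
    super(...args);
    this.addEventListener('track', (event) => {
      if (event.track.kind === 'audio') {
        // A track can arrive without an associated stream; wrap it if so.
        const stream = event.streams[0] ?? new MediaStream([event.track]);
        // Capture this audio track for processing (captureAudioTrack: page-side helper, not shown)
        captureAudioTrack(stream);
      }
    });
  }
};
```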
getUserMedia Override
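```javascript
// Keep a bound reference to the native implementation for non-audio requests
const originalGetUserMedia =
  navigator.mediaDevices.getUserMedia.bind(navigator.mediaDevices);

// Replace the real microphone with Lira's AI audio stream
navigator.mediaDevices.getUserMedia = async (constraints) => {
  if (constraints && constraints.audio) {
    return liraAudioStream; // Custom MediaStream from AI output
  }
  return originalGetUserMedia(constraints);
};
```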
Bridge Initialization
The AudioBridge.setup() method initializes the system before the page loads:
- Injects the initialization script before `page.goto()`, setting up the audio capture and injection pipelines
- Exposes `__liraOnAudioCapture`, which receives base64 PCM from the browser
- Exposes `__liraLog`, which receives debug logs from the browser context
- Overrides `getUserMedia` and `RTCPeerConnection` before Google Meet's code runs
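The essential constraint in this setup is ordering: the exposed callbacks and the init script must be registered before navigation. A minimal sketch, assuming Puppeteer's `Page` API (`exposeFunction`, `evaluateOnNewDocument`, `goto`); with Playwright, `page.addInitScript` plays the role of `evaluateOnNewDocument`. The function `setupAudioBridge` and its options object are hypothetical and do not reflect the real `AudioBridge.setup()` signature.

```javascript
// Hypothetical sketch of the setup ordering, assuming a Puppeteer-style Page.
async function setupAudioBridge(page, { initScript, meetUrl, onAudioCapture, onLog }) {
  // 1. Register Node-side callbacks; the page can then call
  //    window.__liraOnAudioCapture(base64Pcm) and window.__liraLog(message).
  await page.exposeFunction('__liraOnAudioCapture', onAudioCapture);
  await page.exposeFunction('__liraLog', onLog);

  // 2. Queue the init script so it runs in every new document before any of
  //    Meet's own scripts, installing the getUserMedia and RTCPeerConnection overrides.
  await page.evaluateOnNewDocument(initScript);

  // 3. Only now navigate; Meet loads into an already-patched window.
  await page.goto(meetUrl);
}
```

Calling `page.goto()` first would let Meet capture the native `getUserMedia` and `RTCPeerConnection` before the overrides exist, which is why the init script is queued beforehand.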