Voice Agent Weekend Build

A weekend experiment turning Claude into a real-time voice assistant using browser APIs.

Status: in progress

The Question

Can you build a useful voice agent in a single weekend using only Claude’s API and native browser speech APIs — no third-party voice platforms?

The Build

The idea was simple: use the Web Speech API’s SpeechRecognition interface to transcribe speech, pipe the transcript to Claude, and read the reply aloud with the same API’s SpeechSynthesis interface. The entire loop should feel conversational, not transactional.
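As a rough illustration of that loop, here is a minimal sketch of a single turn. It is not the code from the build: `runTurn`, `VoiceDeps`, `listen`, `ask`, and `speak` are hypothetical names, and the three functions are passed in so the sketch stays independent of any one browser. In a page, `listen` would presumably wrap SpeechRecognition, `ask` would send the history to Claude, and `speak` would wrap `speechSynthesis.speak`.

```typescript
// One conversational turn: listen -> ask Claude -> speak.
// All names here are illustrative assumptions, not the original code.
type Turn = { role: "user" | "assistant"; content: string };

interface VoiceDeps {
  listen: () => Promise<string>;              // resolves with the recognised transcript
  ask: (history: Turn[]) => Promise<string>;  // sends the history to Claude, resolves with the reply
  speak: (text: string) => Promise<void>;     // resolves once the reply has been spoken
}

async function runTurn(history: Turn[], deps: VoiceDeps): Promise<Turn[]> {
  const heard = (await deps.listen()).trim();
  if (heard === "") return history; // nothing recognised, so skip the turn

  // Append the user's words, then ask for a reply using the full history.
  const withUser: Turn[] = [...history, { role: "user", content: heard }];
  const reply = await deps.ask(withUser);
  await deps.speak(reply);

  return [...withUser, { role: "assistant", content: reply }];
}
```

Calling `runTurn` repeatedly and feeding each result back in as the next `history` gives the conversational loop. This version waits for the whole reply before speaking, which is where the latency problem comes from.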

The first version worked in about two hours. The hard part was latency: the gap between finishing your sentence and hearing a response. Streaming Claude’s response and starting speech synthesis on the first complete sentence, while the rest was still generating, cut perceived latency from ~3 seconds to under 1 second.
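One way to get sentence-by-sentence playback is to buffer the streamed text and release each sentence as soon as its end arrives. The sketch below shows that idea only; `SentenceBuffer` is a hypothetical helper, and its boundary rule (terminal punctuation followed by whitespace) is a simple heuristic that will mis-split abbreviations such as "e.g. this".

```typescript
// Buffers streamed text and releases complete sentences as they arrive.
// Hypothetical helper: the boundary rule is a heuristic, not a full sentence parser.
class SentenceBuffer {
  private pending = "";

  // Add a streamed chunk; return any sentences that are now complete.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // A sentence ends at ., ! or ? followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call once the stream ends to release whatever is left.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest === "" ? [] : [rest];
  }
}
```

In the browser, each returned sentence could be handed to `speechSynthesis.speak(new SpeechSynthesisUtterance(sentence))`. The browser queues utterances, so later sentences play in order while the first is still being spoken.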

The Outcome

The prototype works surprisingly well for simple queries and conversations. It falls apart on anything requiring context longer than a few turns: the full conversation history is resent with every request, and it eats the context window quickly.

Next step: experiment with summarising old turns before they’re sent, to keep the context window lean without losing the thread.
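A possible shape for that experiment, sketched under assumptions: keep the last few turns verbatim and replace everything older with a summary. `splitHistory`, `compactHistory`, and the `summarise` callback are hypothetical names; the summariser would presumably be one more Claude call, and nothing here has been tried in the prototype.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Hypothetical summariser: in practice probably another Claude call that
// condenses the old turns into a short paragraph.
type Summarise = (old: Turn[]) => Promise<string>;

// Split the history so roughly the last `keepRecent` turns stay verbatim.
function splitHistory(history: Turn[], keepRecent: number): { old: Turn[]; recent: Turn[] } {
  let cut = Math.max(0, history.length - keepRecent);
  // Move the cut earlier until the kept part starts with a user turn.
  while (cut > 0 && cut < history.length && history[cut]?.role !== "user") cut--;
  return { old: history.slice(0, cut), recent: history.slice(cut) };
}

// Replace the old turns with a summary; return null when nothing needs summarising.
async function compactHistory(
  history: Turn[],
  keepRecent: number,
  summarise: Summarise,
): Promise<{ summary: string | null; recent: Turn[] }> {
  const { old, recent } = splitHistory(history, keepRecent);
  if (old.length === 0) return { summary: null, recent };
  return { summary: await summarise(old), recent };
}
```

The summary could then travel with the request, for example in the system prompt, while `recent` is sent as the message list. Whether a summary preserves enough of the thread is exactly what the experiment would need to find out.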
