Workshop
Building Voice AI: From Whisper to Production Assistants
The next AI interface isn't a chat box. It's a voice.
About this workshop
Design and deploy voice-enabled AI applications end to end. The workshop covers Whisper for transcription, TTS pipelines, real-time streaming over WebSockets, wake-word detection, and building a production voice assistant from scratch.
What you will learn
- Implement real-time speech-to-text with Whisper and handle streaming transcription
- Build a text-to-speech pipeline with natural prosody using ElevenLabs and OpenAI TTS
- Stream audio bidirectionally over WebSockets with sub-300ms latency
- Deploy a production voice assistant with wake-word detection and fallback handling
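As a taste of the streaming topic in the list above, here is a minimal sketch of two small building blocks: cutting raw PCM audio into fixed-duration frames suitable for sending over a WebSocket, and checking a set of per-stage latencies against a 300 ms budget. The audio format (16 kHz, 16-bit mono), the 20 ms frame length, and every function and stage name are assumptions chosen for illustration, not the workshop's actual code.

```python
from typing import Dict, Iterator

# Assumed audio format for this sketch: 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20  # assumed frame duration


def frame_size_bytes(
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    frame_ms: int = FRAME_MS,
) -> int:
    """Number of bytes in one frame of mono PCM audio."""
    return sample_rate_hz * bytes_per_sample * frame_ms // 1000


def iter_frames(pcm: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Yield fixed-size frames; a short final frame is zero-padded (silence)."""
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if len(frame) < frame_bytes:
            frame += b"\x00" * (frame_bytes - len(frame))
        yield frame


def within_budget(stage_ms: Dict[str, float], budget_ms: float = 300.0) -> bool:
    """True if the summed per-stage latencies fit inside the budget."""
    return sum(stage_ms.values()) <= budget_ms
```

In use, each frame from `iter_frames(audio, frame_size_bytes())` would be sent as one binary WebSocket message. The dictionary passed to `within_budget` should hold latencies you have measured for your own stages (for example capture, transcription, response generation, and synthesis); the sketch deliberately supplies no figures of its own.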
Who this is for
- Developers who want to add voice interaction to their AI applications
- Engineers building conversational AI, contact centre automation, or voice assistants
- Anyone who has experimented with Whisper and wants to go end-to-end in production
By the end
Before: Text-only AI applications with no voice layer
After: Real-time voice assistants with sub-300ms latency end to end
Before: Whisper experiments that never reach production
After: A fully deployed, monitored voice pipeline from scratch
Before: Guessing at streaming architecture for bidirectional audio
After: WebSocket pipelines you fully understand, own, and can extend
About Dev
Dev Patel
Voice AI Engineer & Startup Founder
Vetted by Maram
Dev founded a voice AI startup that was acquired in 2024, where he built real-time voice assistants used in more than 200 contact centres worldwide. He is a frequent speaker at Voice Summit and writes the technical blog Voice Eng, read by 40,000 engineers monthly.
