Workshop

Building Voice AI: From Whisper to Production Assistants

The next AI interface isn't a chat box. It's a voice.

About this workshop

Design and deploy voice-enabled AI applications end to end. This workshop covers Whisper for transcription, text-to-speech (TTS) pipelines, real-time streaming with WebSockets, and wake-word detection, and finishes with building a production voice assistant from scratch.

What you will learn

  • Implement real-time speech-to-text with Whisper and handle streaming transcription
  • Build a text-to-speech pipeline with natural prosody using ElevenLabs and OpenAI TTS
  • Stream audio bidirectionally over WebSockets with sub-300ms latency
  • Deploy a production voice assistant with wake-word detection and fallback handling

Who this is for

  • Developers who want to add voice interaction to their AI applications
  • Engineers building conversational AI, contact centre automation, or voice assistants
  • Anyone who has experimented with Whisper and wants to go end-to-end in production

By the end

Before: Text-only AI applications with no voice layer
After: Real-time voice assistants with sub-300ms latency end to end

Before: Whisper experiments that never reach production
After: A fully deployed, monitored voice pipeline from scratch

Before: Guessing at streaming architecture for bidirectional audio
After: WebSocket pipelines you fully understand, own, and can extend

About Dev

Dev Patel

Voice AI Engineer & Startup Founder

Vetted by Maram

Dev has built real-time voice assistants used in over 200 contact centres globally and founded a voice AI startup that was acquired in 2024. He is a frequent speaker at Voice Summit and writes the technical blog Voice Eng, read by 40,000 engineers monthly.
