StreamVox - AI Real-Time Translator
StreamVox is an AI-powered real-time translator for Windows. Whether you're in a business meeting, watching a foreign-language movie, or chatting with friends, StreamVox breaks language barriers instantly with human-level accuracy.
Powered by advanced AI models, it delivers fast subtitles (<500 ms latency) for any audio playing on your PC or captured from your microphone. It includes a Universal Overlay that stays on top of any app, and Mobile Call Translation via Phone Link.
Challenge
Building a low-latency, real-time audio processing pipeline on Windows that can handle system audio capture (loopback) and microphone input simultaneously, while rendering a transparent overlay that doesn't interfere with other applications.
Solution
Developed a Python-based application using PyQt6 for the GUI and overlay. Integrated Deepgram's streaming API for ultra-fast speech-to-text. Implemented efficient audio loopback capture using WASAPI. Designed a non-intrusive 'click-through' overlay system. Added support for Microsoft Phone Link to bridge mobile calls to the desktop translation engine.
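As a rough illustration of the hand-off between audio capture and Deepgram's streaming endpoint, the sketch below builds a listen URL and converts float samples to 16-bit PCM. It is not taken from the StreamVox source. The function names, the 16 kHz mono defaults, and the choice of linear16 encoding are assumptions made for the example, as is the premise that the capture side delivers float samples (WASAPI shared-mode streams typically do).

```python
import struct
from urllib.parse import urlencode

# Deepgram's live transcription endpoint.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(sample_rate: int = 16000, channels: int = 1,
                     language: str = "en") -> str:
    """Build a streaming URL for raw PCM audio (defaults are assumptions)."""
    params = {
        "encoding": "linear16",      # raw little-endian 16-bit PCM
        "sample_rate": sample_rate,
        "channels": channels,
        "language": language,
        "interim_results": "true",   # ask for partial transcripts as well
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def float_to_linear16(samples) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM."""
    # Clip first so out-of-range samples saturate instead of overflowing.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)
```

The bytes returned by float_to_linear16 are what would be sent as binary WebSocket frames to the URL from build_listen_url, with the API key supplied in the connection's Authorization header.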
Implementation Details
Real-Time Audio Pipeline
Architected a high-performance audio capture system using Windows WASAPI loopback. Implemented ring buffers to pass audio between capture and streaming without adding noticeable latency. Integrated Deepgram's WebSocket API for streaming transcription, with keep-alive messages that hold the connection open during periods of silence.
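The ring buffer and keep-alive ideas can be sketched in a few lines of Python. This is a minimal illustration, not StreamVox's actual code: ByteRingBuffer, keepalive_due, and the 5-second interval are hypothetical names and values, and the overwrite-oldest policy is an assumption. The KeepAlive JSON text message is the one Deepgram documents for holding a streaming connection open while no audio is being sent.

```python
import json
import threading

# Text frame Deepgram accepts to keep an idle streaming connection open.
KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


def keepalive_due(last_audio_sent: float, now: float,
                  interval: float = 5.0) -> bool:
    """True once `interval` seconds have passed without sending audio."""
    return now - last_audio_sent >= interval


class ByteRingBuffer:
    """Fixed-capacity FIFO byte buffer that overwrites the oldest data when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._start = 0   # index of the oldest unread byte
        self._size = 0    # number of unread bytes
        self._lock = threading.Lock()

    def write(self, data: bytes) -> None:
        """Append data, dropping the oldest bytes if the buffer overflows."""
        view = memoryview(data)
        with self._lock:
            if len(view) >= self._capacity:
                # Only the newest `capacity` bytes can survive.
                self._buf[:] = view[-self._capacity:]
                self._start, self._size = 0, self._capacity
                return
            end = (self._start + self._size) % self._capacity
            first = min(len(view), self._capacity - end)
            self._buf[end:end + first] = view[:first]
            self._buf[:len(view) - first] = view[first:]  # wrapped part
            overflow = self._size + len(view) - self._capacity
            if overflow > 0:
                # Oldest bytes were overwritten; advance the read position.
                self._start = (self._start + overflow) % self._capacity
                self._size = self._capacity
            else:
                self._size += len(view)

    def read(self, n: int) -> bytes:
        """Remove and return up to n of the oldest bytes."""
        with self._lock:
            n = min(n, self._size)
            first = min(n, self._capacity - self._start)
            out = bytes(self._buf[self._start:self._start + first])
            out += bytes(self._buf[:n - first])  # wrapped part
            self._start = (self._start + n) % self._capacity
            self._size -= n
            return out
```

In this arrangement the capture callback would call write() and a sender thread would call read() in fixed-size chunks. When read() returns nothing, the sender checks keepalive_due(last_sent, time.monotonic()) and, if it is true, sends KEEPALIVE_MESSAGE as a text frame. Overwriting the oldest audio trades completeness for bounded latency, which suits live subtitles better than an ever-growing backlog.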
Universal Overlay UI
Built a custom transparent window overlay using PyQt6. Set Qt's WindowStaysOnTopHint and WindowTransparentForInput window flags so the subtitles float over games and movies without blocking mouse or keyboard interaction. Designed a dynamic text rendering engine that adjusts opacity and size for readability.
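A click-through subtitle window of this kind can be sketched with a frameless QLabel. The example below is an assumption-laden illustration rather than the shipped overlay: subtitle_point_size and its shrink rule, create_overlay, the colors, and the window size are all hypothetical. The window flags and the WA_TranslucentBackground attribute are standard PyQt6 names.

```python
import sys


def subtitle_point_size(char_count: int, base: int = 28,
                        minimum: int = 16) -> int:
    """Shrink the font as the subtitle gets longer (hypothetical rule)."""
    # Drop one point for every 20 characters beyond the first 40.
    shrink = max(0, char_count - 40) // 20
    return max(minimum, base - shrink)


def create_overlay(text: str):
    """Build a frameless, always-on-top label that ignores mouse input."""
    # Imported here so the sizing helper works without PyQt6 installed.
    from PyQt6.QtCore import Qt
    from PyQt6.QtWidgets import QLabel

    label = QLabel(text)
    label.setWindowFlags(
        Qt.WindowType.FramelessWindowHint
        | Qt.WindowType.WindowStaysOnTopHint
        | Qt.WindowType.WindowTransparentForInput  # clicks pass through
        | Qt.WindowType.Tool                       # no taskbar entry
    )
    # Let the desktop show through everything the stylesheet does not paint.
    label.setAttribute(Qt.WidgetAttribute.WA_TranslucentBackground)
    label.setAlignment(Qt.AlignmentFlag.AlignCenter)
    label.setWordWrap(True)
    size = subtitle_point_size(len(text))
    label.setStyleSheet(
        "color: white; background-color: rgba(0, 0, 0, 160); "
        f"font-size: {size}pt; padding: 8px;"
    )
    return label


def main() -> int:
    """Show a sample subtitle until the process is terminated."""
    from PyQt6.QtWidgets import QApplication

    app = QApplication(sys.argv)
    overlay = create_overlay("Hello, world")
    overlay.resize(900, 120)
    overlay.show()
    return app.exec()
```

Calling sys.exit(main()) from a script shows the overlay. Because the window is transparent for input it cannot be closed with the mouse, so a real application would need a separate control surface such as a tray icon or hotkey.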
Mobile Integration
Leveraged Microsoft Phone Link to capture audio from connected mobile devices. This allows users to see subtitles for phone calls (WhatsApp, Telegram, cellular) directly on their Windows desktop screen.
Key Results & Impact
Published on Microsoft Store
Achieved <500ms translation latency
Seamless integration with Zoom, Teams, and Netflix