AI Desktop Application
Ongoing
Solo Developer

StreamVox - AI Real-Time Translator

StreamVox is the ultimate AI-powered real-time translator for Windows. Whether you're in a business meeting, watching a foreign movie, or chatting with friends, StreamVox breaks language barriers instantly with human-level accuracy.

Powered by advanced AI models, it delivers lightning-fast subtitles (<500ms latency) for any audio playing on your PC or captured from your microphone. It also includes a Universal Overlay that stays on top of any app, and Mobile Call Translation via Phone Link.

Key Result
<500ms latency, 99% uptime, 10+ languages supported

Challenge

Building a low-latency, real-time audio processing pipeline on Windows that can handle system audio capture (loopback) and microphone input simultaneously, while rendering a transparent overlay that doesn't interfere with other applications.

Solution

Developed a Python-based application using PyQt6 for the GUI and overlay. Integrated Deepgram's streaming API for ultra-fast speech-to-text. Implemented efficient audio loopback capture using WASAPI. Designed a non-intrusive 'click-through' overlay system. Added support for Microsoft Phone Link to bridge mobile calls to the desktop translation engine.

Implementation Details

01

Real-Time Audio Pipeline

Architected a high-performance audio capture system using Windows WASAPI loopback. Implemented ring buffers to handle audio streams without adding noticeable latency. Integrated Deepgram's WebSocket API for streaming transcription, with keep-alive messages that hold the connection open during periods of silence.
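As a rough illustration of the two ideas above, here is a minimal sketch of a byte ring buffer and a keep-alive timer. The names `AudioRingBuffer` and `KeepAliveTimer` are hypothetical and not taken from the StreamVox code, and the 5-second interval is an assumption. The `{"type": "KeepAlive"}` text message follows Deepgram's documented streaming convention; the actual WASAPI capture and WebSocket calls are left out.

```python
import json
import threading
import time

# Text frame Deepgram's streaming API accepts to keep an idle socket open.
KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


class AudioRingBuffer:
    """Fixed-capacity byte ring buffer; the oldest audio is overwritten when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._start = 0  # index of the oldest unread byte
        self._size = 0   # number of unread bytes
        self._lock = threading.Lock()

    def write(self, data: bytes) -> None:
        """Append captured audio, dropping the oldest bytes on overflow."""
        with self._lock:
            if len(data) >= self._capacity:
                # Only the newest `capacity` bytes can be kept.
                self._buf[:] = data[-self._capacity:]
                self._start, self._size = 0, self._capacity
                return
            end = (self._start + self._size) % self._capacity
            first = min(len(data), self._capacity - end)
            self._buf[end:end + first] = data[:first]
            self._buf[:len(data) - first] = data[first:]  # wrapped part
            overflow = max(0, self._size + len(data) - self._capacity)
            self._start = (self._start + overflow) % self._capacity
            self._size = min(self._capacity, self._size + len(data))

    def read(self, n: int) -> bytes:
        """Remove and return up to n of the oldest unread bytes."""
        with self._lock:
            n = min(n, self._size)
            first = min(n, self._capacity - self._start)
            out = bytes(self._buf[self._start:self._start + first])
            out += bytes(self._buf[:n - first])
            self._start = (self._start + n) % self._capacity
            self._size -= n
            return out


class KeepAliveTimer:
    """Tracks when the next keep-alive message is due (interval is an assumption)."""

    def __init__(self, interval_s: float = 5.0, clock=time.monotonic) -> None:
        self._interval = interval_s
        self._clock = clock
        self._last_sent = clock()

    def mark_sent(self) -> None:
        """Call after sending audio or a keep-alive message."""
        self._last_sent = self._clock()

    def due(self) -> bool:
        return self._clock() - self._last_sent >= self._interval
```

In a sender loop, the capture thread would call `write()` while the network task calls `read()` and, when no audio has gone out and `due()` returns true, sends `KEEPALIVE_MESSAGE` as a text frame and calls `mark_sent()`.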

02

Universal Overlay UI

Built a custom transparent window overlay using PyQt6. Implemented 'Window Stays On Top' and 'Transparent for Input' flags to ensure the subtitles float over games and movies without blocking interaction. Designed a dynamic text rendering engine that adjusts opacity and size for readability.

03

Mobile Integration

Leveraged Microsoft Phone Link protocols to capture audio from connected mobile devices. This allows users to see subtitles for phone calls (WhatsApp, Telegram, Cellular) directly on their Windows desktop screen.

Key Results & Impact

Published on Microsoft Store

Achieved <500ms translation latency

Seamless integration with Zoom, Teams, and Netflix

Key Features

Universal Transparent Overlay
Real-time System Audio Translation
Microphone Input Translation
Mobile Call Translation (Phone Link)
Sub-500ms Latency
Support for 10+ Languages
Privacy-First (No Audio Storage)

Frontend

PyQt6, Qt Quick, QML, Windows SDK

Backend & Infra

Python, Deepgram API, OpenAI API, WebSockets, WASAPI
