WhisperUI

Local audio transcription. No API key. No cloud. No data leaving your machine.

The Problem

You have a sensitive meeting recording. A client interview. A personal voice memo.

You need it transcribed.

The options:

  • Cloud services: Require API keys, cost money per minute, send your audio to someone else’s servers
  • Manual transcription: Takes hours, error-prone, tedious
  • Do nothing: The recording sits there, unsearchable, unusable

For sensitive content, sending audio to a cloud service is often unacceptable. Your data should stay on your machine.

The Solution

WhisperUI is a local audio transcription tool that:

  • Runs OpenAI’s Whisper model entirely on your hardware
  • Requires no API key
  • Never sends your audio to the cloud
  • Works offline
  • Exports to text, SRT, or VTT formats

The Pipeline

WhisperUI uses a four-stage pipeline:

Stage 1: Audio Processing

  • FFmpeg handles audio format conversion
  • Supports MP3, WAV, M4A, and more
  • Normalizes audio quality for better transcription
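
A minimal sketch of what this stage could look like, assuming FFmpeg is on the PATH. The function names, the 16 kHz mono WAV target, and the use of the loudnorm filter are illustrative assumptions, not WhisperUI's confirmed settings. Whisper itself works on 16 kHz mono audio, which is why that target is a reasonable guess.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str, normalize: bool = True) -> list[str]:
    """Build an FFmpeg argument list converting src to 16 kHz mono 16-bit WAV.

    Hypothetical helper: the exact flags WhisperUI passes are not documented here.
    """
    cmd = ["ffmpeg", "-y", "-i", src]  # -y: overwrite dst if it exists
    if normalize:
        # loudnorm is FFmpeg's EBU R128 loudness normalization filter
        cmd += ["-af", "loudnorm"]
    # mono, 16 kHz sample rate, signed 16-bit PCM
    cmd += ["-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", dst]
    return cmd


def convert_audio(src: str, dst: str, normalize: bool = True) -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst, normalize), check=True)
```

Calling convert_audio("meeting.m4a", "meeting.wav") would then produce a file ready for the transcription stage. Keeping the command builder separate from the subprocess call makes the flags easy to inspect without running FFmpeg.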

Stage 2: Transcription

  • OpenAI Whisper model runs locally
  • Multiple model sizes (tiny, base, small, medium, large)
  • Trade-off between speed and accuracy
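
For illustration, local transcription with the open-source openai-whisper Python package might look like the sketch below. The wrapper name transcribe_file and the size check are assumptions; how WhisperUI actually wires up the model is not shown in this document.

```python
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def transcribe_file(audio_path: str, model_size: str = "base") -> dict:
    """Transcribe audio_path locally and return Whisper's result dict.

    Hypothetical wrapper. Smaller models run faster; larger ones are
    generally more accurate.
    """
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {model_size!r}")

    # Imported lazily so the size check works without the package installed.
    import whisper  # pip install openai-whisper

    # Note: load_model fetches the weights on first use unless they are
    # already cached locally.
    model = whisper.load_model(model_size)

    # The result dict holds "text", "segments", and "language";
    # word_timestamps=True adds per-word timing to each segment.
    return model.transcribe(audio_path, word_timestamps=True)
```

A call such as transcribe_file("meeting.wav", model_size="small")["text"] would return the full transcript as a single string.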

Stage 3: Timestamping

  • Automatic timestamp generation
  • Word-level or segment-level precision
  • Configurable timestamp intervals
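
One way a configurable interval could work, sketched under assumptions: take word-level timestamps (dicts with "word", "start", and "end" keys, the shape openai-whisper emits when word timestamps are enabled) and regroup them into segments no longer than a chosen number of seconds. The function names and the grouping rule are hypothetical, not WhisperUI's documented behavior.

```python
def _close_segment(words: list[dict]) -> dict:
    """Collapse a run of words into one segment dict."""
    return {
        "start": words[0]["start"],
        "end": words[-1]["end"],
        # Whisper words usually carry a leading space, so strip before joining
        "text": " ".join(w["word"].strip() for w in words),
    }


def group_words(words: list[dict], max_interval: float = 5.0) -> list[dict]:
    """Group words into segments spanning at most max_interval seconds.

    A single word longer than max_interval still forms its own segment.
    """
    segments: list[dict] = []
    current: list[dict] = []
    for word in words:
        # Start a new segment if adding this word would exceed the interval
        if current and word["end"] - current[0]["start"] > max_interval:
            segments.append(_close_segment(current))
            current = []
        current.append(word)
    if current:
        segments.append(_close_segment(current))
    return segments
```

With max_interval=5.0, three words ending at 0.4 s, 0.9 s, and 5.6 s would yield two segments: the first two words together, then the third on its own.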

Stage 4: Export

  • Plain text (.txt)
  • Subtitles (.srt, .vtt)
  • JSON with word-level timestamps
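
The subtitle formats differ mainly in their timestamp separator and header. The sketch below shows how segments (dicts with "start", "end", and "text", an assumed shape) could be rendered as SRT and VTT; the function names are illustrative rather than WhisperUI's actual export code.

```python
def format_timestamp(seconds: float, sep: str = ",") -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', VTT uses '.')."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as SRT: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[dict]) -> str:
    """Render segments as WebVTT: a WEBVTT header, then unnumbered cues."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], sep=".")
        end = format_timestamp(seg["end"], sep=".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)
```

Plain-text export is then just the segment texts joined together, and JSON export can serialize the same segment dicts with the standard json module.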

The Privacy Guarantee

Your audio never leaves your machine.

  • No API key required
  • No cloud communication
  • No data sent to external servers
  • Works entirely offline

This is for anyone who’s ever hesitated to upload a sensitive meeting recording to an online service.

Platforms

Windows:

  • Standalone executable (.exe)
  • No installation required
  • Runs on any Windows machine

macOS:

  • Standalone application (.app)
  • Drag-and-drop installation
  • Native macOS experience

Use Cases

  • Business meetings: Transcribe strategy discussions without sending audio to the cloud
  • Client interviews: Keep sensitive client data on your machine
  • Personal recordings: Voice memos, lectures, podcasts
  • Legal/medical: Transcribe recordings where privacy is required

Technical Details

Model: OpenAI Whisper, a state-of-the-art speech recognition model trained on 680,000 hours of multilingual and multitask supervised audio.

Framework: PyQt6 for the desktop application interface. Cross-platform, native look and feel.

Audio Processing: FFmpeg for robust audio format handling and conversion.

Consulting Angle

WhisperUI demonstrates the value of local-first AI tooling. For organizations with privacy requirements, compliance obligations, or sensitive data, cloud-based AI services are often non-starters. Local models like Whisper enable AI capabilities without the privacy trade-off.


GitHub: WhisperUI


Built with Python, PyQt6, and OpenAI Whisper. Local audio transcription. No cloud. No API key.