CASE STUDY · DEVELOPER TOOL

Keevo

Keevo is a desktop transcription and subtitle tool built for content creators and video producers. It processes video locally using an on-device speech recognition model, generates timestamped transcripts, and exports subtitle files in multiple formats — no cloud uploads, no API costs, no privacy concerns.

Tauri · Rust · React · TypeScript · On-device AI
Role
Solo · full-stack
Timeline
2025 · in progress
Platform
Desktop · macOS · Windows
Type
Developer Tool
keevo.app
THE PROBLEM

Content creators spend hours manually transcribing footage or pay recurring API costs to cloud speech services.

Keevo is built for freelance video editors, podcasters and content teams who process sensitive recordings and want full ownership of their workflow without subscription costs.

  • Cloud transcription is expensive. API costs accumulate fast on long recordings and eat into freelance margins.
  • Privacy is a real concern. Uploading client footage to third-party servers is a non-starter for many video professionals.
  • Manual transcription is brutal. Hours of listen-and-type work that adds nothing creative to the production.

Video creators & podcasters

Freelance editors, content teams and podcasters who need fast, private transcription without recurring API costs

0 cloud dependency
<4 min per hour of footage
100% local & private
THE SOLUTION

Your footage never leaves your machine.

Keevo is a Tauri shell with a Rust core and a React renderer running in the system webview. The speech model runs in a Rust worker spawned from the Tauri backend, and results stream to the renderer over IPC as each segment completes. SQLite stores project state locally.
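As a rough sketch of the renderer side of that stream, the function below folds incoming segment events into an ordered transcript. The `Segment` shape, the `upsertSegment` name and the possibility of out-of-order or re-sent segments are assumptions made for illustration, not Keevo's actual IPC contract. In a Tauri app, a function like this would typically be called from an event listener registered through `@tauri-apps/api/event`.

```typescript
// Hypothetical payload for one finished segment; field names are assumptions.
interface Segment {
  id: number;      // position of the segment in the recording
  startMs: number; // start time in milliseconds
  endMs: number;   // end time in milliseconds
  text: string;
}

// Insert or replace a segment, keeping the list sorted by start time.
// Returns a new array so React state updates stay immutable.
function upsertSegment(list: readonly Segment[], incoming: Segment): Segment[] {
  // Drop any earlier version of the same segment (filter returns a copy).
  const next = list.filter((s) => s.id !== incoming.id);
  // Find the first position whose start time is later than the new segment.
  let index = 0;
  while (index < next.length && next[index].startMs <= incoming.startMs) index++;
  next.splice(index, 0, incoming);
  return next;
}

// Simulated stream: segments arrive out of order and one is re-sent.
const events: Segment[] = [
  { id: 1, startMs: 4000, endMs: 7500, text: "second" },
  { id: 0, startMs: 0, endMs: 4000, text: "first" },
  { id: 1, startMs: 4000, endMs: 7600, text: "second (revised)" },
];

let transcript: Segment[] = [];
for (const event of events) {
  transcript = upsertSegment(transcript, event);
}
console.log(transcript.map((s) => s.text).join(" | "));
```

Keeping the update pure means the same function can back a `useReducer` hook or a plain store without changes.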

01 · Import
Drop in any video or audio file.

02 · Transcribe
On-device model runs, segments stream in.

03 · Edit
Correct words and adjust timestamps in the timeline.

04 · Export
SRT, VTT or plain text, ready to drop into any editor.

Before: manual workflow (fragmented tools · high manual overhead)
After: keevo.app (single unified product · fast & automated)
KEY FEATURES

Built around how video creators & podcasters actually work.

FEATURE 01

On-device Inference

A quantized speech model runs entirely on the local machine in a Rust worker spawned by the Tauri backend — no API keys, no uploads, no recurring costs.

  • Native Rust inference runtime works on both macOS and Windows
  • Results stream to the UI as segments complete
FEATURE 02

Timeline Editor

Correct model output in a synchronized transcript editor — clicking any word seeks the video, so review is fast.

  • Word-level timestamp display
  • Keyboard-first editing flow
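The click-to-seek behavior and its inverse (highlighting the word under the playhead) can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Word` shape with millisecond timestamps sorted by start time; it is not taken from Keevo's source. The only platform fact relied on is that an HTML media element's `currentTime` is expressed in seconds.

```typescript
// Hypothetical word-level timestamp record; names are assumptions.
interface Word {
  text: string;
  startMs: number;
  endMs: number;
}

// Binary search for the last word whose start is <= timeMs.
// Assumes `words` is sorted by startMs. Returns -1 before the first word.
function wordIndexAt(words: readonly Word[], timeMs: number): number {
  let lo = 0;
  let hi = words.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (words[mid].startMs <= timeMs) {
      found = mid;   // candidate; look for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Clicking a word seeks to its start; media elements expect seconds.
function seekTargetSeconds(word: Word): number {
  return word.startMs / 1000;
}

const words: Word[] = [
  { text: "Welcome", startMs: 0, endMs: 420 },
  { text: "to", startMs: 420, endMs: 560 },
  { text: "Keevo", startMs: 600, endMs: 1100 },
];

// Playhead at 580 ms sits in the gap after "to", so "to" stays highlighted.
console.log(words[wordIndexAt(words, 580)].text);
// Clicking "Keevo" would set video.currentTime to this value.
console.log(seekTargetSeconds(words[2]));
```

A binary search keeps the lookup cheap enough to run on every `timeupdate` event, even for transcripts with tens of thousands of words.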
FEATURE 03

Multi-format Export

Export to SRT, WebVTT or plain text with a single click — ready to import into Premiere, Final Cut, DaVinci or any subtitle tool.

  • Accurate timecodes from the reconciliation pass
  • UTF-8 safe for multilingual content
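To make the export step concrete, here is a small sketch of SRT and WebVTT serialization. The `Cue` type and function names are hypothetical. The format details it relies on are standard: SRT cues are numbered from 1 and use a comma before the milliseconds, while WebVTT files begin with a `WEBVTT` header line and use a dot.

```typescript
// Hypothetical cue shape; one cue is one subtitle on screen.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pad a non-negative integer with zeros.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as HH:MM:SS<sep>mmm. SRT uses ",", WebVTT uses ".".
function timecode(ms: number, sep: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + sep + pad(total % 1000, 3);
}

// SRT: numbered cues separated by a blank line.
function toSrt(cues: readonly Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${timecode(c.startMs, ",")} --> ${timecode(c.endMs, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, blank line, then cues separated by blank lines.
function toVtt(cues: readonly Cue[]): string {
  const body = cues
    .map((c) => `${timecode(c.startMs, ".")} --> ${timecode(c.endMs, ".")}\n${c.text}\n`)
    .join("\n");
  return "WEBVTT\n\n" + body;
}

const cues: Cue[] = [
  { startMs: 0, endMs: 1500, text: "Hello" },
  { startMs: 3723004, endMs: 3725000, text: "Grüße aus Köln" },
];
console.log(toSrt(cues));
console.log(toVtt(cues));
```

The strings stay as ordinary JavaScript strings here; writing them to disk as UTF-8 is what keeps multilingual text intact in the exported file.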
TECHNICAL CHALLENGE

Hard problems solved.

The pipeline extracts audio via ffmpeg, chunks it into overlapping segments, runs inference in a worker pool, merges results with a timestamp reconciliation pass, then surfaces them in the editor. Subtitle export supports SRT, VTT and plain text.
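The chunking and reconciliation steps can be illustrated with a short sketch. Everything here is an assumption made for the example: the window and overlap sizes, the type names, and in particular the "midpoint ownership" rule used to drop duplicate words from overlaps. The page does not describe how Keevo's reconciliation pass actually works, and the real pipeline runs on the Rust side; TypeScript is used only to keep the examples in one language.

```typescript
interface Chunk { index: number; startMs: number; endMs: number; }
interface TimedWord { text: string; startMs: number; endMs: number; }

// Split [0, durationMs) into windows of chunkMs that overlap by overlapMs.
function planChunks(durationMs: number, chunkMs: number, overlapMs: number): Chunk[] {
  if (chunkMs <= overlapMs) throw new Error("chunkMs must exceed overlapMs");
  const chunks: Chunk[] = [];
  const step = chunkMs - overlapMs;
  for (let start = 0, i = 0; start < durationMs; start += step, i++) {
    const end = Math.min(start + chunkMs, durationMs);
    chunks.push({ index: i, startMs: start, endMs: end });
    if (end === durationMs) break; // last window reached the end of the audio
  }
  return chunks;
}

// Shift chunk-relative words to absolute time and drop overlap duplicates.
// Each chunk keeps only words whose midpoint lies in the span it "owns":
// from the middle of its leading overlap to the middle of its trailing one.
function reconcile(chunks: readonly Chunk[], perChunk: readonly TimedWord[][]): TimedWord[] {
  const merged: TimedWord[] = [];
  chunks.forEach((chunk, i) => {
    const prev = i > 0 ? chunks[i - 1] : undefined;
    const next = i < chunks.length - 1 ? chunks[i + 1] : undefined;
    const ownStart = prev ? (chunk.startMs + prev.endMs) / 2 : chunk.startMs;
    const ownEnd = next ? (next.startMs + chunk.endMs) / 2 : Infinity;
    for (const w of perChunk[i] || []) {
      const abs: TimedWord = {
        text: w.text,
        startMs: w.startMs + chunk.startMs,
        endMs: w.endMs + chunk.startMs,
      };
      const mid = (abs.startMs + abs.endMs) / 2;
      if (mid >= ownStart && mid < ownEnd) merged.push(abs);
    }
  });
  return merged;
}

// 70 s of audio, 30 s windows, 5 s overlap: [0,30], [25,55], [50,70] seconds.
const chunks = planChunks(70000, 30000, 5000);

// Fake per-chunk model output with chunk-relative timestamps.
// "world" and "again" fall in the first overlap and are seen by both chunks.
const perChunk: TimedWord[][] = [
  [
    { text: "hello", startMs: 1000, endMs: 1400 },
    { text: "world", startMs: 26000, endMs: 26400 },
    { text: "again", startMs: 28000, endMs: 28400 },
  ],
  [
    { text: "world", startMs: 1000, endMs: 1400 },
    { text: "again", startMs: 3000, endMs: 3400 },
  ],
  [],
];
const merged = reconcile(chunks, perChunk);
console.log(merged.map((w) => `${w.text}@${w.startMs}`).join(" "));
```

The overlap exists so that a word cut off at one window's edge is heard whole by the neighboring window; the ownership rule then decides which copy survives.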

What made it hard

  • Running a quantized on-device speech model within Tauri without blocking the UI thread.
  • Handling long-form audio segmentation to produce accurate timestamps across variable speaking rates.
  • Designing a timeline editor that lets users correct transcripts without re-running the model.
  • Packaging native binaries for macOS and Windows within Tauri's cross-platform build pipeline.
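One way to let users correct a transcript without re-running the model is to treat the model output as immutable and store edits as a separate overlay keyed by word id. The sketch below shows that idea only; the `ModelWord` and `Correction` types and the overlay approach itself are assumptions, not a description of how Keevo stores edits.

```typescript
// Hypothetical model output record; never mutated after inference.
interface ModelWord {
  id: number;
  text: string;
  startMs: number;
  endMs: number;
}

// A correction overrides any subset of the editable fields.
type Correction = Partial<Pick<ModelWord, "text" | "startMs" | "endMs">>;

// Produce the view the editor renders: model output with corrections on top.
function applyCorrections(
  words: readonly ModelWord[],
  corrections: Readonly<Record<number, Correction>>
): ModelWord[] {
  return words.map((w) => ({ ...w, ...corrections[w.id] }));
}

const modelOutput: ModelWord[] = [
  { id: 0, text: "Welcome", startMs: 0, endMs: 400 },
  { id: 1, text: "two", startMs: 400, endMs: 560 },
  { id: 2, text: "Kevo", startMs: 600, endMs: 1100 },
];

// User fixes a homophone and a misspelled name, and trims one timestamp.
const corrections: Record<number, Correction> = {
  1: { text: "to" },
  2: { text: "Keevo", endMs: 1050 },
};

const edited = applyCorrections(modelOutput, corrections);
console.log(edited.map((w) => w.text).join(" "));
```

Because the original output is untouched, undoing an edit is just deleting its entry from the overlay, and the overlay is small enough to persist alongside the project state.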
architecture.ts
const shell = [ "Tauri", "Rust" ];
const ui = [ "React", "TypeScript" ];
const ai = [ "Native Rust inference", "On-device speech model" ];
const storage = [ "SQLite", "ffmpeg" ];
THE STACK

Technologies used.

Shell
Tauri · Rust
UI
React · TypeScript
AI
Native Rust inference · On-device speech model
Storage
SQLite · ffmpeg
WHAT THIS PROVES

What Keevo demonstrates.

On-device AI

Shipped a production Rust-side inference pipeline inside Tauri without blocking the UI thread.

Desktop performance

Processes an hour of footage in under 4 minutes via a worker-pool inference architecture.

Privacy by design

Zero cloud dependency — all processing stays local, making it viable for client and sensitive recordings.

Cross-platform build

Native binary packaging for both macOS and Windows via Tauri's single build pipeline.

