CASE STUDY · DEVELOPER TOOL

Keevo

Keevo is a desktop transcription and subtitle tool built for content creators and video producers. It processes video locally using an on-device speech recognition model, generates timestamped transcripts, and exports subtitle files in multiple formats — no cloud uploads, no API costs, no privacy concerns.

Tauri · Rust · React · TypeScript · On-device AI

Role

Solo · full-stack

Timeline

2025 · in progress

Platform

Desktop · macOS · Windows

Type

Developer Tool

Read the technical breakdown

keevo.app

THE PROBLEM

Content creators spend hours manually transcribing footage or pay recurring API costs to cloud speech services.

Built for freelance video editors, podcasters and content teams who process sensitive recordings and want full ownership of their workflow without subscription costs.

Cloud transcription is expensive. API costs accumulate fast on long recordings and eat into freelance margins.
Privacy is a real concern. Uploading client footage to third-party servers is a non-starter for many video professionals.
Manual transcription is brutal. Hours of listen-and-type work that adds nothing creative to the production.

Video creators & podcasters

Freelance editors, content teams and podcasters who need fast, private transcription without recurring API costs

0 cloud dependency

<4 min per hour of footage

100% local & private

THE SOLUTION

Your footage never leaves your machine.

Tauri shell (Rust core) with a React renderer in the system webview. The speech model runs in a Rust worker spawned from the Tauri backend, with results streamed to the renderer via IPC as segments complete. SQLite stores project state locally.
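The streaming step can be sketched from the renderer's side. This is a minimal illustration, not Keevo's actual code: the `Segment` shape and the `applySegment` helper are hypothetical, and it assumes segments may arrive out of order or be re-sent with revised text. In a Tauri app a function like this would typically be called from an event listener (for example one registered with `listen` from `@tauri-apps/api/event`), but the sketch is kept dependency-free.

```typescript
// Hypothetical shape of one streamed segment; the field names are assumptions.
interface Segment {
  id: number;      // stable identifier assigned by the backend
  startMs: number; // segment start, in milliseconds
  endMs: number;   // segment end, in milliseconds
  text: string;
}

// Insert or update a segment, keeping the list sorted by start time.
// Returns a new array so it can be used directly as a React state update.
function applySegment(segments: readonly Segment[], incoming: Segment): Segment[] {
  // Drop any earlier version of the same segment.
  const next = segments.filter((s) => s.id !== incoming.id);
  // Find the first segment that starts after the incoming one.
  let index = next.findIndex((s) => s.startMs > incoming.startMs);
  if (index === -1) index = next.length;
  next.splice(index, 0, incoming);
  return next;
}

// Simulate segments arriving out of order, with one revision.
const arrivals: Segment[] = [
  { id: 2, startMs: 5000, endMs: 9000, text: "second" },
  { id: 1, startMs: 0, endMs: 5000, text: "first" },
  { id: 2, startMs: 5000, endMs: 9200, text: "second, revised" },
];
const transcript = arrivals.reduce<Segment[]>(applySegment, []);
console.log(transcript.map((s) => s.text));
```

Keeping the merge as a pure function means the same logic works whether segments come from a live worker or are reloaded from the local project store.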

Import

Drop in any video or audio file.

Transcribe

On-device model runs, segments stream in.

Edit

Correct words and adjust timestamps in the timeline.

Export

SRT, VTT or plain text — ready to drop into any editor.


Before

manual workflow

fragmented tools · high manual overhead

After

keevo.app

single unified product · fast & automated

KEY FEATURES

Built around how video creators & podcasters actually work.

FEATURE 01

On-device Inference

A quantized speech model runs entirely on the local machine in a Rust worker spawned by the Tauri backend — no API keys, no uploads, no recurring costs.

Native Rust inference runtime works on both macOS and Windows
Results stream to the UI as segments complete


FEATURE 02

Timeline Editor

Correct model output in a synchronized transcript editor — clicking any word seeks the video, so review is fast.

Word-level timestamp display
Keyboard-first editing flow
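Click-to-seek and active-word highlighting both reduce to a lookup between playback time and word timestamps. The sketch below is one plausible way to do it, assuming a hypothetical `Word` shape with millisecond timestamps and words sorted by start time without overlap; none of these names come from Keevo itself.

```typescript
// Hypothetical word record with word-level timestamps in milliseconds.
interface Word {
  text: string;
  startMs: number;
  endMs: number;
}

// Binary search for the word whose [startMs, endMs) range contains timeMs.
// Assumes words are sorted by start time and do not overlap.
// Returns -1 when the playhead falls in a gap between words.
function findWordIndexAt(words: readonly Word[], timeMs: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (timeMs < w.startMs) hi = mid - 1;
    else if (timeMs >= w.endMs) lo = mid + 1;
    else return mid;
  }
  return -1;
}

// Clicking a word seeks to its start; media elements expect seconds.
function seekTargetSeconds(word: Word): number {
  return word.startMs / 1000;
}

const sampleWords: Word[] = [
  { text: "fast", startMs: 0, endMs: 400 },
  { text: "private", startMs: 450, endMs: 900 },
  { text: "transcription", startMs: 900, endMs: 1500 },
];
console.log(findWordIndexAt(sampleWords, 500), seekTargetSeconds(sampleWords[1]));
```

A binary search keeps the lookup cheap enough to run on every playback time update, even for hour-long transcripts.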


FEATURE 03

Multi-format Export

Export to SRT, WebVTT or plain text with a single click — ready to import into Premiere, Final Cut, DaVinci or any subtitle tool.

Accurate timecodes from the reconciliation pass
UTF-8 safe for multilingual content
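The two subtitle formats differ mainly in their header and timestamp separator, which a short sketch makes concrete. The `Cue` shape and function names below are illustrative assumptions rather than Keevo's export code, and the sketch does not escape cue text that itself contains blank lines or `-->`.

```typescript
// Hypothetical cue record; times are in milliseconds.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Format milliseconds as HH:MM:SS<sep>mmm. SRT uses a comma before the
// milliseconds; WebVTT uses a period.
function formatTimestamp(ms: number, separator: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3_600_000);
  const minutes = Math.floor((total % 3_600_000) / 60_000);
  const seconds = Math.floor((total % 60_000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}${separator}${pad(millis, 3)}`;
}

// SRT: numbered cues separated by a blank line.
function toSrt(cues: readonly Cue[]): string {
  return cues
    .map((cue, i) =>
      `${i + 1}\n${formatTimestamp(cue.startMs, ",")} --> ${formatTimestamp(cue.endMs, ",")}\n${cue.text}\n`)
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, then cues separated by a blank line.
function toVtt(cues: readonly Cue[]): string {
  const body = cues
    .map((cue) =>
      `${formatTimestamp(cue.startMs, ".")} --> ${formatTimestamp(cue.endMs, ".")}\n${cue.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}

const demoCues: Cue[] = [
  { startMs: 0, endMs: 1500, text: "Hello" },
  { startMs: 1500, endMs: 3000, text: "World" },
];
console.log(toSrt(demoCues));
```

Because JavaScript strings are Unicode throughout, multilingual text passes through unchanged; the UTF-8 encoding would be applied when the string is written to disk.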


TECHNICAL CHALLENGE

Hard problems solved.

The pipeline extracts audio via ffmpeg, chunks it into overlapping segments, runs inference in a worker pool, merges results with a timestamp reconciliation pass, then surfaces them in the editor. Subtitle export supports SRT, VTT and plain text.
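The chunking step can be illustrated with a small planner that splits a recording into fixed-length windows overlapping their neighbours. This is a sketch under stated assumptions: the `planChunks` name, the `Chunk` shape, and the 30-second window with 5-second overlap are illustrative values, not figures from Keevo, and the real pipeline runs in Rust rather than TypeScript.

```typescript
// Hypothetical chunk descriptor; times are in milliseconds.
interface Chunk {
  index: number;
  startMs: number;
  endMs: number;
}

// Split [0, durationMs) into fixed-length windows that overlap their neighbour.
function planChunks(durationMs: number, chunkMs: number, overlapMs: number): Chunk[] {
  if (chunkMs <= 0 || overlapMs < 0 || overlapMs >= chunkMs) {
    throw new RangeError("need chunkMs > 0 and 0 <= overlapMs < chunkMs");
  }
  // Each window starts one stride after the previous one.
  const stride = chunkMs - overlapMs;
  const chunks: Chunk[] = [];
  for (let startMs = 0; startMs < durationMs; startMs += stride) {
    const endMs = Math.min(startMs + chunkMs, durationMs);
    chunks.push({ index: chunks.length, startMs, endMs });
    if (endMs === durationMs) break; // last window reached the end of the audio
  }
  return chunks;
}

// A 70-second clip with assumed 30 s windows and 5 s overlap.
const plan = planChunks(70_000, 30_000, 5_000);
console.log(plan);
```

The overlap gives each window some context on both sides of a boundary, at the cost of transcribing the shared audio twice; the duplicates then have to be resolved when results are merged.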

What made it hard

Running a quantized on-device speech model within Tauri without blocking the UI thread.
Handling long-form audio segmentation to produce accurate timestamps across variable speaking rates.
Designing a timeline editor that lets users correct transcripts without re-running the model.
Packaging native binaries for macOS and Windows within Tauri's cross-platform build pipeline.
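To make the timestamp problem concrete, here is one simple reconciliation strategy for overlapping chunks: convert chunk-relative times to absolute ones, then split each overlap at its midpoint so every word is kept exactly once. This midpoint rule is an assumption chosen for illustration; the source does not describe how Keevo's reconciliation pass actually works, and the `TimedWord`, `ChunkResult` and `reconcile` names are hypothetical.

```typescript
interface TimedWord {
  text: string;
  startMs: number;
  endMs: number;
}

interface ChunkResult {
  startMs: number;    // absolute position of the chunk in the recording
  endMs: number;
  words: TimedWord[]; // word times are relative to the chunk start
}

// Merge per-chunk results into one absolute timeline. Each overlap is split
// at its midpoint: the earlier chunk owns words centred before the cut,
// the later chunk owns the rest.
function reconcile(results: readonly ChunkResult[]): TimedWord[] {
  const sorted = [...results].sort((a, b) => a.startMs - b.startMs);
  const merged: TimedWord[] = [];
  sorted.forEach((chunk, i) => {
    const prev = sorted[i - 1];
    const next = sorted[i + 1];
    const lower = prev ? (chunk.startMs + prev.endMs) / 2 : -Infinity;
    const upper = next ? (next.startMs + chunk.endMs) / 2 : Infinity;
    for (const w of chunk.words) {
      const startMs = chunk.startMs + w.startMs;
      const endMs = chunk.startMs + w.endMs;
      const centre = (startMs + endMs) / 2;
      if (centre >= lower && centre < upper) {
        merged.push({ text: w.text, startMs, endMs });
      }
    }
  });
  return merged;
}

// Two chunks overlapping between 25 s and 30 s; "one" and "two" appear in both.
const mergedWords = reconcile([
  { startMs: 25_000, endMs: 55_000, words: [
    { text: "one", startMs: 1_000, endMs: 1_400 },
    { text: "two", startMs: 3_000, endMs: 3_400 },
    { text: "three", startMs: 6_000, endMs: 6_500 },
  ] },
  { startMs: 0, endMs: 30_000, words: [
    { text: "one", startMs: 26_000, endMs: 26_400 },
    { text: "two", startMs: 28_000, endMs: 28_400 },
  ] },
]);
console.log(mergedWords.map((w) => `${w.text}@${w.startMs}`));
```

Since the merged words carry absolute timestamps, later text corrections can be applied to them directly, which is what lets an editor fix a transcript without re-running the model.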

architecture.ts
const shell = ["Tauri", "Rust"];
const ui = ["React", "TypeScript"];
const ai = ["Native Rust inference", "On-device speech model"];
const storage = ["SQLite", "ffmpeg"];

THE STACK

Technologies used.

Shell

Tauri · Rust

UI

React · TypeScript

AI

Native Rust inference · On-device speech model

Storage

SQLite · ffmpeg

WHAT THIS PROVES

What Keevo demonstrates.

On-device AI

Built a Rust-side inference pipeline inside Tauri that runs without blocking the UI thread.

Desktop performance

Processes an hour of footage in under 4 minutes (at least 15× faster than real time) via a worker-pool inference architecture.

Privacy by design

Zero cloud dependency — all processing stays local, making it viable for client and sensitive recordings.

Cross-platform build

Native binary packaging for both macOS and Windows via Tauri's single build pipeline.

WORK WITH ME

Want to build something like this?

Bring me your idea or half-built project. We'll scope it, design it and ship it — using the same workflow behind Keevo.

Request development help → Book a call
