Our Open Architecture
The open components SafeScribe runs on, and why each one was chosen.
SafeScribe is closed-source, but the technology underneath is largely open. Where we use third-party components, we use them because they're auditable, peer-reviewed, and battle-tested — not because they were the easiest box to check. This page lists what we run and why.
Speech recognition: Whisper large-v3-turbo
What it is. An open-source speech recognition model from OpenAI, supporting 99 languages with automatic language detection. We run the large-v3-turbo variant, converted to CTranslate2 format and served through faster-whisper for GPU efficiency.
Why this model. We benchmarked Whisper variants against the FLEURS evaluation set on representative languages and tracked both word error rate and per-stream throughput. The configuration we ship gave us roughly three times the throughput of plain large-v3 at slightly lower error rates. That combination of production-grade accuracy and a reasonable per-minute cost is what makes a privacy-first pay-as-you-go (PAYG) model viable. We can quote our parameters because we measured them ourselves on a modern data-center GPU, not because we copied a marketing chart.
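For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The function below is a minimal sketch of that calculation, not our benchmark harness; the lowercase-and-split normalization is an assumption, and real evaluations usually normalize punctuation and numerals as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A transcript that drops one word out of a six-word reference scores 1/6; a score above 1.0 is possible when the hypothesis contains many inserted words.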
Silero VAD
What it is. An open-source neural voice activity detector from the Silero project, designed to identify speech segments in audio.
Why we use it. Whisper has a known failure mode where it hallucinates plausible-sounding text on silent or near-silent audio. Silero VAD runs first and tells the transcription stage which segments contain speech, eliminating the overwhelming majority of those hallucinations. The parameter set we ship was selected by sweeping representative languages on the FLEURS evaluation set, not by guesswork or by adopting library defaults.
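The gating step can be illustrated without the model itself. The sketch below assumes the VAD output has already been reduced to a list of (start, end) times in seconds; the function name, the padding, and the merge gap are hypothetical values chosen for illustration, not the parameters we ship and not Silero's API. An empty result means the transcription stage can be skipped for that file.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def speech_windows(
    segments: List[Segment],
    total_duration: float,
    pad: float = 0.25,      # hypothetical context added around each segment
    max_gap: float = 0.5,   # hypothetical gap below which windows are merged
) -> List[Segment]:
    """Pad detected speech segments and merge the ones that sit close together."""
    windows: List[Segment] = []
    for start, end in sorted(segments):
        # Clamp the padded window to the bounds of the recording.
        start = max(0.0, start - pad)
        end = min(total_duration, end + pad)
        if windows and start - windows[-1][1] <= max_gap:
            # Close enough to the previous window: extend it.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows
```

Only the audio inside the returned windows would be handed to the recognizer, so silent stretches never reach the stage that is prone to hallucinating text.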
On-device audio pipeline: FFmpeg via ffmpeg_kit_flutter_new_audio
What it does. Every audio file is preprocessed on your device before upload: high-pass filtering at 80 Hz to remove rumble, leading-silence trimming, two-pass loudness normalization to −16 LUFS (the level Whisper-style models prefer), peak limiting, and resampling to 16 kHz mono FLAC. The server only ever sees an already-optimized, lossless stream.
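To make the chain concrete, here is a rough sketch of how the two FFmpeg passes could be assembled. It uses FFmpeg's standard `highpass`, `silenceremove`, `loudnorm`, and `alimiter` filters, but the silence threshold, true-peak and loudness-range targets, and limiter ceiling are illustrative assumptions, not the values in the app, and `ffmpeg_args` is a hypothetical helper. The first pass only measures loudness; the second pass feeds those measurements back into `loudnorm` and writes the FLAC.

```python
from typing import Dict, List, Optional

# Assumed targets: -16 LUFS comes from the text above; TP and LRA are guesses.
LOUDNORM_TARGET = "I=-16:TP=-1.5:LRA=11"


def ffmpeg_args(src: str, dst: str,
                measured: Optional[Dict[str, str]] = None) -> List[str]:
    """Build FFmpeg arguments for pass 1 (measure) or pass 2 (render)."""
    chain = [
        "highpass=f=80",  # remove rumble below 80 Hz
        # Trim leading silence only; the -50 dB threshold is an assumption.
        "silenceremove=start_periods=1:start_threshold=-50dB",
    ]
    if measured is None:
        # Pass 1: analyse loudness, print JSON stats, discard the output.
        chain.append(f"loudnorm={LOUDNORM_TARGET}:print_format=json")
        return ["-i", src, "-af", ",".join(chain), "-f", "null", "-"]

    # Pass 2: apply the gain computed from the pass-1 measurements.
    chain.append(
        f"loudnorm={LOUDNORM_TARGET}"
        f":measured_I={measured['input_i']}"
        f":measured_TP={measured['input_tp']}"
        f":measured_LRA={measured['input_lra']}"
        f":measured_thresh={measured['input_thresh']}"
        f":offset={measured['target_offset']}"
        ":linear=true"
    )
    # Peak limiter with automatic make-up gain switched off.
    chain.append("alimiter=limit=0.89:level=0")
    return ["-i", src, "-af", ",".join(chain),
            "-ar", "16000", "-ac", "1", "-c:a", "flac", dst]
```

The `measured` dictionary uses the key names that `loudnorm` prints in its JSON report (`input_i`, `input_tp`, `input_lra`, `input_thresh`, `target_offset`). Resampling is left to the output options because `loudnorm` works at a higher internal sample rate.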
Why on-device. The fewer transformations we do server-side, the smaller the surface area where things can go wrong with your data. Doing the work on your device also means a 50 MB raw video can become a 2 MB FLAC before it touches the network — better for your data plan, better for our bandwidth, equivalent quality.
Network and TLS: Cloudflare Tunnel
What it is. A reverse-proxy connector from Cloudflare that exposes our backend without opening any inbound ports on the origin server. TLS is terminated at Cloudflare's edge.
Why this approach. With no inbound ports open, the origin cannot be attacked directly; hostile traffic has to get past Cloudflare's edge first. There is also no certificate-renewal automation to maintain on the origin, because Cloudflare's CT-compliant certificate rotation happens automatically at the edge. The origin server is invisible to the public internet; it only initiates outbound connections.
Authentication: OIDC (Google Sign-In, Apple Sign-In)
What it is. Standard OpenID Connect via Google and Apple. We never see your email or display name — the authentication providers do.
What we store. A SHA-256 hash of the OIDC sub claim, salted with a per-deployment secret. That's our entire user identifier. It's deterministic, so we can recognize a returning user, and one-way, so it can't be reversed to reveal who you are. No email, no name, no phone number, no IP address ever lands in storage or logs.
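One common way to build such an identifier is sketched below. Note the swap: rather than concatenating a salt and hashing, this example uses HMAC-SHA256, the standard keyed construction over SHA-256, with the deployment secret as the key. Mixing the issuer into the message is also an assumption (it keeps Google and Apple subjects from colliding), and the function name is hypothetical; the exact construction used in production is not spelled out here.

```python
import hashlib
import hmac


def pseudonymous_user_id(issuer: str, sub: str, deployment_secret: bytes) -> str:
    """Derive a stable, non-reversible identifier from an OIDC subject."""
    # Assumption: issuer and subject are joined so that identical `sub`
    # values from different providers map to different identifiers.
    message = f"{issuer}|{sub}".encode("utf-8")
    # HMAC-SHA256 keyed with the per-deployment secret; without the key,
    # an attacker cannot test guesses for `sub` against stored identifiers.
    return hmac.new(deployment_secret, message, hashlib.sha256).hexdigest()
```

The same inputs always produce the same 64-character hex string, which is what lets a returning user be matched without storing the raw subject.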
Redis (RAM-only) and SQLite
Redis. Configured with no snapshotting and no append-only log — there is no persistence to disk. Audio blobs and transcripts live here only as long as the request needs them, and are deleted immediately when the user acknowledges the transcript. A power loss takes everything in flight with it.
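As a rough illustration, the two settings that matter are `save` (snapshotting) and `appendonly` (the append-only log). The helpers below are hypothetical: one builds a `redis-server` command line with both turned off, the other checks a settings dictionary, such as one assembled from `CONFIG GET` replies, to confirm that persistence is disabled. This is a sketch of the idea, not our deployment configuration.

```python
from typing import Dict, List


def redis_ram_only_args() -> List[str]:
    """Command line that starts Redis with no snapshots and no AOF."""
    # An empty `save` value removes every snapshot rule;
    # `appendonly no` disables the append-only file.
    return ["redis-server", "--save", "", "--appendonly", "no"]


def persistence_disabled(config: Dict[str, str]) -> bool:
    """True only when both persistence mechanisms are explicitly off."""
    # A missing key is treated as "not verified", not as "off".
    return config.get("save") == "" and config.get("appendonly") == "no"
```

A startup check built on `persistence_disabled` could refuse to accept traffic if someone re-enables either setting by mistake.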
SQLite (ledger). Used only for financial bookkeeping — credit balances, IAP receipts, refund records. No audio, no transcripts, no PII. Backed up to Cloudflare R2 daily and verifiable via WAL checkpoint.
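A backup step along these lines can be sketched with Python's built-in `sqlite3` module. The sketch reads "verifiable via WAL checkpoint" as: flush the write-ahead log into the main database file, copy the database with SQLite's online backup API, then run an integrity check on the copy. That reading, the function name, and the file paths are assumptions; the upload to R2 is omitted.

```python
import sqlite3


def snapshot_ledger(live_path: str, backup_path: str) -> bool:
    """Checkpoint the WAL, copy the database, and verify the copy."""
    src = sqlite3.connect(live_path)
    try:
        # Move all WAL frames into the main file and truncate the log.
        src.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        dst = sqlite3.connect(backup_path)
        try:
            # Online backup: a consistent copy even if writers are active.
            src.backup(dst)
            # integrity_check returns a single row containing 'ok'
            # when the copied database is structurally sound.
            (status,) = dst.execute("PRAGMA integrity_check").fetchone()
        finally:
            dst.close()
    finally:
        src.close()
    return status == "ok"
```

Only a snapshot that passes the integrity check would be worth shipping to off-site storage.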
Mobile platform: Flutter, Riverpod, Hive
Why Flutter. One codebase, two stores. Same security posture on iOS and Android — no platform-specific compromises in the privacy story.
Local storage. Hive boxes encrypted with AES-256, with the encryption key stored in iOS Keychain or Android Keystore, both of which are backed by the device's secure hardware where the device provides it. Transcripts on your device stay yours.
Container runtime: Non-root containers
All backend services (API, worker, ledger) run as a dedicated non-root user inside their containers, never as root. This is defense in depth: even if a service is compromised, the attacker starts from a low-privilege account with no root access inside the container, which sharply limits how far they can reach toward the host or other services.
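The non-root user itself is set in the container image, but a service can also refuse to start if it ever finds itself running as root. The guard below is a small sketch of that complementary check; `ensure_not_root` is a hypothetical helper, and nothing above states that our services include it.

```python
import os
from typing import Optional


def ensure_not_root(euid: Optional[int] = None) -> int:
    """Raise if the effective user ID is 0 (root); otherwise return it."""
    # The optional argument exists only so the check can be exercised
    # in tests; in a service it would be called with no arguments.
    if euid is None:
        euid = os.geteuid()  # available on Unix-like systems
    if euid == 0:
        raise RuntimeError("refusing to start: service is running as root")
    return euid
```

Called once at startup, this turns a misconfigured image into an immediate, visible failure instead of a silently over-privileged process.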