Our Open Architecture

The open components SafeScribe runs on, and why each one was chosen.

SafeScribe is closed-source, but the technology underneath is largely open. Where we use third-party components, we use them because they're auditable, peer-reviewed, and battle-tested — not because they were the easiest box to check. This page lists what we run and why.

Speech recognition

Whisper large-v3

What it is. Whisper is an open-source speech recognition model from OpenAI, supporting 99 languages with automatic language detection. We run the large-v3 model converted to the CTranslate2 format and served through faster-whisper for GPU efficiency.

Why this model. We benchmarked Whisper variants against the FLEURS evaluation set on representative languages and tracked both word error rate and per-stream throughput. We ship large-v3 for its accuracy: the faster large-v3-turbo variant was evaluated and rejected because it measured higher word error rates on our language mix (notably Turkish) despite its throughput advantage — for a privacy-first product that never re-runs on your data, accuracy wins. We can quote our parameters because we measured them ourselves on a modern data-center GPU, not because we copied a marketing chart.
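For readers unfamiliar with the metric, word error rate can be sketched as a word-level edit distance divided by the reference length. This is a minimal illustration, not our benchmark harness: the function name is hypothetical, and a real evaluation would normally also normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

A six-word reference transcribed with one substituted word and one dropped word scores 2/6, or roughly 33% WER.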

Voice activity detection

Silero VAD

What it is. An open-source neural voice activity detector from the Silero project, designed to identify speech segments in audio.

Why we use it. Whisper has a known failure mode where it hallucinates plausible-sounding text on silent or near-silent audio. Silero VAD runs first and tells the transcription stage which segments contain speech, eliminating the overwhelming majority of those hallucinations. The parameter set we ship was selected by sweeping representative languages on the FLEURS evaluation set, not by guesswork or by adopting library defaults.
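The gating step can be sketched as follows. The input mirrors the list-of-dicts shape (sample offsets under `start` and `end`) that Silero's `get_speech_timestamps` helper returns; the function name and the padding value in the usage note are illustrative assumptions, not the parameter set we ship.

```python
from typing import Dict, List, Tuple


def pad_and_merge(
    timestamps: List[Dict[str, int]], pad: int, total: int
) -> List[Tuple[int, int]]:
    """Pad each speech segment by `pad` samples and merge any that overlap.

    `total` is the length of the audio in samples, used to clamp the last segment.
    """
    merged: List[Tuple[int, int]] = []
    for seg in sorted(timestamps, key=lambda s: s["start"]):
        start = max(0, seg["start"] - pad)
        end = min(total, seg["end"] + pad)
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment after padding: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Only the returned ranges would be handed to the transcription stage. With 100 ms of padding at 16 kHz (`pad=1600`), two segments separated by a short pause collapse into one, while a long silence between segments is skipped entirely.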

On-device audio pipeline

FFmpeg via ffmpeg_kit_flutter_new_audio

What it does. Every audio file is preprocessed on your device before upload: high-pass filtering at 80 Hz to remove rumble, loudness normalization to −16 LUFS (the level Whisper-style models prefer), peak limiting, and resampling to 16 kHz mono, encoded as FLAC. Silence and non-speech are handled downstream by Silero VAD rather than by an amplitude gate. The server only ever sees an already-optimized, lossless stream.
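As a rough sketch, the chain described above maps onto standard FFmpeg audio filters (`highpass`, `loudnorm`, `alimiter`). The helper name is hypothetical, and the true-peak, loudness-range, and limiter values are placeholder assumptions; only the 80 Hz cutoff, the −16 LUFS target, and the 16 kHz mono FLAC output come from the description above.

```python
from typing import List


def build_preprocess_args(src: str, dst: str) -> List[str]:
    """Return FFmpeg arguments (without the leading binary name)."""
    filters = ",".join(
        [
            "highpass=f=80",                  # remove rumble below 80 Hz
            "loudnorm=I=-16:TP=-1.5:LRA=11",  # -16 LUFS target; TP and LRA are assumed
            "alimiter=limit=0.95",            # peak limiting; threshold is assumed
        ]
    )
    return [
        "-y",            # overwrite the output file if it exists
        "-i", src,
        "-vn",           # drop any video stream
        "-af", filters,
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono
        "-c:a", "flac",  # lossless encoding
        dst,
    ]
```

On a desktop these arguments could be run with `subprocess.run(["ffmpeg", *build_preprocess_args("in.mp4", "out.flac")])`; on a phone, an equivalent argument string would be handed to the FFmpeg plugin instead.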

Why on-device. The fewer transformations we do server-side, the smaller the surface area where things can go wrong with your data. Doing the work on your device also means a 50 MB raw video can become a 2 MB FLAC before it touches the network — better for your data plan, better for our bandwidth, equivalent quality.

Network and TLS

Cloudflare Tunnel

What it is. A reverse-proxy connector from Cloudflare that exposes our backend without opening any inbound ports on the origin server. TLS is terminated at Cloudflare's edge.

Why this approach. With no inbound port open, the origin offers no directly reachable target for DDoS traffic and needs no certificate-renewal automation of its own. Cloudflare's CT-compliant certificate rotation happens automatically at the edge. The origin server is invisible to the public internet; it only initiates outbound connections.

Authentication

OIDC (Google Sign-In, Apple Sign-In)

What it is. Standard OpenID Connect via Google and Apple. We never see your email or display name — the authentication providers do.

What we store. A SHA-256 hash of the OIDC sub claim, salted with a per-deployment secret. That's our entire user identifier. It is deterministic, so we can recognize a returning user, and one-way, so it can't be reversed to reveal who you are. No email, no name, no phone number, no IP address ever lands in storage or logs.
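A minimal sketch of such an identifier, using only Python's standard `hashlib`. The function name, the order in which secret and claim are concatenated, the separator byte, and the hex encoding are all assumptions for illustration; a keyed construction such as HMAC-SHA-256 would be a common alternative.

```python
import hashlib


def pseudonymous_user_id(sub: str, deployment_secret: bytes) -> str:
    """Derive a stable, one-way identifier from an OIDC `sub` claim."""
    digest = hashlib.sha256()
    digest.update(deployment_secret)    # per-deployment salt (assumed to come first)
    digest.update(b":")                 # separator byte (assumed)
    digest.update(sub.encode("utf-8"))  # the provider-issued subject identifier
    return digest.hexdigest()
```

The same `sub` and secret always yield the same 64-character hex string, which is what lets a returning user be recognized; changing the secret yields an unrelated identifier, so values from one deployment cannot be matched against another.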

Storage

Redis (RAM-only) and SQLite

Redis. Configured with no snapshotting and no append-only log — there is no persistence to disk. Audio blobs and transcripts live here only as long as the request needs them, and are deleted immediately when the user acknowledges the transcript. A power loss takes everything in flight with it.
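In Redis terms, this corresponds to the `save ""` and `appendonly no` directives. The check below is a hypothetical sketch of how a deployment could assert that at startup; it takes a mapping shaped like the output of `CONFIG GET` (for example, what redis-py's `config_get()` returns) and is not our actual health check.

```python
from typing import Mapping


def persistence_disabled(config: Mapping[str, str]) -> bool:
    """True only when both RDB snapshots and the append-only file are off.

    Missing keys are treated as "not verified" and therefore fail the check.
    """
    save = config.get("save")
    aof = config.get("appendonly")
    no_snapshots = save is not None and save.strip() == ""
    no_aof = aof is not None and aof.strip().lower() == "no"
    return no_snapshots and no_aof
```

Failing closed on missing keys matters here: a Redis instance started without an explicit `save` directive falls back to its default snapshot schedule.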

SQLite (ledger). Used only for financial bookkeeping — credit balances, IAP receipts, refund records. No audio, no transcripts, no PII. Backed up to Cloudflare R2 daily and verifiable via WAL checkpoint.
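One plausible reading of the checkpoint step, sketched with Python's built-in `sqlite3` module: flush the write-ahead log into the main database file, confirm the checkpoint was not blocked, and only then copy the file. The function name and file paths are hypothetical, and the upload to object storage is left out.

```python
import shutil
import sqlite3


def checkpoint_then_copy(db_path: str, backup_path: str) -> bool:
    """Checkpoint the WAL, then copy the database file. Returns False if blocked."""
    conn = sqlite3.connect(db_path)
    try:
        # Returns (busy, wal_frames, checkpointed_frames); busy != 0 means another
        # connection prevented the checkpoint from completing.
        busy, _wal_frames, _checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
    finally:
        conn.close()
    if busy:
        return False
    # After a completed TRUNCATE checkpoint the main file holds every committed
    # change. This sketch assumes no writer is active during the copy.
    shutil.copyfile(db_path, backup_path)
    return True
```

A backup job that cannot rule out concurrent writers might prefer `sqlite3.Connection.backup()`, which copies a consistent snapshot without relying on that assumption.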

Mobile platform

Flutter, Riverpod, Hive

Why Flutter. One codebase, two stores. Same security posture on iOS and Android — no platform-specific compromises in the privacy story.

Local storage. Hive boxes encrypted with AES-256, with the encryption key stored in iOS Keychain or Android Keystore — both backed by the device's secure hardware. Transcripts on your device stay yours.

Container runtime

Non-root containers

All backend services (API, worker, ledger) run as a dedicated non-root user inside their containers — never as root. Defense in depth: even if a service is compromised, the attacker is confined to a low-privilege account that cannot touch the host or other services.

If you're a security researcher: we welcome responsible-disclosure reports at security@safescribe.dev. Our published Security Architecture and DPIA spell out the threat model in more detail.
