SPS Dev Tool Guides

Shared AI infrastructure for the Sand Point Studios dev team

Whisper — speech to text

What it is. OpenAI's Whisper-large-v3 running locally via faster-whisper (CTranslate2 backend) on the laptop's RTX 5080. Transcribes audio/video to text, well faster than real time: a 60-minute lecture transcribes in ~5-10 minutes on Blackwell. Free, private, no cloud upload.

Set up in April from t041. Of the tools in this suite, this one has been operational the longest.

Where it lives

| Location | Setup | Use case |
| --- | --- | --- |
| laptop | C:\Users\twist\<path to t041 setup; see project_laptop_5080_utilization.md / t041 conversation log> | All transcription happens here |

Check the t041 setup log at ~/dev-context/conversation-logs/2026-04-23_t041-whisper-pipeline-setup.md for the exact venv path and CUDA DLL handling on Windows.

When to use Whisper vs. OpenAI Whisper API / others

Use local Whisper when:

- Course content with student names / FERPA-adjacent (don't upload to OpenAI)
- Anything you don't want sitting on a 3rd party's drive
- Bulk transcription (no per-minute API cost)
- You have time to wait (a 90-min lecture = ~10 min)

Use OpenAI Whisper API when:

- One-off, low-sensitivity, want it back NOW
- Diarization is the load-bearing piece (their API has it; local needs WhisperX upgrade)
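To judge whether you "have time to wait" on a bulk job, a rough estimate helps. The sketch below is only arithmetic on this guide's own figure (60 minutes of audio in ~5-10 minutes, i.e. roughly 6x to 12x real time). The function name and the default speedups are assumptions for illustration, not measured benchmarks, and real throughput will vary with audio quality and settings.

```python
def estimate_minutes(audio_minutes, speedup_low=6.0, speedup_high=12.0):
    """Return (best_case, worst_case) wall-clock minutes for a transcription job.

    The default speedups are derived from the guide's "60 min in ~5-10 min"
    figure; they are assumptions, not benchmarks.
    """
    best_case = audio_minutes / speedup_high
    worst_case = audio_minutes / speedup_low
    return best_case, worst_case


# One 60-minute lecture, then a hypothetical batch of 30 lectures at 90 minutes each.
single = estimate_minutes(60)
batch = estimate_minutes(30 * 90)
print(f"single lecture: {single[0]:.0f}-{single[1]:.0f} min")
print(f"30 x 90-min lectures: {batch[0] / 60:.1f}-{batch[1] / 60:.1f} hours")
```

By this estimate a full batch of that size is an overnight job at worst, which is the kind of workload where local transcription makes sense.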

3 recipes

1. Transcribe a single lecture file

From the laptop, in the t041 venv:

from faster_whisper import WhisperModel
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "BUS-491-lesson-12-Professional-Toolkit.mp4",
    beam_size=5,
    language="en",
)
print(f"Detected language: {info.language} ({info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:7.2f} → {seg.end:7.2f}] {seg.text}")

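Write to a text file instead. segments is a lazy generator, so it is empty once the print loop above has consumed it; call transcribe again (or skip the print loop) before writing:

# Fresh generator; transcription only runs while the loop below iterates it.
segments, _ = model.transcribe("BUS-491-lesson-12-Professional-Toolkit.mp4", beam_size=5, language="en")
with open("transcript.txt", "w", encoding="utf-8") as f:  # utf-8: the Windows default codec can fail on non-ASCII text
    for seg in segments:
        f.write(f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text}\n")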
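If you want both the console output and the file from a single transcription pass, one option is to do both inside the same loop. This is a minimal sketch: save_and_echo is a hypothetical helper name, and FakeSegment is a stand-in with the same start/end/text attributes so the example runs without a GPU. With the real model you would pass the segments returned by model.transcribe.

```python
import tempfile
from collections import namedtuple
from pathlib import Path


def save_and_echo(segments, path):
    """Print each segment and write it to `path` in one pass; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for seg in segments:
            line = f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text.strip()}"
            print(line)
            f.write(line + "\n")
            count += 1
    return count


# Stand-in for faster-whisper segments (assumption: only start/end/text are used).
FakeSegment = namedtuple("FakeSegment", "start end text")
fake_segments = (
    seg
    for seg in [
        FakeSegment(0.0, 2.5, " Welcome back."),
        FakeSegment(2.5, 5.0, " Today: toolkits."),
    ]
)

out_path = Path(tempfile.mkdtemp()) / "transcript.txt"
written = save_and_echo(fake_segments, out_path)
```

After the call the generator is exhausted, which is the same behavior you get from the real segments object.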

2. Generate SRT captions for a demo video

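# Define the helper first so it exists when the loop below calls it.
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)  # work in whole milliseconds so 59.9996 s cannot render as "60,000"
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

# Reuses the model loaded in recipe 1.
segments, _ = model.transcribe("demo.mp4", language="en")

with open("demo.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(segments, 1):
        start = format_timestamp(seg.start)
        end = format_timestamp(seg.end)
        f.write(f"{i}\n{start} --> {end}\n{seg.text.strip()}\n\n")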

Then ffmpeg -i demo.mp4 -vf subtitles=demo.srt demo-captioned.mp4 to burn them in, or upload the .srt separately.
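If you would rather run the burn-in step from Python, the same command can be built as an argument list. This is a sketch under assumptions: burn_in_command and burn_in are hypothetical helper names, the .srt is assumed to sit in the same folder as the video, and file names are assumed to be simple (no quotes, brackets, or commas). It runs ffmpeg from the video's folder and passes bare file names, because the subtitles filter treats characters such as the colon in C:\ specially and absolute Windows paths would need extra escaping.

```python
import subprocess
from pathlib import Path


def burn_in_command(video_name, srt_name, output_name):
    """Build the ffmpeg argument list from the guide's one-liner."""
    return ["ffmpeg", "-i", video_name, "-vf", f"subtitles={srt_name}", output_name]


def burn_in(video: Path, srt: Path, output_name: str) -> None:
    """Run ffmpeg inside the video's folder so the filter only sees bare names.

    Assumes `srt` lives in the same folder as `video`.
    """
    cmd = burn_in_command(video.name, srt.name, output_name)
    subprocess.run(cmd, cwd=video.parent, check=True)  # raises if ffmpeg exits non-zero
```

Usage would look like burn_in(Path("demo.mp4").resolve(), Path("demo.srt").resolve(), "demo-captioned.mp4"), with ffmpeg available on PATH.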

3. Folder-watcher: drop in MP4s, get transcripts back

import pathlib, time
from faster_whisper import WhisperModel

INBOX = pathlib.Path("C:/Users/twist/Recordings/inbox")
OUTBOX = pathlib.Path("C:/Users/twist/Recordings/transcripts")
OUTBOX.mkdir(parents=True, exist_ok=True)  # also create Recordings/ if it is missing
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

while True:
    for f in INBOX.glob("*.mp4"):
        out = OUTBOX / f.with_suffix(".txt").name
        if out.exists():
            continue  # already done
        print(f"Transcribing {f.name}...")
        segments, _ = model.transcribe(str(f), language="en")
        out.write_text("\n".join(s.text.strip() for s in segments), encoding="utf-8")
        print(f"  → {out.name}")
    time.sleep(30)

Run this in a terminal you keep open. Drop lecture recordings into inbox/ and transcripts appear in transcripts/ within about 30 seconds of the watcher finishing its current file.
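One thing the loop above does not guard against is picking up a file that is still being copied into inbox/. A possible safeguard, sketched here as an assumption rather than part of the t041 setup, is to skip any file whose size is still changing. is_stable is a hypothetical helper name and the 5-second default is arbitrary.

```python
import time
from pathlib import Path


def is_stable(path: Path, wait_seconds: float = 5.0) -> bool:
    """Return True if the file is non-empty and its size did not change
    over `wait_seconds`, a rough signal that a copy has finished."""
    first_size = path.stat().st_size
    time.sleep(wait_seconds)
    return first_size > 0 and path.stat().st_size == first_size
```

In the watcher this would slot in next to the existing check, for example: if out.exists() or not is_stable(f): continue. A file skipped on one pass is simply retried on the next.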

Gotchas