1. Deploy an instance
2. Install faster-whisper
3. Upload audio files
4. Transcribe a single file
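A minimal sketch of this step, assuming the instance has an NVIDIA GPU (hence `device="cuda"` and `compute_type="float16"`). The helper names `format_timestamp`, `segments_to_lines`, and `transcribe_file` are illustrative and not part of faster-whisper. Only `WhisperModel` and its `transcribe` method come from the library.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_lines(segments):
    """Turn segments (objects with .start, .end, .text) into printable lines."""
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]


def transcribe_file(audio_path: str, model_size: str = "large-v3"):
    """Transcribe one audio file and return (detected_language, lines)."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumption: a CUDA GPU is available; on CPU, device="cpu" with
    # compute_type="int8" is a common alternative.
    model = WhisperModel(model_size, device="cuda", compute_type="float16")

    # transcribe() returns a lazy generator of segments plus an info object;
    # the actual decoding happens while the generator is consumed.
    segments, info = model.transcribe(audio_path, beam_size=5)
    return info.language, segments_to_lines(segments)
```

Calling `transcribe_file("audio.mp3")` (a placeholder file name) should return the detected language code and one timestamped line per segment.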
5. Batch transcribe a directory
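One way to sketch the batch step is to load the model once and reuse it for every file, since loading is the expensive part. The extension list, the function names, and the one `.txt` per input layout are assumptions made for illustration. The GPU settings assume a CUDA-capable instance.

```python
from pathlib import Path

# Assumption: these are the audio formats of interest; adjust as needed.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def select_audio_files(paths):
    """Keep only paths with a known audio extension, in sorted order."""
    return sorted(p for p in paths if p.suffix.lower() in AUDIO_EXTENSIONS)


def transcribe_directory(input_dir: str, output_dir: str, model_size: str = "large-v3"):
    """Transcribe every audio file in input_dir, writing <name>.txt to output_dir."""
    # Imported lazily so select_audio_files works without faster-whisper installed.
    from faster_whisper import WhisperModel

    in_dir, out_dir = Path(input_dir), Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Load the model once and reuse it across all files.
    model = WhisperModel(model_size, device="cuda", compute_type="float16")

    written = []
    for audio_path in select_audio_files(in_dir.iterdir()):
        segments, _info = model.transcribe(str(audio_path))
        # Consuming the generator runs the transcription.
        text = "\n".join(seg.text.strip() for seg in segments)
        out_path = out_dir / (audio_path.stem + ".txt")
        out_path.write_text(text + "\n", encoding="utf-8")
        written.append(out_path)
    return written
```

The function returns the list of transcript paths it wrote, which makes it easy to check what is ready to download afterwards.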
6. Download results
Model sizes
| Model | VRAM | Time to transcribe 1 hr of audio |
|---|---|---|
| tiny / base | ~1 GB | ~10-15 sec |
| small | ~2 GB | ~25 sec |
| medium | ~5 GB | ~50 sec |
| large-v3 | ~10 GB | ~2 min |
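The table can be turned into a small lookup for picking a model from the VRAM available. This is only a sketch: the figures are the approximate values from the table, `base` stands in for the "tiny / base" row, and the function name is hypothetical.

```python
# Approximate VRAM needs in GB, copied from the table above (smallest first).
MODEL_VRAM_GB = [("base", 1), ("small", 2), ("medium", 5), ("large-v3", 10)]


def largest_model_that_fits(free_vram_gb: float):
    """Return the largest model whose approximate VRAM need fits, or None."""
    fitting = [name for name, need in MODEL_VRAM_GB if need <= free_vram_gb]
    return fitting[-1] if fitting else None
```

Because the table values are rough, leaving some headroom above the listed figure is sensible.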
Tips
- Use `large-v3` for production accuracy. Use `small` for fast iteration.
- faster-whisper supports `word_timestamps=True` for word-level alignment.
- The model downloads on first use (~3 GB for large-v3). Attach a volume to cache it.
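As a sketch of the word-level option: with `word_timestamps=True`, each segment carries a `words` list whose items have `start`, `end`, and `word` attributes. The helper names below are illustrative, `small` is chosen for quick iteration, and the GPU settings assume a CUDA-capable instance.

```python
def words_to_rows(segments):
    """Flatten segments into (start, end, word) rows, times rounded to 10 ms."""
    rows = []
    for segment in segments:
        # segment.words is only populated when word_timestamps=True was requested.
        for word in segment.words or []:
            rows.append((round(word.start, 2), round(word.end, 2), word.word.strip()))
    return rows


def transcribe_words(audio_path: str, model_size: str = "small"):
    """Transcribe one file and return word-level timing rows."""
    # Imported lazily so words_to_rows works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    return words_to_rows(segments)
```

The resulting rows can be written out as CSV or used to build subtitle cues with finer timing than whole segments.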