1. Prepare your input file
Create a JSONL (JSON Lines) file with one prompt per line.
2. Deploy and upload
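Before deploying and uploading, it is worth checking locally that every line of the input file parses. That way a malformed line does not cause a failed run on a paid GPU. The sketch below assumes each line is a JSON object that stores its text under a `prompt` key. The key name and the `check_jsonl` helper are hypothetical and not specified by this guide.

```python
import json


def check_jsonl(lines):
    """Count prompts in JSONL text, raising ValueError on the first bad line.

    Assumes each line is a JSON object with a non-empty string under "prompt".
    """
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: not valid JSON ({exc.msg})") from exc
        prompt = record.get("prompt") if isinstance(record, dict) else None
        if not isinstance(prompt, str) or not prompt:
            raise ValueError(f"line {lineno}: missing or empty 'prompt' field")
        count += 1
    return count
```

Because an open file iterates line by line, `check_jsonl(open("prompts.jsonl", encoding="utf-8"))` validates the whole file without loading it into memory. The file name is a placeholder.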
3. Install vLLM and upload the batch script
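Here is a minimal sketch of such a batch script. It uses vLLM's offline `LLM` and `SamplingParams` API and passes every prompt to a single `generate()` call, so vLLM can schedule them together. Several details are assumptions for illustration rather than this guide's actual script: the `prompt`/`output` keys, the sampling values, and the helper names `parse_prompts`, `format_results` and `run_batch`.

```python
import json


def parse_prompts(lines):
    """Return the prompt from each non-blank JSONL line (assumes a "prompt" key)."""
    return [json.loads(line)["prompt"] for line in lines if line.strip()]


def format_results(pairs):
    """Turn (prompt, completion) pairs into JSONL lines, one result per line."""
    return [json.dumps({"prompt": p, "output": t}) + "\n" for p, t in pairs]


def run_batch(in_path, out_path, model):
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    with open(in_path, encoding="utf-8") as f:
        prompts = parse_prompts(f)

    llm = LLM(model=model)
    params = SamplingParams(temperature=0.7, max_tokens=512)  # illustrative values
    # One generate() call hands every prompt to vLLM's scheduler, which batches
    # them continuously; outputs come back in the same order as the prompts.
    outputs = llm.generate(prompts, params)

    with open(out_path, "w", encoding="utf-8") as f:
        f.writelines(format_results((o.prompt, o.outputs[0].text) for o in outputs))
```

On the instance, `run_batch("prompts.jsonl", "results.jsonl", "<model-id>")` processes the whole file. The file names and model ID are placeholders. For chat-tuned models you may prefer `LLM.chat()`, which applies the model's chat template, whereas `generate()` sends the raw text.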
4. Download results and tear down
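Tear-down is hard to undo, so it can pay to confirm that the downloaded results cover every input prompt before destroying the instance. The sketch below assumes both the input and the results are JSONL and that each result record repeats its prompt under a `prompt` key. That format and the `unanswered` helper are hypothetical; adapt them to whatever your batch script actually writes.

```python
import json


def unanswered(prompt_lines, result_lines):
    """Return input prompts that have no matching record in the results.

    Assumes every record in both files stores its text under "prompt".
    """
    done = {json.loads(line)["prompt"] for line in result_lines if line.strip()}
    inputs = (json.loads(line)["prompt"] for line in prompt_lines if line.strip())
    return [p for p in inputs if p not in done]
```

An empty list means every prompt has a result and the instance can be torn down. Otherwise, the returned prompts can be written to a new file and re-run.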
Monitor progress
Cost estimate
| Prompts | Model | GPU | Time | Approx. cost |
|---|---|---|---|---|
| 10,000 | Llama 8B | RTX 4090 | ~15 min | ~$0.09 |
| 10,000 | Llama 70B | A100 80 GB | ~30 min | ~$0.80 |
| 100,000 | Llama 70B | H100 80 GB | ~2 hrs | ~$5.00 |
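The figures in the table imply an hourly rate for each GPU (cost divided by hours). The sketch below back-calculates those rates and uses them to estimate other job sizes. The rates are derived from the table, not from published pricing. The linear scaling of runtime with prompt count is also an assumption: it ignores model load time and differences in prompt length.

```python
# Implied hourly rates, back-calculated from the cost table (cost / hours).
# These are NOT published prices; check current pricing before relying on them.
IMPLIED_RATE_PER_HOUR = {
    "RTX 4090": 0.09 / (15 / 60),    # about $0.36/hr
    "A100 80 GB": 0.80 / (30 / 60),  # about $1.60/hr
    "H100 80 GB": 5.00 / 2,          # about $2.50/hr
}


def estimate_cost(gpu, minutes):
    """Approximate cost in dollars of running one GPU for `minutes` minutes."""
    return IMPLIED_RATE_PER_HOUR[gpu] * minutes / 60


def scale_minutes(measured_minutes, measured_prompts, target_prompts):
    """Linearly extrapolate runtime from a measured run (ignores warm-up)."""
    return measured_minutes * target_prompts / measured_prompts
```

For example, extrapolating the first row to 40,000 prompts gives `scale_minutes(15, 10_000, 40_000)`, which is 60 minutes, or roughly $0.36 on the RTX 4090 at the implied rate.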
Tips
- vLLM's offline batch mode uses continuous batching, which is much faster than sending prompts one at a time through sequential API calls.
- For 100K+ prompts, split the input file and process it in chunks to avoid running out of memory (OOM).
- Check your balance before starting: `runcrate billing balance`
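The chunking tip can be sketched in a few lines of Python. The sketch streams the input file with `itertools.islice` and writes numbered chunk files, so the full prompt list never has to sit in memory at once. The helper names, the `chunk_NNNN.jsonl` naming scheme and any particular chunk size are hypothetical. Pick a size that your GPU and host memory handle comfortably.

```python
from itertools import islice


def chunked(lines, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(lines)
    while batch := list(islice(it, size)):
        yield batch


def split_jsonl(in_path, size, prefix="chunk"):
    """Write chunk_0000.jsonl, chunk_0001.jsonl, ... and return their names."""
    names = []
    with open(in_path, encoding="utf-8") as src:
        for i, batch in enumerate(chunked(src, size)):
            name = f"{prefix}_{i:04d}.jsonl"
            with open(name, "w", encoding="utf-8") as dst:
                dst.writelines(batch)  # lines keep their original newlines
            names.append(name)
    return names
```

Each chunk file can then go through the batch script in turn. A crash therefore costs at most one chunk of work rather than the whole run.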