Skip to main content
GPU compute is billed by the hour. Every idle instance is wasted money.

Monitor your spend

runcrate billing balance     # Current credit balance
runcrate billing usage       # Per-instance spending breakdown
runcrate ps                  # List running instances (all billing right now)

Pick the right GPU

WorkloadRecommended GPUHourly cost
Inference (7B-8B models)RTX 4090~$0.35/hr
Inference (70B models)A100 80 GB~$1.60/hr
Fine-tuning (7B QLoRA)RTX 4090~$0.35/hr
Fine-tuning (70B QLoRA)A100 80 GB~$1.60/hr
Training (custom models)H100~$2.50/hr
runcrate instances types     # Browse GPUs and pricing

Delete instances when done

runcrate instances delete <name>
runcrate ps                  # Verify nothing is left running

Use volumes to avoid re-setup costs

Re-downloading models wastes 10-30 minutes of GPU time per session:
runcrate volumes create --name workspace --size 100
runcrate instances create --name dev --gpu RTX4090 --template ubuntu-devbox --storage workspace
Models and packages at /workspace/ persist across deploys.

Right-size your instance

runcrate ssh <instance> -- nvidia-smi
nvidia-smi readingAction
GPU-Util: 90%+, Memory: 80%+Correctly sized
GPU-Util: 90%+, Memory: 40%Consider a GPU with less VRAM
GPU-Util: 20%, Memory: 20%Overpaying — use a smaller GPU

Batch your work

Deploy, process, tear down — pay only for the minutes your job runs:
runcrate instances create --name batch --gpu A100 --template ubuntu-inference
runcrate cp ./inputs/ batch:/workspace/inputs/
runcrate ssh batch -- "cd /workspace && python process.py"
runcrate cp batch:/workspace/outputs/ ./outputs/
runcrate instances delete batch

Use the Models API for light workloads

For inference under ~1,000 requests/day, the Models API is cheaper than a dedicated GPU:
curl https://api.runcrate.ai/v1/chat/completions \
  -H "Authorization: Bearer rc_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello."}],
    "max_tokens": 128
  }'

Quick checklist

  • Run runcrate ps daily — kill anything not in use.
  • Run runcrate billing usage weekly — spot unexpected charges early.
  • Use volumes for models and data — avoid re-downloads.
  • Match GPU to workload — check nvidia-smi utilization.
  • Delete instances immediately after batch jobs complete.