“Compare A100, H100, and RTX 4090 pricing. I need at least 40GB VRAM.”

The agent queries available GPUs:

- list_gpu_types: fetches all available GPU types with pricing and specs
GPUs with 40GB+ VRAM:

| GPU | VRAM | Price/hr | Best for |
|---|---|---|---|
| RTX A6000 | 48 GB | $0.60/hr | Development, small model fine-tuning |
| A100 40 GB | 40 GB | $1.20/hr | Training, mid-size inference |
| A100 80 GB | 80 GB | $1.60/hr | Large model training, 70B inference |
| H100 80 GB | 80 GB | $2.50/hr | Maximum throughput, 70B+ models |

For your use case:

- If you’re fine-tuning a 7B model, the A6000 ($0.60/hr) has enough VRAM and costs 76% less than the H100.
- If you’re serving a 70B model, you need the A100 80GB minimum. The H100 gives ~40% more throughput for 56% more cost.
- The RTX 4090 (24 GB) doesn’t meet your 40GB requirement.
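
The comparison above amounts to a VRAM filter plus a couple of price ratios. A minimal sketch of that logic follows, assuming the GPU records have already been fetched. The `GpuType` class, the `GPUS` list, and both helper functions are hypothetical names for illustration; the actual list_gpu_types response format is not shown in this document and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuType:
    name: str
    vram_gb: int
    price_per_hr: float


# Hypothetical records copied from the tables in this document; the real
# list_gpu_types response shape is an assumption.
GPUS = [
    GpuType("RTX 4090", 24, 0.35),
    GpuType("RTX A6000", 48, 0.60),
    GpuType("A100 40 GB", 40, 1.20),
    GpuType("A100 80 GB", 80, 1.60),
    GpuType("H100 80 GB", 80, 2.50),
]
BY_NAME = {g.name: g for g in GPUS}


def with_min_vram(gpus, min_vram_gb):
    """Return GPUs with at least min_vram_gb of VRAM, cheapest first."""
    return sorted(
        (g for g in gpus if g.vram_gb >= min_vram_gb),
        key=lambda g: g.price_per_hr,
    )


def pct_price_difference(a, b):
    """Percent by which a's hourly price differs from b's (negative = cheaper)."""
    return round(100 * (a.price_per_hr / b.price_per_hr - 1))


if __name__ == "__main__":
    for gpu in with_min_vram(GPUS, 40):
        print(f"{gpu.name}: {gpu.vram_gb} GB, ${gpu.price_per_hr:.2f}/hr")
    # A6000 vs H100, then H100 vs A100 80 GB
    print(pct_price_difference(BY_NAME["RTX A6000"], BY_NAME["H100 80 GB"]))
    print(pct_price_difference(BY_NAME["H100 80 GB"], BY_NAME["A100 80 GB"]))
```

Sorting by price after filtering keeps the cheapest qualifying GPU first, which matches how the table is ordered.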
“What’s the cheapest GPU that can run Llama 3.1 70B?”

The agent calculates VRAM requirements and matches them to available GPUs:

- list_gpu_types: gets all GPU options
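Llama 3.1 70B needs ~140 GB of VRAM for its weights at BF16 (2 bytes per parameter), ~70 GB at 8-bit quantization, or ~35 GB at 4-bit. No single GPU listed here holds the BF16 weights, so the options below assume 8-bit quantization.

Cheapest single-GPU options:

| Option | GPU | VRAM | Quantization | Price/hr |
|---|---|---|---|---|
| Budget | A100 80 GB | 80 GB | 8-bit | $1.60/hr |
| Premium | H100 80 GB | 80 GB | 8-bit | $2.50/hr |

Multi-GPU option (lower per-GPU cost):

| Option | GPUs | VRAM total | Price/hr |
|---|---|---|---|
| Split | 2x A100 40 GB | 80 GB | $2.40/hr |

Recommendation: Single A100 80GB at $1.60/hr. The 2x A100 40 GB split costs more in total ($2.40/hr) and adds tensor-parallel overhead.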
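
The sizing step can be approximated with a common rule of thumb: weight memory is roughly parameter count times bytes per parameter. The sketch below applies that rule and then picks the cheapest GPU that fits. It covers weights only, so KV cache and activations need extra headroom on top. The function names and the `GPUS` tuples are hypothetical and copied from the tables in this document, not taken from the list_gpu_types response.

```python
def weight_vram_gb(params_billion, bits_per_param):
    """Approximate VRAM for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


# Hypothetical (name, vram_gb, price_per_hr) rows copied from this document.
GPUS = [
    ("RTX A6000", 48, 0.60),
    ("A100 40 GB", 40, 1.20),
    ("A100 80 GB", 80, 1.60),
    ("H100 80 GB", 80, 2.50),
]


def cheapest_fit(gpus, needed_gb):
    """Return the cheapest single GPU with enough VRAM, or None if none fits."""
    fits = [g for g in gpus if g[1] >= needed_gb]
    return min(fits, key=lambda g: g[2]) if fits else None


if __name__ == "__main__":
    for bits in (16, 8, 4):
        needed = weight_vram_gb(70, bits)
        print(f"{bits}-bit: ~{needed:.0f} GB ->", cheapest_fit(GPUS, needed))
```

Returning `None` when nothing fits makes the multi-GPU case explicit: the caller then has to decide whether to split the model across GPUs or quantize further.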
“Show me everything available under $1/hr for quick experiments.”

The agent filters by price:

- list_gpu_types: gets all GPU types, filters by hourly rate
GPUs under $1/hr:

| GPU | VRAM | Price/hr | Good for |
|---|---|---|---|
| RTX 4090 | 24 GB | $0.35/hr | Dev work, 7-8B models, fine-tuning |
| RTX A6000 | 48 GB | $0.60/hr | Larger dev work, 13B models |
| RTX 3090 | 24 GB | $0.25/hr | Budget experiments, prototyping |

Best value for quick experiments: RTX 4090 at $0.35/hr. It has the best price-to-performance ratio for single-GPU workloads: fast enough for real training, cheap enough to leave running during iteration. A 4-hour experiment on RTX 4090 costs $1.40 total.
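
The price filter and the per-experiment cost are simple arithmetic, sketched below under the assumption that hourly rates are already available as plain numbers. `under_budget`, `run_cost`, and the `GPUS` tuples are hypothetical names and data copied from this document's tables.

```python
# Hypothetical (name, vram_gb, price_per_hr) rows copied from this document.
GPUS = [
    ("RTX 4090", 24, 0.35),
    ("RTX A6000", 48, 0.60),
    ("RTX 3090", 24, 0.25),
    ("A100 40 GB", 40, 1.20),
]


def under_budget(gpus, max_price_per_hr):
    """Return GPUs priced strictly below max_price_per_hr, cheapest first."""
    return sorted(
        (g for g in gpus if g[2] < max_price_per_hr),
        key=lambda g: g[2],
    )


def run_cost(price_per_hr, hours):
    """Total cost of a run in dollars, rounded to cents."""
    return round(price_per_hr * hours, 2)


if __name__ == "__main__":
    for name, vram_gb, price in under_budget(GPUS, 1.00):
        print(f"{name}: {vram_gb} GB, ${price:.2f}/hr, 4 hours = ${run_cost(price, 4):.2f}")
```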
Tools used in this workflow
| Tool | Purpose |
|---|---|
| list_gpu_types | Fetch all available GPUs with pricing, VRAM, and region info |