GPU requirements
| Model | GPU | VRAM needed | Approx. cost |
|---|---|---|---|
| Qwen2.5-7B-Instruct | RTX 4090 (24 GB) | ~14 GB | ~$0.35/hr |
| Qwen3-32B | A100 80 GB | ~64 GB | ~$1.60/hr |
| Qwen2.5-72B-Instruct | A100 80 GB | ~72 GB | ~$1.60/hr |
| Qwen3-235B-A22B (MoE) | 2x H100 80 GB | ~60 GB each | ~$5.00/hr |
Deploy Qwen2.5 7B (RTX 4090)
Deploy Qwen3 32B (A100)
Deploy Qwen3 235B MoE (2x H100)
Test the endpoint
Enable thinking mode
Add/think to the system prompt for step-by-step reasoning, or /no_think for direct answers: