Skip to main content
Use the Runcrate Models API as the inference backend for your SaaS product. One API key, one bill, 140+ models — no GPU management, no model hosting, no vendor lock-in.

What you’ll build

A production AI backend that handles:
  • Chat completions for customer-facing AI assistants
  • Structured output for data extraction and classification
  • Image generation for content creation features
  • Per-request billing that maps to your own pricing

Why open-source models for SaaS

OpenAI / AnthropicRuncrate (open-source models)
Pricing$3–15 per 1M output tokens$0.20–2.00 per 1M output tokens
Vendor lock-inLocked to one providerSwitch models freely
Data privacyData sent to third partyOpen-source models, your choice
Rate limitsStrict per-org limits100 req/min default, higher on request
Model choice3–5 models140+ models across 8 categories

Next.js API routes (Vercel AI SDK)

Chat endpoint for your product

// app/api/chat/route.ts
import { runcrate } from '@runcrate/ai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages, userId } = await req.json();

  const result = streamText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    system: `You are the AI assistant for Acme Corp. Help users with their questions 
      about our product. Be concise, helpful, and professional.`,
    messages,
  });

  return result.toDataStreamResponse();
}

Structured data extraction endpoint

Turn unstructured user input into structured data your app can store:
// app/api/extract/route.ts
import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const ContactSchema = z.object({
  name: z.string(),
  email: z.string().email().optional(),
  company: z.string().optional(),
  intent: z.enum(['purchase', 'support', 'partnership', 'other']),
  summary: z.string(),
});

export async function POST(req: Request) {
  const { text } = await req.json();

  const { output } = await generateText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    output: Output.object({ schema: ContactSchema }),
    prompt: `Extract contact information and intent from this message:\n\n${text}`,
  });

  return Response.json(output);
}

Image generation endpoint

Let users generate images from your app:
// app/api/generate-image/route.ts
import { runcrate } from '@runcrate/ai';
import { generateImage } from 'ai';

export async function POST(req: Request) {
  const { prompt, style } = await req.json();

  const { image } = await generateImage({
    model: runcrate.imageModel('black-forest-labs/FLUX.1-schnell'),
    prompt: `${prompt}, ${style || 'photorealistic'}`,
    size: '1024x1024',
  });

  return Response.json({ image: image.base64 });
}

Python backend (FastAPI)

from fastapi import FastAPI
from pydantic import BaseModel
from runcrate import Runcrate

app = FastAPI()
client = Runcrate(api_key="rc_live_...")

class ChatRequest(BaseModel):
    messages: list[dict]
    model: str = "deepseek-ai/DeepSeek-V3"

@app.post("/api/chat")
async def chat(req: ChatRequest):
    response = client.models.chat_completion(
        model=req.model,
        messages=req.messages,
        max_tokens=1024,
    )
    return {"content": response.choices[0].message.content}

class ImageRequest(BaseModel):
    prompt: str
    aspect_ratio: str = "1:1"

@app.post("/api/generate-image")
async def generate_image(req: ImageRequest):
    image = client.models.generate_image(
        model="black-forest-labs/FLUX.1-schnell",
        prompt=req.prompt,
        aspect_ratio=req.aspect_ratio,
    )
    return {"url": image.data[0].url}

Content moderation middleware

Add a moderation layer before displaying AI-generated content:
import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const ModerationResult = z.object({
  safe: z.boolean(),
  categories: z.array(z.enum(['spam', 'harassment', 'nsfw', 'violence', 'pii', 'none'])),
  action: z.enum(['allow', 'flag', 'block']),
});

export async function moderateContent(content: string) {
  const { output } = await generateText({
    model: runcrate('deepseek-ai/DeepSeek-V3'),
    output: Output.object({ schema: ModerationResult }),
    prompt: `Evaluate this user-generated content for safety. Content: "${content}"`,
  });
  return output;
}

Cost estimation

At DeepSeek-V3 rates, a typical SaaS workload:
Use caseTokens/requestCost/1K requests
Short chat responses (200 tokens out)~300 total~$0.06
Data extraction (100 tokens out)~200 total~$0.04
Long-form content (1000 tokens out)~1200 total~$0.24
Image generation1 image~$0.03/image
A SaaS serving 100K chat requests/month costs roughly **6/monthininferencenot6/month** in inference — not 600.