GLM Models Guide - Runcrate

Zhipu AI’s GLM family offers strong multilingual chat models with competitive reasoning. All three generations listed below are available through the Runcrate API, with model IDs under the zai-org/ prefix (for example, zai-org/GLM-5.1).

Available GLM models

Model	Model ID	Context (tokens)	Strengths
GLM-5.1	zai-org/GLM-5.1	128K	Latest generation, strongest reasoning
GLM-5	zai-org/GLM-5	128K	Strong general-purpose chat
GLM-4.7	zai-org/GLM-4.7	128K	Cost-effective, fast inference
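All three models share a 128K-token context window. A long prompt plus the requested max_tokens has to fit inside that window. The sketch below is a rough pre-flight check and is not part of the Runcrate SDK. `fits_context` and `estimate_tokens` are hypothetical helpers. The sketch assumes "128K" means 128,000 tokens, and the 4-characters-per-token estimate is a heuristic rather than GLM's actual tokenizer.

```python
# Hypothetical pre-flight check; not part of the Runcrate SDK.
# Assumes "128K" means 128,000 tokens for every model in the table.
CONTEXT_WINDOWS = {
    "zai-org/GLM-5.1": 128_000,
    "zai-org/GLM-5": 128_000,
    "zai-org/GLM-4.7": 128_000,
}


def estimate_tokens(text: str) -> int:
    """Rough estimate of ~4 characters per token (heuristic, not GLM's tokenizer)."""
    return (len(text) + 3) // 4  # integer division that rounds up


def fits_context(model: str, prompt: str, max_tokens: int) -> bool:
    """True if the estimated prompt size plus the completion budget fits the model's window."""
    if model not in CONTEXT_WINDOWS:
        raise ValueError(f"unknown model: {model}")
    return estimate_tokens(prompt) + max_tokens <= CONTEXT_WINDOWS[model]
```

Because the estimate is approximate, leave some headroom rather than filling the window exactly.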

Basic chat completion

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="zai-org/GLM-5.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between TCP and UDP for a beginner."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)
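Chat completion APIs in this style are typically stateless, so a multi-turn chat resends the earlier turns in messages on each call. The minimal sketch below keeps that history. It assumes the API accepts an "assistant" role for previous replies, as OpenAI-style chat APIs do; this page only shows "system" and "user". `Conversation` is a hypothetical helper. The model call is passed in as a function so the sketch runs without the SDK.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Hypothetical helper that keeps chat history in the role/content format above."""

    def __init__(self, system_prompt: str) -> None:
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]

    def send(self, user_text: str, complete: Callable[[List[Message]], str]) -> str:
        # Build the request from the full history plus the new user turn.
        pending = self.messages + [{"role": "user", "content": user_text}]
        reply = complete(pending)
        # Commit both turns only after the call succeeds. The "assistant" role
        # is an assumption (standard in OpenAI-style chat APIs).
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply
```

To use it with the client above, pass something like `lambda msgs: client.models.chat_completion(model="zai-org/GLM-5.1", messages=msgs, max_tokens=512).choices[0].message.content` as `complete`.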

Streaming with Vercel AI SDK

// app/api/chat/route.ts
import { runcrate } from '@runcrate/ai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: runcrate('zai-org/GLM-5.1'),
    system: 'You are a helpful assistant specializing in technical explanations.',
    messages,
  });

  return result.toDataStreamResponse();
}

Structured output

import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const AnalysisSchema = z.object({
  topic: z.string(),
  keyPoints: z.array(z.string()).describe('3–5 main arguments'),
  conclusion: z.string(),
  confidence: z.number().min(0).max(1),
});

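const { output } = await generateText({
  model: runcrate('zai-org/GLM-5.1'),
  output: Output.object({ schema: AnalysisSchema }),
  prompt: 'Analyze the impact of remote work on software engineering productivity.',
});

// output is typed from AnalysisSchema (topic, keyPoints, conclusion, confidence)
console.log(output.topic, output.confidence);
console.log(output.keyPoints);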
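The AI SDK parses the model's reply against AnalysisSchema for you. If you call GLM from the Python client instead and ask for a JSON reply in the prompt, you have to check the result yourself. The stdlib-only sketch below mirrors the same schema. `Analysis` and `parse_analysis` are hypothetical names. The sketch assumes the reply is a bare JSON object with no surrounding prose or code fences. Like the Zod schema, it does not enforce the 3–5 count, because `.describe()` only tells the model what is wanted.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Analysis:
    topic: str
    key_points: List[str]
    conclusion: str
    confidence: float


def parse_analysis(raw: str) -> Analysis:
    """Parse a JSON reply and apply the same checks as AnalysisSchema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    topic, points = data.get("topic"), data.get("keyPoints")
    conclusion, confidence = data.get("conclusion"), data.get("confidence")
    if not isinstance(topic, str) or not isinstance(conclusion, str):
        raise ValueError("topic and conclusion must be strings")
    if not isinstance(points, list) or not all(isinstance(p, str) for p in points):
        raise ValueError("keyPoints must be a list of strings")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    # Mirrors z.number().min(0).max(1); a NaN value also fails this check.
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")
    return Analysis(topic, points, conclusion, float(confidence))
```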

Comparing GLM generations

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

models = ["zai-org/GLM-5.1", "zai-org/GLM-5", "zai-org/GLM-4.7"]
prompt = "Write a SQL query for the top 10 customers by order value in the last 30 days."

for model in models:
    response = client.models.chat_completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
        temperature=0.3,
    )
    print(f"\n--- {model} ---")
    print(response.choices[0].message.content)
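To compare the generations on speed as well as output, you can time each call. The sketch below takes the model call as a function so it can run without the SDK. `compare_models` is a hypothetical helper. A single run per model gives only a noisy latency figure, so repeat the loop before drawing conclusions.

```python
import time
from typing import Callable, List, Tuple


def compare_models(
    models: List[str],
    prompt: str,
    complete: Callable[[str, str], str],
) -> List[Tuple[str, float, str]]:
    """Call complete(model, prompt) for each model; return (model, seconds, text), fastest first."""
    results = []
    for model in models:
        start = time.perf_counter()
        text = complete(model, prompt)
        elapsed = time.perf_counter() - start  # wall-clock latency of one call
        results.append((model, elapsed, text))
    return sorted(results, key=lambda r: r[1])
```

With the client above, `complete` could be `lambda m, p: client.models.chat_completion(model=m, messages=[{"role": "user", "content": p}], max_tokens=512, temperature=0.3).choices[0].message.content`.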

Choosing the right GLM model

Use case	Model	Reason
Complex reasoning	GLM-5.1	Strongest in the family
General chat	GLM-5	Good balance of quality and speed
High-volume, cost-sensitive	GLM-4.7	Fastest, lowest cost per token
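The table can be encoded as a small lookup so that model choice lives in one place. In the sketch below, the use-case keys are hypothetical labels. Treating cost sensitivity as an override that always selects GLM-4.7 is a design choice drawn from the Tips section, not a Runcrate rule.

```python
from typing import Optional

DEFAULT_MODEL = "zai-org/GLM-5.1"  # recommended default per the Tips section

# Hypothetical use-case labels mapped to the table rows.
MODEL_FOR_USE_CASE = {
    "complex_reasoning": "zai-org/GLM-5.1",
    "general_chat": "zai-org/GLM-5",
    "high_volume": "zai-org/GLM-4.7",
}


def choose_model(use_case: Optional[str] = None, cost_sensitive: bool = False) -> str:
    """Pick a GLM model ID; cost sensitivity overrides the use case."""
    if cost_sensitive:
        return MODEL_FOR_USE_CASE["high_volume"]
    if use_case is None:
        return DEFAULT_MODEL
    if use_case not in MODEL_FOR_USE_CASE:
        raise ValueError(f"unknown use case: {use_case}")
    return MODEL_FOR_USE_CASE[use_case]
```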

Tips

Default model: GLM-5.1 is the recommended default; switch to GLM-4.7 when cost per token matters more than reasoning strength.
Multilingual: GLM models handle Chinese and English equally well.
Temperature: 0.3–0.5 works best for factual tasks; 0.7–0.9 for creative writing.
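The temperature ranges above can be captured in a small helper. In the sketch below, `pick_temperature` and the task labels are hypothetical. `position` picks a point inside the recommended range: 0 gives the low end and 1 gives the high end. The comparison example earlier uses 0.3, which is the low end of the factual range.

```python
# Ranges from the tips above; the task labels are hypothetical.
TEMPERATURE_RANGES = {
    "factual": (0.3, 0.5),
    "creative": (0.7, 0.9),
}


def pick_temperature(task: str, position: float = 0.5) -> float:
    """Return a temperature inside the recommended range for `task`."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    if task not in TEMPERATURE_RANGES:
        raise ValueError(f"unknown task: {task}")
    low, high = TEMPERATURE_RANGES[task]
    # Linear interpolation, rounded to avoid float noise like 0.9000000000000001.
    return round(low + (high - low) * position, 2)
```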

Next steps

Chat completions reference
Model catalog
AI Chatbot with Next.js: build a chat UI with GLM
