Zhipu AI’s GLM family is a line of multilingual chat models with competitive reasoning performance. Three generations (GLM-5.1, GLM-5, and GLM-4.7) are available through the Runcrate API.

Available GLM models

| Model | Context | Strengths |
|---|---|---|
| GLM-5.1 | 128K | Latest generation, strongest reasoning |
| GLM-5 | 128K | Strong general-purpose chat |
| GLM-4.7 | 128K | Cost-effective, fast inference |

Basic chat completion

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

response = client.models.chat_completion(
    model="zai-org/GLM-5.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between TCP and UDP for a beginner."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)

Streaming with Vercel AI SDK

// app/api/chat/route.ts
import { runcrate } from '@runcrate/ai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: runcrate('zai-org/GLM-5.1'),
    system: 'You are a helpful assistant specializing in technical explanations.',
    messages,
  });

  return result.toDataStreamResponse();
}

Structured output

import { runcrate } from '@runcrate/ai';
import { generateText, Output } from 'ai';
import { z } from 'zod';

const AnalysisSchema = z.object({
  topic: z.string(),
  keyPoints: z.array(z.string()).describe('3–5 main arguments'),
  conclusion: z.string(),
  confidence: z.number().min(0).max(1),
});

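// Generate a response and parse it into an object validated against AnalysisSchema.
const { output } = await generateText({
  model: runcrate('zai-org/GLM-5.1'),
  output: Output.object({ schema: AnalysisSchema }),
  prompt: 'Analyze the impact of remote work on software engineering productivity.',
});

// `output` follows the schema shape, so its fields can be read directly.
console.log(output.topic, output.keyPoints);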

Comparing GLM generations

from runcrate import Runcrate

client = Runcrate(api_key="rc_live_YOUR_API_KEY")

models = ["zai-org/GLM-5.1", "zai-org/GLM-5", "zai-org/GLM-4.7"]
prompt = "Write a SQL query for the top 10 customers by order value in the last 30 days."

for model in models:
    response = client.models.chat_completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
        temperature=0.3,
    )
    print(f"\n--- {model} ---")
    print(response.choices[0].message.content)

Choosing the right GLM model

| Use case | Model | Reason |
|---|---|---|
| Complex reasoning | GLM-5.1 | Strongest in the family |
| General chat | GLM-5 | Good balance of quality and speed |
| High-volume, cost-sensitive | GLM-4.7 | Fastest, lowest cost per token |
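
The recommendations above can be encoded as a small lookup so that application code selects a model ID in one place. This is a minimal sketch: the use-case keys and the `pick_model` helper are hypothetical names chosen for illustration and are not part of the Runcrate SDK. Only the model IDs are taken from the examples on this page.

```python
# Hypothetical helper, not part of the Runcrate SDK.
# The model IDs are the ones used in the examples on this page;
# the use-case keys are illustrative assumptions.
DEFAULT_MODEL = "zai-org/GLM-5.1"

MODEL_BY_USE_CASE = {
    "complex_reasoning": "zai-org/GLM-5.1",
    "general_chat": "zai-org/GLM-5",
    "high_volume": "zai-org/GLM-4.7",
}


def pick_model(use_case: str) -> str:
    """Return the model ID for a use case, falling back to the default."""
    return MODEL_BY_USE_CASE.get(use_case, DEFAULT_MODEL)


if __name__ == "__main__":
    print(pick_model("high_volume"))
```

The returned string can be passed as the `model` argument in the chat completion calls shown earlier. Unknown use cases fall back to GLM-5.1, which matches the recommended default.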

Tips

  • GLM-5.1 is the recommended default; switch to GLM-4.7 when cost per token matters more than reasoning quality.
  • Multilingual: GLM models handle Chinese and English equally well.
  • Temperature 0.3–0.5 works best for factual tasks; 0.7–0.9 for creative writing.
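
The temperature guidance can be applied in code by deriving the value from the task type instead of hard-coding it per call. The sketch below assumes two illustrative task labels (`factual`, `creative`) and uses the midpoint of each suggested range. The helper names are hypothetical, and the keyword arguments mirror the shape of the `chat_completion` calls shown earlier rather than a documented schema.

```python
# Hypothetical helpers; the task labels and the midpoint choice are
# assumptions made for illustration, not Runcrate recommendations.
TEMPERATURE_RANGES = {
    "factual": (0.3, 0.5),   # suggested range for factual tasks
    "creative": (0.7, 0.9),  # suggested range for creative writing
}


def temperature_for(task: str) -> float:
    """Return the midpoint of the suggested temperature range for a task."""
    low, high = TEMPERATURE_RANGES[task]
    return round((low + high) / 2, 2)


def completion_kwargs(model: str, prompt: str, task: str) -> dict:
    """Build keyword arguments shaped like the calls used on this page."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": temperature_for(task),
    }


if __name__ == "__main__":
    print(completion_kwargs("zai-org/GLM-5.1", "Summarize RFC 793.", "factual"))
```

The resulting dictionary could then be unpacked into a call such as `client.models.chat_completion(**kwargs)`, assuming the client accepts the same parameters as in the earlier examples.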
