
The ChatGLM Series

Introduction

ChatGLM is a bilingual (Chinese-English) dialogue model family from Zhipu AI, built on the GLM (General Language Model) architecture. The GLM-4 series supports 128K context, tool calling, and code execution, and is a mainstream choice for open-source, self-hosted deployment in China.

Installation

bash
pip install zhipuai
# For local deployment
pip install transformers torch accelerate

API Usage (BigModel Platform)

python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="glm-4-flash",  # free model
    messages=[
        {"role": "user", "content": "Explain what an AI agent is"}
    ]
)
print(response.choices[0].message.content)

Model Selection

Model         Context   Characteristics        Cost
glm-4-flash   128K      Extremely fast         Free
glm-4-air     128K      Balanced performance   Low
glm-4         128K      Flagship model         Standard
glm-4-long    1M        Ultra-long context     Standard
glm-4v        8K        Vision understanding   Standard
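The glm-4v row above takes image input alongside text. A minimal sketch of the multimodal message payload, where the content field becomes a list mixing text and image parts (the structure is assumed from BigModel's vision-chat docs and the image URL is a placeholder, so verify against the current API reference):

```python
# Hypothetical example: build a glm-4v multimodal message payload.
# The image URL below is a placeholder, not a real endpoint.
image_url = "https://example.com/invoice.png"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is in this image"},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
]

# Passing this to client.chat.completions.create(model="glm-4v",
# messages=messages) would send the text and the image in one turn.
print(messages[0]["content"][0]["text"])
```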

Local Deployment (Open-Source Models)

python
from transformers import AutoTokenizer, AutoModel
import torch

# Load ChatGLM3-6B (requires roughly 13 GB of VRAM)
model_path = "THUDM/chatglm3-6b"

tokenizer = AutoTokenizer.from_pretrained(
    model_path, trust_remote_code=True
)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",          # place layers on available GPUs automatically
    torch_dtype=torch.float16   # half precision to save VRAM
).eval()

# Single-turn chat
response, history = model.chat(
    tokenizer,
    "Hello, please introduce yourself",
    history=[]
)
print(response)

# Multi-turn chat: pass the accumulated history back in
response, history = model.chat(
    tokenizer,
    "What can you do?",
    history=history
)
print(response)

Streaming Output

python
# Streaming via the API
stream = client.chat.completions.create(
    model="glm-4-flash",
    messages=[{"role": "user", "content": "Write an outline for a loan risk assessment report"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Tool Calling

python
tools = [
    {
        "type": "function",
        "function": {
            "name": "query_credit_score",
            "description": "Query a customer's credit score",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_id": {"type": "string", "description": "Customer ID"},
                    "query_type": {
                        "type": "string",
                        "enum": ["basic", "detailed"],
                        "description": "Query type"
                    }
                },
                "required": ["customer_id"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="glm-4",
    messages=[{"role": "user", "content": "Query the credit score of customer C001"}],
    tools=tools
)

tool_call = response.choices[0].message.tool_calls[0]
print(f"Tool: {tool_call.function.name}")
print(f"Arguments: {tool_call.function.arguments}")
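The snippet above only extracts the tool call; to close the loop, your code executes the function itself and sends the result back as a tool message. A runnable sketch with a mocked query_credit_score (the lookup table, the dispatch helper, and the follow-up message shape are illustrative assumptions, not part of the official SDK):

```python
import json

# Hypothetical local implementation of the declared tool.
def query_credit_score(customer_id: str, query_type: str = "basic") -> dict:
    fake_db = {"C001": 742}  # mock data for illustration
    result = {"customer_id": customer_id, "score": fake_db.get(customer_id)}
    if query_type == "detailed":
        result["history"] = "no overdue records"  # mock detail
    return result

# Dispatch: the model returns the function name plus a JSON string of arguments.
def dispatch(name: str, arguments: str) -> str:
    args = json.loads(arguments)
    if name == "query_credit_score":
        return json.dumps(query_credit_score(**args), ensure_ascii=False)
    raise ValueError(f"unknown tool: {name}")

# Simulated call, shaped like tool_call.function from the response above.
result = dispatch("query_credit_score", '{"customer_id": "C001"}')
print(result)

# The result would then be appended as {"role": "tool", "content": result,
# "tool_call_id": tool_call.id} and the conversation sent back to the model.
```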

Quantized Deployment (Low VRAM)

python
# INT4 quantization: only ~6 GB of VRAM needed
model = AutoModel.from_pretrained(
    "THUDM/chatglm3-6b",
    trust_remote_code=True
).quantize(4).cuda().eval()

# INT8 quantization: ~10 GB of VRAM needed
model = AutoModel.from_pretrained(
    "THUDM/chatglm3-6b",
    trust_remote_code=True
).quantize(8).cuda().eval()
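As a rough sanity check on those VRAM figures, weight memory scales with parameter count times bytes per parameter. A back-of-the-envelope estimator (the gap between these numbers and the 13/10/6 GB figures is activation and KV-cache overhead; the function is an illustration, not an official formula):

```python
def estimate_weight_gb(n_params_b: float, bits: int) -> float:
    """Approximate VRAM for model weights alone, in GB."""
    return n_params_b * 1e9 * bits / 8 / 1e9

# ChatGLM3-6B (~6.2B parameters) weight footprint at different precisions:
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{estimate_weight_gb(6.2, bits):.1f} GB weights")

# FP16 ≈ 12.4 GB, INT8 ≈ 6.2 GB, INT4 ≈ 3.1 GB for the weights alone;
# the 13/10/6 GB figures above add runtime overhead on top of this.
```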

Recommendations

  • Rapid prototyping: glm-4-flash (free, very fast)
  • Production API: glm-4-air (strong cost-performance)
  • Private deployment: quantized ChatGLM3-6B (runs in 6 GB of VRAM)
  • High-accuracy tasks: glm-4 (flagship model)
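The rules of thumb above can be captured in a small helper. A sketch in which the mapping simply encodes this page's recommendations (the scenario keys and the local model label are made up for illustration):

```python
# Hypothetical helper encoding the recommendations above.
MODEL_BY_SCENARIO = {
    "prototype": "glm-4-flash",      # free and fast
    "production_api": "glm-4-air",   # strong cost-performance
    "private": "chatglm3-6b-int4",   # runs in ~6 GB of VRAM
    "high_accuracy": "glm-4",        # flagship
}

def choose_model(scenario: str) -> str:
    try:
        return MODEL_BY_SCENARIO[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario}") from None

print(choose_model("prototype"))  # → glm-4-flash
```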

Content on this site was compiled by 褚成志, for learning reference only.