Qwen3-32B

Qwen

qwen/qwen3-32b

Qwen3-32B is a dense 32.8 billion-parameter model, positioned as the high-accuracy single-expert counterpart to the MoE line. It uses 64 transformer layers with 64/8 GQA heads and the full 32 k context window (extendable via YaRN). Because every parameter is active, it excels at deterministic generation, agentic tool-calling, and creative writing where dense representations can outperform similarly sized MoE peers. It is drop-in compatible with Hugging Face Transformers ≥ 4.51, vLLM, SGLang, and common GGUF/MLX-LM ports.

Tools

Function Calling

Context Window

32,768

Max Output Tokens

8,192

Language

Python JavaScript Curl

Using Qwen3-32B with Python API

Using Qwen3-32B with OpenAI compatible API

import openai

client = openai.Client(
  api_key= '{your_api_key}',
  base_url="https://api.model.box/v1",
)
response = client.chat.completions.create(
model="qwen/qwen3-32b",
messages: [
  {
    role: 'user',
    content:
      'introduce your self',
    },
  ]
)
print(response)