Qwen
Qwen3-32B
qwen/qwen3-32b
Qwen3-32B is a dense 32.8-billion-parameter model, positioned as the high-accuracy single-expert counterpart to the MoE line. It uses 64 transformer layers with grouped-query attention (64 query heads, 8 key/value heads) and a native 32,768-token context window (extendable via YaRN). Because every parameter is active on each forward pass, it excels at deterministic generation, agentic tool-calling, and creative writing, where dense representations can outperform similarly sized MoE peers. It is drop-in compatible with Hugging Face Transformers ≥ 4.51, vLLM, SGLang, and common GGUF/MLX-LM ports.
Tools
Function Calling
Context Window
32,768
Max Output Tokens
8,192
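The 32,768-token window above can be extended with YaRN rope scaling when serving the model. A minimal sketch using vLLM, assuming the YaRN parameters published in the Qwen3 model card (factor 4.0 over the native 32,768-token window):

```shell
# Serve Qwen3-32B with YaRN rope scaling to reach a 131,072-token context.
# The factor and original_max_position_embeddings values follow the Qwen3
# model card; adjust them if your deployment targets a shorter window.
vllm serve Qwen/Qwen3-32B \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
  --max-model-len 131072
```

Static YaRN scaling applies to all requests, so prefer the smallest factor that covers your longest expected prompt.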
Using Qwen3-32B with Python API
Using Qwen3-32B with OpenAI-compatible API
import openai

# Point the OpenAI client at the Model Box OpenAI-compatible endpoint.
client = openai.Client(
    api_key="{your_api_key}",  # replace with your API key
    base_url="https://api.model.box/v1",
)

response = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[
        {
            "role": "user",
            "content": "Introduce yourself",
        },
    ],
)

print(response.choices[0].message.content)