Qwen
Qwen3-32B
qwen/qwen3-32b
Qwen3-32B is a dense 32.8 billion-parameter model, positioned as the high-accuracy single-expert counterpart to the MoE line. It uses 64 transformer layers with 64/8 GQA heads and the full 32 k context window (extendable via YaRN). Because every parameter is active, it excels at deterministic generation, agentic tool-calling, and creative writing where dense representations can outperform similarly sized MoE peers. It is drop-in compatible with Hugging Face Transformers ≥ 4.51, vLLM, SGLang, and common GGUF/MLX-LM ports.
Tools
Function Calling
Context Window
32,768
Max Output Tokens
8,192