Tongyi
Qwen

Qwen3-235B-A22B

qwen/qwen3-235b-a22b

Qwen3-235B-A22B is the flagship Mixture-of-Experts (MoE) model in the Qwen 3 family. With 235 billion total parameters, of which only 22 billion are active per token, it delivers near-frontier accuracy while remaining deployable on multi-GPU clusters. The model has 94 transformer layers, routes each token to 8 of 128 experts, and uses Grouped Query Attention (GQA; 64 query heads, 4 key-value heads) for efficient scaling. It natively handles 32,768-token contexts and has been validated up to 131,072 tokens using YaRN positional scaling. Like all Qwen 3 models, it ships under Apache-2.0, supports explicit thinking / no-thinking modes, and shows state-of-the-art reasoning, code generation, and multilingual performance among open-source LLMs.
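The figures in the description imply a few ratios that are easy to overlook. The sketch below only does arithmetic on the numbers quoted above; the constant and function names are illustrative and do not come from any Qwen library or configuration file.

```python
# Figures quoted in the model description (not read from any config file).
TOTAL_PARAMS_B = 235        # total parameters, in billions
ACTIVE_PARAMS_B = 22        # parameters active per token, in billions
NUM_EXPERTS = 128
EXPERTS_PER_TOKEN = 8
NUM_Q_HEADS = 64
NUM_KV_HEADS = 4
NATIVE_CONTEXT = 32_768
EXTENDED_CONTEXT = 131_072


def derived_figures():
    """Return ratios implied by the published figures (hypothetical helper)."""
    return {
        # Share of all weights that take part in one token's forward pass.
        "active_param_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # Share of experts the router selects for each token.
        "routed_expert_fraction": EXPERTS_PER_TOKEN / NUM_EXPERTS,
        # Number of query heads that share one key-value head.
        "q_heads_per_kv_head": NUM_Q_HEADS // NUM_KV_HEADS,
        # Context extension ratio that YaRN scaling has to cover.
        "yarn_scaling_factor": EXTENDED_CONTEXT / NATIVE_CONTEXT,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.4g}")
```

Under these numbers, roughly 9% of the weights are active per token, 16 query heads share each key-value head, and reaching 131,072 tokens corresponds to a 4x extension of the native context.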

Tools: Function Calling

Context Window: 32,768 tokens

Max Output Tokens: 8,192

Provider    Input Token Price      Output Token Price
DashScope   $0.60/Million Tokens   $5/Million Tokens
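As a rough illustration of how the listed prices and limits combine, here is a minimal cost estimator. It assumes the DashScope prices shown above, and it further assumes that prompt and completion tokens together must fit within the 32,768-token context window, which this page does not state explicitly. The function name is hypothetical.

```python
from decimal import Decimal

# Values copied from the listing above.
INPUT_PRICE_PER_MILLION = Decimal("0.60")   # USD per million input tokens
OUTPUT_PRICE_PER_MILLION = Decimal("5")     # USD per million output tokens
CONTEXT_WINDOW = 32_768
MAX_OUTPUT_TOKENS = 8_192
ONE_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the listed max output tokens")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the listed context window")
    input_cost = INPUT_PRICE_PER_MILLION * input_tokens / ONE_MILLION
    output_cost = OUTPUT_PRICE_PER_MILLION * output_tokens / ONE_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # 20,000 prompt tokens and 2,000 completion tokens.
    print(estimate_cost_usd(20_000, 2_000))
```

Output tokens cost a little over eight times as much as input tokens at these rates, so long completions dominate the bill even when the prompt is much larger.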