Qwen3-235B-A22B
Qwen3-235B-A22B is the flagship Mixture-of-Experts (MoE) model in the Qwen3 family. With 235 billion total parameters but only 22 billion active per token, it delivers near-frontier accuracy while remaining deployable on multi-GPU clusters. The model uses 128 experts (8 activated per token) across 94 transformer layers and employs Grouped Query Attention (64 query heads, 4 key-value heads) for efficient inference. It natively handles 32,768-token contexts and has been validated up to 131,072 tokens using YaRN positional scaling. Like all Qwen3 models, it ships under Apache-2.0, supports explicit thinking / no-thinking modes, and delivers state-of-the-art reasoning, code generation, and multilingual performance among open-source LLMs.
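As a reference, here is a minimal sketch of toggling the thinking / no-thinking mode through Hugging Face Transformers, following the `enable_thinking` pattern documented on the Qwen3 model card; the prompt and generation settings are illustrative assumptions, not recommended values.

```python
# Sketch: switching Qwen3 between thinking and no-thinking modes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-235B-A22B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are below 30?"}]

# enable_thinking=True (the default) makes the model emit a <think>...</think>
# reasoning block before its final answer; set it to False for direct answers.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```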
Tools: Function Calling
Context Window: 32,768 tokens
Max Output Tokens: 8,192
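Since the model supports function calling, here is a hedged sketch of invoking it through an OpenAI-compatible endpoint (for example, a local vLLM or SGLang server); the base URL, API key, and the `get_weather` tool schema are hypothetical placeholders for illustration.

```python
# Sketch: Qwen3 function calling via an OpenAI-compatible chat endpoint.
from openai import OpenAI

# Assumed local server; replace with your actual endpoint and credentials.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, defined only for this example
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# When the model decides to use a tool, it returns a structured tool call
# instead of free-form text.
print(response.choices[0].message.tool_calls)
```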