Qwen3-235B-A22B

Qwen

qwen/qwen3-235b-a22b

Qwen3-235B-A22B is the flagship Mixture-of-Experts (MoE) model in the new Qwen 3 family. With 235 billion total parameters—but only 22 billion active per token—it delivers near-frontier accuracy while remaining deployable on multi-GPU clusters. The model uses 128 experts (8 are routed per token) across 94 transformer layers and employs Gated Q-Attention (64 Q-heads, 4 KV-heads) for efficient scaling. It natively handles 32 768-token contexts and has been validated up to 131 072 tokens using YaRN positional scaling. Like all Qwen 3 models, it ships under Apache-2.0, supports explicit thinking / no-thinking modes, and shows state-of-the-art reasoning, code generation, and multilingual performance among open-source LLMs.

Tools

Function Calling

Context Window

32,768

Max Output Tokens

8,192

Language

Python JavaScript Curl

Using Qwen3-235B-A22B with Python API

Using Qwen3-235B-A22B with OpenAI compatible API

import openai

client = openai.Client(
  api_key= '{your_api_key}',
  base_url="https://api.model.box/v1",
)
response = client.chat.completions.create(
model="qwen/qwen3-235b-a22b",
messages: [
  {
    role: 'user',
    content:
      'introduce your self',
    },
  ]
)
print(response)