Qwen
Qwen3-30B-A3B
qwen/qwen3-30b-a3b
Qwen3-30B-A3B is the “Pro” MoE tier designed for balanced cost-to-quality. It contains 30.5 billion total parameters but activates only ~3.3 billion per token at inference, yielding GPT-3.5-class quality at a fraction of the memory footprint. The architecture mirrors the 235B flagship (128 experts, 8 routed per token) but runs 48 transformer layers with 32 query / 4 key-value GQA heads. It retains the 32K native context window, YaRN compatibility for context extension, and the same controllable thinking switch that lets developers trade raw reasoning traces for latency.
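The thinking switch can be driven at the prompt level: Qwen3 models recognize soft switches appended to a user turn that toggle reasoning traces on or off. A minimal sketch, assuming the documented `/think` and `/no_think` directives (exact behavior depends on your serving stack):

```python
def with_thinking(prompt: str, enable: bool) -> str:
    """Append the Qwen3 soft switch that toggles reasoning traces.

    "/think" requests a full reasoning trace; "/no_think" suppresses
    it for lower latency. This is a prompt-level sketch, not an API
    parameter; some servers instead expose an enable_thinking flag.
    """
    switch = "/think" if enable else "/no_think"
    return f"{prompt} {switch}"


# Low-latency variant of a query, with reasoning traces suppressed:
print(with_thinking("Summarize this paragraph in one sentence.", enable=False))
# → Summarize this paragraph in one sentence. /no_think
```

You would pass the returned string as the `content` of the user message in the chat completion request shown below.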
Tools
Function Calling
Context Window
32,768
Max Output Tokens
8,192
Using Qwen3-30B-A3B with Python API
Using Qwen3-30B-A3B with OpenAI-compatible API
import openai

client = openai.Client(
    api_key="{your_api_key}",
    base_url="https://api.model.box/v1",
)

response = client.chat.completions.create(
    model="qwen/qwen3-30b-a3b",
    messages=[
        {
            "role": "user",
            "content": "Introduce yourself",
        },
    ],
)

print(response.choices[0].message.content)
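Since the model supports function calling through the same OpenAI-compatible interface, a tool can be declared in the standard `tools` schema. The sketch below builds an illustrative weather tool; the function name and fields are hypothetical, not part of the official model card:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
# "get_weather" and its parameters are illustrative examples only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# With the client configured as above, the schema is passed alongside
# the messages; when the model decides to call the tool, the arguments
# arrive as a JSON string on the returned message:
#
#   response = client.chat.completions.create(
#       model="qwen/qwen3-30b-a3b",
#       messages=[{"role": "user", "content": "Weather in Paris?"}],
#       tools=tools,
#   )
#   tool_call = response.choices[0].message.tool_calls[0]
#   args = json.loads(tool_call.function.arguments)

print(tools[0]["function"]["name"])
```

Your application then executes the named function with the parsed arguments and feeds the result back as a `tool` role message for the model to compose its final answer.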