Qwen
Qwen3-30B-A3B
qwen/qwen3-30b-a3b
Qwen3-30B-A3B is the “Pro” MoE tier designed for balanced cost-to-quality. It contains 30.5 billion total parameters but activates only ~3.3 billion per token at inference, yielding GPT-3.5-class quality at a fraction of the memory footprint. The architecture mirrors the 235B flagship (128 experts, 8 routed per token) but runs 48 transformer layers with 32 query / 4 key-value GQA heads. It retains the 32K native context window, YaRN compatibility for context extension, and the same controllable thinking switch that lets developers trade raw reasoning traces for latency.
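The thinking switch can be driven at the prompt level: Qwen3 models recognize soft switches appended to a user turn that toggle reasoning traces on or off. A minimal sketch, assuming the documented `/think` and `/no_think` directives (exact behavior depends on your serving stack):

```python
def with_thinking(prompt: str, enable: bool) -> str:
    """Append the Qwen3 soft switch that toggles reasoning traces.

    "/think" requests a full reasoning trace; "/no_think" suppresses
    it for lower latency. This is a prompt-level sketch, not an API
    parameter; some servers instead expose an enable_thinking flag.
    """
    switch = "/think" if enable else "/no_think"
    return f"{prompt} {switch}"


# Low-latency variant of a query, with reasoning traces suppressed:
print(with_thinking("Summarize this paragraph in one sentence.", enable=False))
# → Summarize this paragraph in one sentence. /no_think
```

You would pass the returned string as the `content` of the user message in the chat completion request shown below.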
Tools
Function Calling
Context Window
32,768
Max Output Tokens
8,192
Using Qwen3-30B-A3B with Python API
Using Qwen3-30B-A3B with OpenAI-compatible API
import openai

client = openai.Client(
    api_key="{your_api_key}",
    base_url="https://api.model.box/v1",
)

response = client.chat.completions.create(
    model="qwen/qwen3-30b-a3b",
    messages=[
        {
            "role": "user",
            "content": "Introduce yourself",
        },
    ],
)

print(response.choices[0].message.content)
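Since the model supports function calling through the same OpenAI-compatible interface, a tool can be declared in the standard `tools` schema. The sketch below builds an illustrative weather tool; the function name and fields are hypothetical, not part of the official model card:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
# "get_weather" and its parameters are illustrative examples only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# With the client configured as above, the schema is passed alongside
# the messages; when the model decides to call the tool, the arguments
# arrive as a JSON string on the returned message:
#
#   response = client.chat.completions.create(
#       model="qwen/qwen3-30b-a3b",
#       messages=[{"role": "user", "content": "Weather in Paris?"}],
#       tools=tools,
#   )
#   tool_call = response.choices[0].message.tool_calls[0]
#   args = json.loads(tool_call.function.arguments)

print(tools[0]["function"]["name"])
```

Your application then executes the named function with the parsed arguments and feeds the result back as a `tool` role message for the model to compose its final answer.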