Qwen3-14B

Qwen

qwen/qwen3-14b

Qwen3-14B is the smallest 8-bit-ready dense model in the series that still supports the full reasoning toggle. At 14.8 billion parameters (40 layers, 40/8 GQA heads), it natively serves 32 k-token prompts and can be pushed to 131 k with YaRN. Benchmarks reported in the model card show it surpasses Qwen 2.5-13B and earlier QwQ models on math, code, and commonsense tests, making it a strong fit for edge inference or cost-sensitive back-end chat.

Tools

Function Calling

Context Window

32,768

Max Output Tokens

8,192

Language

Python JavaScript Curl

Using Qwen3-14B with Python API

Using Qwen3-14B with OpenAI compatible API

import openai

client = openai.Client(
  api_key= '{your_api_key}',
  base_url="https://api.model.box/v1",
)
response = client.chat.completions.create(
model="qwen/qwen3-14b",
messages: [
  {
    role: 'user',
    content:
      'introduce your self',
    },
  ]
)
print(response)