Qwen
Qwen3-14B
qwen/qwen3-14b
Qwen3-14B is the smallest 8-bit-ready dense model in the series that still supports the full reasoning toggle. At 14.8 billion parameters (40 layers, 40/8 GQA heads), it natively serves 32 k-token prompts and can be pushed to 131 k with YaRN. Benchmarks reported in the model card show it surpasses Qwen 2.5-13B and earlier QwQ models on math, code, and commonsense tests, making it a strong fit for edge inference or cost-sensitive back-end chat.
Tools
Function Calling
Context Window
32,768
Max Output Tokens
8,192
Using Qwen3-14B with Python API
Using Qwen3-14B with OpenAI compatible API
import openai
client = openai.Client(
api_key= '{your_api_key}',
base_url="https://api.model.box/v1",
)
response = client.chat.completions.create(
model="qwen/qwen3-14b",
messages: [
{
role: 'user',
content:
'introduce your self',
},
]
)
print(response)