Qwen
Qwen3-14B
qwen/qwen3-14b
Qwen3-14B is a mid-sized dense model in the Qwen3 series, suitable for 8-bit quantized deployment, that supports the full reasoning (thinking) toggle. It has 14.8 billion parameters (40 layers; grouped-query attention with 40 query heads and 8 key/value heads), natively handles a 32,768-token context, and can be extended to 131,072 tokens with YaRN. Benchmarks reported in the model card show it surpassing Qwen2.5-14B and earlier QwQ models on math, code, and commonsense tests, making it a strong fit for edge inference or cost-sensitive back-end chat.
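The YaRN extension mentioned above comes down to a scaling factor: the target window divided by the native window. The sketch below works out that factor and builds a `rope_scaling` dictionary. The key names (`rope_type`, `factor`, `original_max_position_embeddings`) follow the convention used in Hugging Face style model configs and are an assumption here; the helper `yarn_rope_scaling` is hypothetical, so check your serving stack's documentation for the exact field names it expects.

```python
# Minimal sketch: derive the YaRN scaling factor for extending the context window.
# The dictionary keys are assumed to match a Hugging Face style config; verify
# them against the inference framework you actually use.

NATIVE_CONTEXT = 32_768     # native context window from the listing
TARGET_CONTEXT = 131_072    # extended window reachable with YaRN


def yarn_rope_scaling(native: int, target: int) -> dict:
    """Return a rope_scaling dict that stretches `native` tokens to `target`."""
    if target < native:
        raise ValueError("target context must be at least the native context")
    return {
        "rope_type": "yarn",
        "factor": target / native,
        "original_max_position_embeddings": native,
    }


rope_scaling = yarn_rope_scaling(NATIVE_CONTEXT, TARGET_CONTEXT)
print(rope_scaling["factor"])  # 4.0
```

A factor of 4.0 is what takes the 32,768-token native window to 131,072 tokens. Scaling is usually worth enabling only when prompts actually exceed the native window.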
Tools: Function Calling
Context Window: 32,768 tokens
Max Output Tokens: 8,192
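As a rough illustration of how the two limits above interact, the sketch below computes how many tokens remain for the prompt once output space is reserved. It assumes that generated tokens count against the same 32,768-token context window, which is common but not stated in this listing; the function name `max_prompt_tokens` is hypothetical.

```python
# Minimal sketch: prompt budget under the listed limits.
# Assumption: prompt tokens + generated tokens must fit in one context window.

CONTEXT_WINDOW = 32_768      # from the listing
MAX_OUTPUT_TOKENS = 8_192    # from the listing


def max_prompt_tokens(requested_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the largest prompt that still leaves room for the requested output."""
    if not 0 < requested_output <= MAX_OUTPUT_TOKENS:
        raise ValueError(f"requested_output must be in 1..{MAX_OUTPUT_TOKENS}")
    return CONTEXT_WINDOW - requested_output


print(max_prompt_tokens())       # 24576
print(max_prompt_tokens(1_024))  # 31744
```

Under that assumption, reserving the full 8,192 output tokens leaves 24,576 tokens for the prompt, including any system message and tool definitions.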