Qwen2.5-Max
Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model pretrained on over 20 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). While scaling both data and model size is widely recognized to improve model quality, the community has had limited experience in effectively training and managing extremely large models. With the recent release of DeepSeek V3 shedding light on this process, Qwen2.5-Max emerges as a strong competitor in the field.
The model has been rigorously evaluated against leading proprietary and open-weight models using benchmarks such as MMLU-Pro (college-level knowledge), LiveCodeBench (coding capabilities), LiveBench (general intelligence), GPQA-Diamond (graduate-level science questions), and Arena-Hard (human preference approximation). Results indicate that Qwen2.5-Max outperforms DeepSeek V3 on Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while maintaining competitive performance on MMLU-Pro.
For base model comparisons, proprietary models like GPT-4o and Claude-3.5-Sonnet were not accessible. However, Qwen2.5-Max was benchmarked against top open-weight models such as DeepSeek V3 (MoE), Llama-3.1-405B (dense), and Qwen2.5-72B (dense). The results demonstrate notable advantages across most benchmarks, reinforcing the model’s strengths.
Qwen2.5-Max is now available via API on Alibaba Cloud, and users can explore its capabilities through Qwen Chat. With ongoing advancements in post-training techniques, future iterations of the model are expected to achieve even greater performance.
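As a minimal sketch of the Alibaba Cloud route mentioned above, the call below uses the OpenAI-compatible mode of Alibaba Cloud Model Studio (DashScope). The base_url and the model name "qwen-max-2025-01-25" are assumptions based on common DashScope conventions and may differ for your account and region.

# Sketch: calling Qwen2.5-Max via Alibaba Cloud's OpenAI-compatible endpoint.
# Assumptions: base_url and model name follow DashScope conventions; adjust as needed.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # your Alibaba Cloud Model Studio key
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-max-2025-01-25",  # assumed snapshot name for Qwen2.5-Max
    messages=[{"role": "user", "content": "Introduce yourself."}],
)
print(completion.choices[0].message.content)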
Tools: Function Calling
Community: Open Source
Context Window: 1,000,000 tokens
Max Output Tokens: 65,536
Using Qwen2.5-Max with the OpenAI-compatible Python API
import openai

client = openai.Client(
    api_key="{your_api_key}",
    base_url="https://api.model.box/v1",
)

# Send a single-turn chat completion request to Qwen2.5-Max.
response = client.chat.completions.create(
    model="qwen/qwen2.5-max",
    messages=[
        {
            "role": "user",
            "content": "introduce yourself",
        },
    ],
)

print(response)
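Since the response follows the OpenAI SDK's chat-completion schema, the assistant's reply text can be pulled out directly. The max_tokens cap below is an optional, hypothetical value chosen to stay well under the 65,536-token output limit listed above.

# Continuing the example above: print only the assistant's reply.
# max_tokens is optional; 1024 is a hypothetical cap under the 65,536-token limit.
response = client.chat.completions.create(
    model="qwen/qwen2.5-max",
    max_tokens=1024,
    messages=[{"role": "user", "content": "introduce yourself"}],
)
print(response.choices[0].message.content)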