Models | ModelBox

Qwen

Qwen3-235B-A22B

Qwen3-235B-A22B is the flagship Mixture-of-Experts (MoE) model in the Qwen3 family. It has 235 billion total parameters, of which only 22 billion are active per token, so it delivers near-frontier accuracy while remaining deployable on multi-GPU clusters. The model uses 128 experts (8 routed per token) across 94 transformer layers and employs Grouped-Query Attention (GQA; 64 query heads, 4 KV heads) for efficient scaling. It natively handles 32,768-token contexts and has been validated up to 131,072 tokens using YaRN positional scaling. Like all Qwen3 models, it ships under Apache-2.0, supports explicit thinking and non-thinking modes, and shows state-of-the-art reasoning, code generation, and multilingual performance among open-source LLMs.
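As a quick sanity check on the figures quoted above, the following sketch derives the ratios they imply: the share of parameters active per token, the share of experts routed per token, the number of query heads sharing each KV head, and the YaRN scaling factor needed to go from the native to the extended context. The constant and function names are illustrative only and do not come from any Qwen library.

```python
# Figures taken from the description above; names are illustrative.
TOTAL_PARAMS_B = 235        # total parameters, in billions
ACTIVE_PARAMS_B = 22        # parameters active per token, in billions
NUM_EXPERTS = 128
EXPERTS_PER_TOKEN = 8
Q_HEADS = 64
KV_HEADS = 4
NATIVE_CONTEXT = 32_768
EXTENDED_CONTEXT = 131_072


def summarize() -> dict:
    """Derive the ratios implied by the published architecture figures."""
    return {
        # Fraction of all parameters that participate in one forward pass.
        "active_param_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # Fraction of experts selected for each token.
        "expert_fraction": EXPERTS_PER_TOKEN / NUM_EXPERTS,
        # Query heads that share a single key/value head under GQA.
        "q_heads_per_kv_head": Q_HEADS // KV_HEADS,
        # Context extension factor that YaRN would have to cover.
        "yarn_factor": EXTENDED_CONTEXT / NATIVE_CONTEXT,
    }


summary = summarize()
for name, value in summary.items():
    print(f"{name}: {value}")
```

Roughly 9% of the parameters and 6.25% of the experts are used per token, which is why the compute cost per token is far below what the total parameter count suggests, although all weights still have to be held in memory.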

Qwen

Qwen3-30B-A3B

Qwen3-30B-A3B is the “Pro” MoE tier designed for a balanced cost-to-quality ratio. It contains 30.5 billion total parameters but activates only about 3.3 billion at inference, yielding GPT-3.5-class quality at a fraction of the compute of a dense model of the same size. The architecture mirrors the 235B model (128 experts, 8 routed) but runs 48 transformer layers with 32/4 GQA heads. It retains the 32K native context window, YaRN compatibility, and the same controllable thinking switch that lets developers trade reasoning traces for lower latency.
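The "128 experts, 8 routed" design can be illustrated with a generic top-k router. This is a minimal sketch of the general technique, not Qwen's actual router: whether the softmax is applied before or after selection, and how load balancing is handled, are not stated in the description above, and the function name `route_top_k` is hypothetical.

```python
import math
import random


def route_top_k(router_logits: list[float], k: int = 8) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and renormalize their weights.

    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits.
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    # Softmax over the selected logits only (max subtracted for stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Random logits stand in for the output of a learned router layer.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(128)]
chosen = route_top_k(logits, k=8)
```

Only the feed-forward blocks of the chosen experts run for a given token, and their outputs are combined using the returned weights.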

Qwen

Qwen3-32B

Qwen3-32B is a dense 32.8 billion-parameter model, positioned as the high-accuracy single-expert counterpart to the MoE line. It uses 64 transformer layers with 64/8 GQA heads and the full 32 k context window (extendable via YaRN). Because every parameter is active, it excels at deterministic generation, agentic tool-calling, and creative writing where dense representations can outperform similarly sized MoE peers. It is drop-in compatible with Hugging Face Transformers ≥ 4.51, vLLM, SGLang, and common GGUF/MLX-LM ports.
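For the YaRN extension mentioned above, the scaling factor is simply the ratio of the target context to the native one. The sketch below builds a `rope_scaling` fragment of the kind usually added to a model's `config.json`. The key names (`rope_type`, `factor`, `original_max_position_embeddings`) are an assumption based on the common Hugging Face configuration convention and should be checked against the model card and the serving framework in use. The helper name is hypothetical.

```python
import json


def yarn_rope_scaling(native_ctx: int, target_ctx: int) -> dict:
    """Build a YaRN rope-scaling fragment (key names are an assumption)."""
    if target_ctx <= native_ctx:
        raise ValueError("target context must exceed the native context")
    return {
        "rope_type": "yarn",
        # How far positions are stretched beyond the native window.
        "factor": target_ctx / native_ctx,
        "original_max_position_embeddings": native_ctx,
    }


# 32,768 native tokens extended to 131,072 tokens.
config_patch = {"rope_scaling": yarn_rope_scaling(32_768, 131_072)}
print(json.dumps(config_patch, indent=2))
```

A static scaling factor applies to every request regardless of length, so it is usually worth enabling only when long prompts are actually expected.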

Qwen

Qwen3-14B

Qwen3-14B is the smallest 8-bit-ready dense model in the series that still supports the full reasoning toggle. At 14.8 billion parameters (40 layers, 40/8 GQA heads), it natively serves 32K-token prompts and can be pushed to 131K with YaRN. Benchmarks reported in the model card show it surpasses Qwen2.5-14B and earlier QwQ models on math, code, and commonsense tests, making it a strong fit for edge inference or cost-sensitive back-end chat.
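To make the "8-bit-ready" and edge-inference remarks concrete, the following sketch estimates the memory needed for the weights alone at a few precisions. It is a back-of-the-envelope calculation under stated assumptions: the bytes-per-parameter table is idealized and ignores quantization metadata, the KV cache, activations, and runtime overhead. The names are illustrative.

```python
# Idealized storage cost per parameter; real quantized formats add overhead.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(params_billions: float, precision: str) -> float:
    """Approximate weight memory in GiB for a given parameter count."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 2**30


# 14.8 billion parameters, as quoted above.
for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gib(14.8, precision):.1f} GiB")
```

At 8 bits the weights come to a little under 14 GiB, before adding the KV cache for whatever context length is being served.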

Qwen

QwQ 32B

QwQ-32B is a research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. As a preview release, it demonstrates promising analytical abilities while having several important limitations:

1. Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity.
2. Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
3. Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
4. Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.

Open Source

Qwen

Qwen2.5-Max

Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model pretrained on over 20 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). Scaling both data and model size is widely recognized as beneficial, but effectively managing extremely large models has remained a challenge for researchers. With the recent release of DeepSeek V3 providing insights into this process, Qwen2.5-Max emerges as a strong competitor in the field.

The model has been evaluated against leading proprietary and open-weight models using benchmarks such as MMLU-Pro (college-level knowledge), LiveCodeBench (coding capabilities), LiveBench (general intelligence), and Arena-Hard (human preference approximation). Results indicate that Qwen2.5-Max outperforms DeepSeek V3 on Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while remaining competitive on MMLU-Pro. For base model comparisons, proprietary models like GPT-4o and Claude-3.5-Sonnet were not accessible, so Qwen2.5-Max was benchmarked against top open-weight models: DeepSeek V3 (MoE), Llama-3.1-405B (dense), and Qwen2.5-72B (dense). The results show notable advantages across most benchmarks.

Qwen2.5-Max is available via API on Alibaba Cloud, and users can explore its capabilities through Qwen Chat. With ongoing advancements in post-training techniques, future iterations of the model are expected to achieve even greater performance.

Open Source

Qwen

Qwen2 VL 72B

### What's New in Qwen2-VL?

#### Key Enhancements:

- SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
- Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

Vision

Open Source

Qwen

QwQ 32B Preview

QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. As a preview release, it demonstrates promising analytical abilities while having several important limitations:

1. Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity.
2. Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
3. Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
4. Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.

Open Source

Qwen

Qwen2.5 Coder 32B Instruct

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder currently covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

* Significant improvements in code generation, code reasoning, and code fixing. Based on the strong Qwen2.5, we scale up the training tokens to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with coding abilities matching those of GPT-4o.
* A more comprehensive foundation for real-world applications such as Code Agents, not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.
* Long-context support up to 128K tokens.

Open Source

Qwen

Qwen2.5 7B Instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 7B Qwen2.5 model, which has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 7.61B
- Number of Parameters (Non-Embedding): 6.53B
- Number of Layers: 28
- Number of Attention Heads (GQA): 28 for Q and 4 for KV
- Context Length: Full 131,072 tokens and generation 8192 tokens
  - Please refer to [this section](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.

For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
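The attention figures listed above (28 layers, 4 KV heads under GQA, 131,072-token context) determine how large the key/value cache grows at full context. The sketch below works through that arithmetic. The head dimension of 128 and the 2-byte (16-bit) cache element size are assumptions that are not stated in the listing, so treat the result as an estimate. The function name is illustrative.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2,
) -> int:
    """Size of the key/value cache for one sequence, in bytes."""
    # The factor of 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


HEAD_DIM = 128  # assumption: not given in the listing above

# Grouped-query attention: only the 4 KV heads are cached.
gqa = kv_cache_bytes(28, 4, HEAD_DIM, 131_072)
# Hypothetical comparison: caching one KV head per query head (28).
mha = kv_cache_bytes(28, 28, HEAD_DIM, 131_072)

print(f"GQA cache: {gqa / 2**30:.1f} GiB")
print(f"Full multi-head cache: {mha / 2**30:.1f} GiB")
```

Under these assumptions, sharing each KV head across seven query heads cuts the cache for a full-length sequence by a factor of seven.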

Open Source

Qwen

Qwen2.5 Turbo (1M Context)

Following the release of Qwen2.5, the team responded to the community's demand for handling longer contexts. Over the past few months, significant optimizations have been made to enhance the model's capabilities and inference performance for extremely long contexts. Now, the team is proud to introduce the new **Qwen2.5-Turbo** model, featuring the following advancements:

- **Extended Context Support**: The context length has been increased from 128k to 1M tokens, equivalent to approximately 1 million English words or 1.5 million Chinese characters. This capacity corresponds to 10 full-length novels, 150 hours of speech transcripts, or 30,000 lines of code. Qwen2.5-Turbo achieves 100% accuracy in the 1M-token Passkey Retrieval task and scores 93.1 on the RULER long-text evaluation benchmark, outperforming GPT-4 (91.6) and GLM4-9B-1M (89.9). Moreover, the model retains strong performance in short sequence tasks, comparable to GPT-4o-mini.
- **Faster Inference Speed**: Leveraging sparse attention mechanisms, the time to generate the first token for a 1M-token context has been reduced from 4.9 minutes to just 68 seconds, representing a 4.3x speed improvement.
- **Cost Efficiency**: The pricing remains unchanged at $0.05 per 1M tokens. At this rate, Qwen2.5-Turbo processes 3.6 times more tokens than GPT-4o-mini for the same cost.
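The speed and pricing figures quoted above can be checked with a few lines of arithmetic. The sketch below recomputes the time-to-first-token speedup from the two stated latencies and converts the stated price into tokens per dollar. Only numbers from the text are used, and the variable names are illustrative.

```python
# Time to first token for a 1M-token context, from the text above.
BEFORE_SECONDS = 4.9 * 60   # 4.9 minutes
AFTER_SECONDS = 68.0

# Stated price: $0.05 per 1M tokens.
PRICE_PER_MILLION_TOKENS_USD = 0.05

speedup = BEFORE_SECONDS / AFTER_SECONDS
tokens_per_dollar = 1_000_000 / PRICE_PER_MILLION_TOKENS_USD

print(f"speedup: {speedup:.1f}x")
print(f"tokens per dollar: {tokens_per_dollar:,.0f}")
```

The ratio of 294 seconds to 68 seconds rounds to the 4.3x figure given in the description.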

Qwen

Qwen2.5 72B Instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

* Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.
* Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
* Long-context support up to 128K tokens, with generation of up to 8K tokens.
* Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Qwen

Qwen2 VL 72B Instruct

Qwen2-VL is the latest iteration of multimodal large language models developed by the Qwen team at Alibaba Cloud. This advanced AI system represents a significant leap forward in the field of vision-language models, building upon its predecessor, Qwen-VL. Qwen2-VL boasts state-of-the-art capabilities in understanding images of various resolutions and aspect ratios, as well as the ability to comprehend videos exceeding 20 minutes in length. One of the most notable features of Qwen2-VL is its versatility as an agent capable of operating mobile devices, robots, and other systems based on visual input and text instructions. This makes it a powerful tool for a wide range of applications, from personal assistance to industrial automation. The model also offers robust multilingual support, enabling it to understand and process text in various languages within images, catering to a global user base.

Vision

Qwen

Qwen2 VL 7B Instruct

Qwen2-VL is the latest iteration of multimodal large language models developed by the Qwen team at Alibaba Cloud. This advanced AI system represents a significant leap forward in the field of vision-language models, building upon its predecessor, Qwen-VL. Qwen2-VL boasts state-of-the-art capabilities in understanding images of various resolutions and aspect ratios, as well as the ability to comprehend videos exceeding 20 minutes in length. One of the most notable features of Qwen2-VL is its versatility as an agent capable of operating mobile devices, robots, and other systems based on visual input and text instructions. This makes it a powerful tool for a wide range of applications, from personal assistance to industrial automation. The model also offers robust multilingual support, enabling it to understand and process text in various languages within images, catering to a global user base.

Vision

Qwen

Qwen2 Math 7B Instruct

Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs. Its mathematical capabilities significantly exceed those of open-source models and even closed-source models (e.g., GPT-4o).

Open Source

Qwen

Qwen2 Math 1.5B Instruct

Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs. Its mathematical capabilities significantly exceed those of open-source models and even closed-source models (e.g., GPT-4o).

Open Source

Qwen

Qwen2 Math 72B Instruct

Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs. Its mathematical capabilities significantly exceed those of open-source models and even closed-source models (e.g., GPT-4o).

Open Source

Qwen

Qwen2 Audio 7B Instruct

Qwen2-Audio is the new series of Qwen large audio-language models. Qwen2-Audio is capable of accepting various audio signal inputs and performing audio analysis or giving direct textual responses to speech instructions. We introduce two distinct audio interaction modes:

* Voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input.
* Audio analysis: users can provide audio and text instructions for analysis during the interaction.

Open Source

Qwen

Qwen 2 72B Chat

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 72B Qwen2 model. Compared with state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. Qwen2-72B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs.

Open Source

Qwen

Qwen 1.5 110B Chat

Qwen1.5 110B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 14B Chat

Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 32B Chat

Qwen1.5 32B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 4B Chat

Qwen1.5 4B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 72B Chat

Qwen1.5 72B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 7B Chat

Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 1.8B Chat

Qwen1.5 1.8B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Qwen

Qwen 1.5 110B

Qwen1.5 110B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 14B

Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 32B

Qwen1.5 32B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 4B

Qwen1.5 4B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 72B

Qwen1.5 72B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 7B

Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Qwen

Qwen 1.5 1.8B

Qwen1.5 1.8B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes

For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Qwen

Qwen 2 7B Chat

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 7B Qwen2 model. Compared with state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. Qwen2-7B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs.

Open Source