Models
DeepSeek Reasoner (R1)
The first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, were introduced to advance reasoning capabilities. DeepSeek-R1-Zero, developed using large-scale reinforcement learning (RL) without prior supervised fine-tuning (SFT), displayed impressive reasoning performance. Through RL, it naturally acquired a range of powerful and intriguing reasoning behaviors. However, DeepSeek-R1-Zero faced challenges such as repetitive outputs, poor readability, and language mixing. To address these limitations and further improve reasoning capabilities, DeepSeek-R1 was developed, incorporating cold-start data before RL. DeepSeek-R1 demonstrated performance on par with OpenAI-o1 across tasks involving mathematics, coding, and reasoning. To foster progress within the research community, DeepSeek-R1-Zero, DeepSeek-R1, and six distilled dense models based on Llama and Qwen were open-sourced. Among them, DeepSeek-R1-Distill-Qwen-32B surpassed OpenAI-o1-mini on various benchmarks, setting new performance standards for dense models.
Open Source
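The R1 models are also served through DeepSeek's OpenAI-compatible API. The snippet below is a minimal sketch, assuming the publicly documented base URL (`https://api.deepseek.com`) and the `deepseek-reasoner` model name; the separate `reasoning_content` field is read defensively in case a given deployment does not return it.

```python
# Minimal sketch: calling DeepSeek-R1 through the OpenAI-compatible API.
# Assumes the documented base URL and the "deepseek-reasoner" model name.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # placeholder key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many prime numbers are there below 100?"}],
)

message = response.choices[0].message
# The reasoner separates its chain of thought from the final answer;
# fall back gracefully if the reasoning field is absent.
print(getattr(message, "reasoning_content", None))
print(message.content)
```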
DeepSeek Chat (V3)
DeepSeek V3, developed by DeepSeek, is a cutting-edge large language model with 685 billion parameters, making it one of the largest openly released models to date; its released weights total roughly 687.9 GB. The model uses a Mixture of Experts (MoE) architecture with 256 routed experts, 8 of which are activated per token, so only a small fraction of the parameters is computed for any given input. This design allocates resources efficiently and scales well without sacrificing performance. In early benchmarks, DeepSeek V3 took second place on the Aider Polyglot leaderboard with a score of 48.4%, ahead of models such as Claude 3.5 Sonnet and Gemini-Exp, highlighting its strength in multi-language coding and code-editing tasks. DeepSeek V3 is currently accessible through chat.deepseek.com and the DeepSeek API as part of a staged rollout. Its parameter count exceeds even Meta AI's Llama 3.1 (405B parameters), and with its robust performance and efficient MoE design it sets a new standard for large-scale open models.
Open Source
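To make the "256 experts, 8 activated per token" design concrete, here is a minimal, illustrative sketch of top-k expert routing in an MoE layer. It is not DeepSeek's implementation (DeepSeek-V3 additionally uses shared experts and its own load-balancing scheme); the hidden size, expert shapes, and token count are arbitrary placeholders.

```python
# Illustrative top-k expert routing for a Mixture-of-Experts layer.
# Not DeepSeek's implementation; all dimensions are toy placeholders.
import numpy as np

NUM_EXPERTS = 256   # routed experts in the layer
TOP_K = 8           # experts activated per token
D_MODEL = 64        # toy hidden size

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) / np.sqrt(D_MODEL)
# Each "expert" here is a single weight matrix standing in for a feed-forward block.
experts = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs by gate weight."""
    logits = x @ router_w                                  # (tokens, NUM_EXPERTS)
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]      # indices of the k highest-scoring experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)             # softmax over the selected experts only

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                            # plain loops for clarity, not speed
        for k in range(TOP_K):
            e = top_idx[t, k]
            out[t] += gates[t, k] * (x[t] @ experts[e])
    return out

tokens = rng.standard_normal((4, D_MODEL))                 # 4 toy token embeddings
print(moe_layer(tokens).shape)                             # (4, 64): only 8 of 256 experts run per token
```

The point of the sketch is the sparsity: every token touches all 256 experts only through the cheap router scores, while the expensive expert computation runs for just the 8 selected ones.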
DeepSeek Coder (V3)
DeepSeek Coder (V3) is served by the same DeepSeek-V3 model described above under DeepSeek Chat (V3): a 685B-parameter Mixture of Experts model with 256 routed experts and 8 activated per token. See that entry for benchmark results and availability.
Open Source
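For code-focused use of V3, the same OpenAI-compatible endpoint applies. The sketch below assumes the general `deepseek-chat` model name also serves coding requests; if your provider exposes a dedicated coder alias, substitute that instead.

```python
# Sketch of a code-generation request against DeepSeek-V3.
# Assumes the OpenAI-compatible endpoint and the "deepseek-chat" model name;
# a gateway may expose a coder-specific alias instead.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
    ],
    temperature=0.0,   # near-deterministic output is usually preferable for code
)

print(response.choices[0].message.content)
```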
DeepSeek Chat (V2.5)
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of the two previous models. For model details, see the [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2). DeepSeek-V2.5 better aligns with human preferences and has been optimized in several areas, including writing and instruction following.
Open Source
DeepSeek Coder (V2.5)
DeepSeek Coder (V2.5) is served by the same DeepSeek-V2.5 model described above under DeepSeek Chat (V2.5), which merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct into a single model covering both general and coding use. See that entry and the [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2) for details.
Open Source