Models

Anthropic
Anthropic

Claude 3.5 Sonnet (new)

Claude 3.5 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.

Vision

Anthropic
Anthropic

Claude 3.5 Sonnet 20241022 (new)

Claude 3.5 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.

Vision

Anthropic
Anthropic

Claude 3.5 Haiku

Claude 3.5 Haiku is the next generation of our fastest model. For the same cost and similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses even Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks. Claude 3.5 Haiku is particularly strong on coding tasks. For example, it scores 40.6% on SWE-bench Verified, outperforming many agents using publicly available state-of-the-art models—including the original Claude 3.5 Sonnet and GPT-4o.

Vision

Meta
Meta

Llama 3.2 90B Instruct

Llama 3.2 is the latest iteration of Meta's open-source AI model family, offering enhanced capabilities and versatility. The new release includes models of various sizes: 1B, 3B, 11B, and 90B parameters. The 1B and 3B models are lightweight, multilingual, and text-only, designed for efficient deployment on mobile and edge devices. The larger 11B and 90B models are multimodal, capable of processing both text and high-resolution images. Key features of Llama 3.2 include: 1. Improved performance across over 150 benchmark datasets in multiple languages. 2. Multimodal capabilities in larger models for image understanding and visual reasoning. 3. Integration with Llama Stack, providing a streamlined developer experience with support for multiple programming languages and deployment options. 4. Enhanced support for agentic components, including tool calling, safety guardrails, and retrieval augmented generation. 5. Compatibility with various hardware platforms, including ARM, MediaTek, and Qualcomm for mobile and edge devices. Llama 3.2 has garnered significant attention, with over 350 million downloads on Hugging Face alone. It's being utilized across various industries for applications such as data privacy, productivity enhancement, contextual understanding, and solving complex business needs. The ecosystem around Llama continues to grow, with partners like Dell, Zoom, DoorDash, and KPMG leveraging the technology for diverse use cases.

Vision

Open Source

Meta
Meta

Llama 3.2 11B Instruct

Llama 3.2 is the latest iteration of Meta's open-source AI model family, offering enhanced capabilities and versatility. The new release includes models of various sizes: 1B, 3B, 11B, and 90B parameters. The 1B and 3B models are lightweight, multilingual, and text-only, designed for efficient deployment on mobile and edge devices. The larger 11B and 90B models are multimodal, capable of processing both text and high-resolution images. Key features of Llama 3.2 include: 1. Improved performance across over 150 benchmark datasets in multiple languages. 2. Multimodal capabilities in larger models for image understanding and visual reasoning. 3. Integration with Llama Stack, providing a streamlined developer experience with support for multiple programming languages and deployment options. 4. Enhanced support for agentic components, including tool calling, safety guardrails, and retrieval augmented generation. 5. Compatibility with various hardware platforms, including ARM, MediaTek, and Qualcomm for mobile and edge devices. Llama 3.2 has garnered significant attention, with over 350 million downloads on Hugging Face alone. It's being utilized across various industries for applications such as data privacy, productivity enhancement, contextual understanding, and solving complex business needs. The ecosystem around Llama continues to grow, with partners like Dell, Zoom, DoorDash, and KPMG leveraging the technology for diverse use cases.

Vision

Open Source

Meta
Meta

Llama 3.2 3B Instruct

Llama 3.2 is the latest iteration of Meta's open-source AI model family, offering enhanced capabilities and versatility. The new release includes models of various sizes: 1B, 3B, 11B, and 90B parameters. The 1B and 3B models are lightweight, multilingual, and text-only, designed for efficient deployment on mobile and edge devices. The larger 11B and 90B models are multimodal, capable of processing both text and high-resolution images. Key features of Llama 3.2 include: 1. Improved performance across over 150 benchmark datasets in multiple languages. 2. Multimodal capabilities in larger models for image understanding and visual reasoning. 3. Integration with Llama Stack, providing a streamlined developer experience with support for multiple programming languages and deployment options. 4. Enhanced support for agentic components, including tool calling, safety guardrails, and retrieval augmented generation. 5. Compatibility with various hardware platforms, including ARM, MediaTek, and Qualcomm for mobile and edge devices. Llama 3.2 has garnered significant attention, with over 350 million downloads on Hugging Face alone. It's being utilized across various industries for applications such as data privacy, productivity enhancement, contextual understanding, and solving complex business needs. The ecosystem around Llama continues to grow, with partners like Dell, Zoom, DoorDash, and KPMG leveraging the technology for diverse use cases.

Open Source

OpenAI
OpenAI

GPT-4o

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

Vision

OpenAI
OpenAI

GPT 4o Mini

GPT 4o Mini ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

Vision

OpenAI
OpenAI

O1 Preview

The OpenAI o1 Preview models are designed to spend more time thinking before responding, improving their ability to reason through complex tasks in science, coding, and math. The first model of this series is now available in ChatGPT and the API, with regular updates expected.

OpenAI
OpenAI

O1 Mini

The OpenAI o1-mini is a newly released smaller version of the o1 model, designed to optimize reasoning tasks, particularly in coding. It provides advanced reasoning capabilities similar to its larger counterpart, making it well-suited for generating and debugging complex code. However, it is 80% cheaper and faster, making it a cost-effective solution for developers who need reasoning power but don’t require broad world knowledge.

Tongyi
Qwen

Qwen2.5 72B Instruct

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2: * Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. * Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. * Long-context Support up to 128K tokens and can generate up to 8K tokens. * Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Tongyi
Qwen

Qwen2 VL 72B Instruct

Qwen2-VL is the latest iteration of multimodal large language models developed by the Qwen team at Alibaba Cloud. This advanced AI system represents a significant leap forward in the field of vision-language models, building upon its predecessor, Qwen-VL. Qwen2-VL boasts state-of-the-art capabilities in understanding images of various resolutions and aspect ratios, as well as the ability to comprehend videos exceeding 20 minutes in length. One of the most notable features of Qwen2-VL is its versatility as an agent capable of operating mobile devices, robots, and other systems based on visual input and text instructions. This makes it a powerful tool for a wide range of applications, from personal assistance to industrial automation. The model also offers robust multilingual support, enabling it to understand and process text in various languages within images, catering to a global user base.

Vision

OpenAI
OpenAI

ChatGPT-4o Latest

ChatGPT-4o contains latest improvements for chat use cases, expected for testing/evaluation purpose. ChatGPT-4o also supports structured outputs, with up to 16k max output tokens GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

Vision

Tongyi
Qwen

Qwen2 VL 7B Instruct

Qwen2-VL is the latest iteration of multimodal large language models developed by the Qwen team at Alibaba Cloud. This advanced AI system represents a significant leap forward in the field of vision-language models, building upon its predecessor, Qwen-VL. Qwen2-VL boasts state-of-the-art capabilities in understanding images of various resolutions and aspect ratios, as well as the ability to comprehend videos exceeding 20 minutes in length. One of the most notable features of Qwen2-VL is its versatility as an agent capable of operating mobile devices, robots, and other systems based on visual input and text instructions. This makes it a powerful tool for a wide range of applications, from personal assistance to industrial automation. The model also offers robust multilingual support, enabling it to understand and process text in various languages within images, catering to a global user base.

Vision

Mistral
Mistral

Pixtral 12B(2409)

Pixtral 12B is a state-of-the-art multimodal AI model developed by Mistral AI. It combines strong visual understanding capabilities with excellent text processing, making it a versatile tool for various multimodal tasks. Key features include: * Natively multimodal architecture, trained on interleaved image and text data * 400M parameter vision encoder and 12B parameter multimodal decoder based on Mistral Nemo Support for variable image sizes and multiple images within a 128k token context window * Top-tier performance on multimodal benchmarks like MMMU (52.5%), outperforming many larger models * Maintained excellence in text-only tasks, unlike some other multimodal models Pixtral excels in tasks such as chart understanding, document question-answering, and multimodal reasoning. It's particularly strong in instruction following for both multimodal and text-only scenarios. The model can process images at their native resolution and aspect ratio, offering flexibility in token usage for image processing.

Vision

DeepSeek
Deepseek

DeepSeek-V2.5 Chat

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2) for more information. DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:

Open Source

DeepSeek
Deepseek

Deepseek V2.5 Coder

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2) for more information. DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:

Open Source

Google
Google

Gemini Flash 1.5 0827 (experiment)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.

Vision

Google
Google

Gemini Pro 1.5 0827 (experiment)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*

mattshumer

Reflection Llama-3.1 70B

Reflection Llama-3.1 70B is (currently) the world's top open-source LLM, trained with a new technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course.

Open Source

Meta
Meta

Llama 3.1 405B Instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

Open Source

Meta
Meta

Llama 3.1 70B Instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

Open Source

Meta
Meta

Llama 3.1 8B Instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

Open Source

OpenAI
OpenAI

GPT-4o 2024-08-06

GPT-4o with structured outputs, with up to 16k max output tokens GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

Vision

Tongyi
Qwen

Qwen2 Math 7B Instruct

Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperforms the mathematical capabilities of open-source models and even closed-source models (e.g., GPT4o).

Open Source

Tongyi
Qwen

Qwen2 Math 1.5B Instruct

Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperforms the mathematical capabilities of open-source models and even closed-source models (e.g., GPT4o).

Open Source

Tongyi
Qwen

Qwen2 Math 72B Instruct

Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperforms the mathematical capabilities of open-source models and even closed-source models (e.g., GPT4o).

Open Source

Tongyi
Qwen

Qwen2 Audio 7B Instruct

Qwen2-Audio is the new series of Qwen large audio-language models. Qwen2-Audio is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. We introduce two distinct audio interaction modes: * voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input; * audio analysis: users could provide audio and text instructions for analysis during the interaction;

Open Source

Google
Google

Gemma2 2B Instruct

Gemma2 is a versatile tool used in both machine learning and genetic research. It is part of the PaliGemma family, which includes powerful Vision-Language Models (VLMs) built on open components like the SigLIP vision model and the Gemma language model. In genetics, Gemma2 implements the Genome-wide Efficient Mixed-Model Association (GEMMA) for genome-wide association studies (GWAS) . It is also recognized in the open-source community for its efficiency in handling large models and datasets. Additionally, it provides an implementation of the GEMMA algorithm for statistical analysis of multivariate linear mixed models [4].

Open Source

OpenAI
OpenAI

GPT-4 Turbo

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the latest version of [GPT-4 Turbo](/models?q=openai/gpt-4-turbo), currently gpt-4-turbo-2024-04-09 (as of April 2024).

Vision

OpenAI
OpenAI

GPT-4 Vision

Ability to understand images, in addition to all other [GPT-4 Turbo capabilties](/models/openai/gpt-4-turbo). Training data: up to Apr 2023. **Note:** heavily rate limited by OpenAI while in preview. #multimodal

Vision

Anthropic
Anthropic

Claude 3 Haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal

Vision

Anthropic
Anthropic

Claude 3 Opus

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal

Vision

Anthropic
Anthropic

Claude 3 Sonnet

Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal

Vision

Google
Google

Gemini Flash 1.5 (preview)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter. #multimodal

Vision

Google
Google

Gemini Pro 1.5 (preview)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.* #multimodal

Meta
Meta

Llama 3 70B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Open Source

Meta
Meta

Llama 3 70B

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This is the base 70B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Open Source

Meta
Meta

Llama 3 8B

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This is the base 8B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Open Source

Tongyi
Qwen

Qwen 2 72B Chat

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 72B Qwen2 model. Compared with the state-of-the-art opensource language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most opensource models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. Qwen2-72B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs.

Open Source

Anthropic
Anthropic

Claude 3.5 Sonnet(20240620)

Claude 3.5 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.

Vision

OpenAI
OpenAI

GPT-4o-2024-05-13

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

Vision

OpenAI
OpenAI

GPT-3.5 Turbo

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Updated by OpenAI to point to the [latest version of GPT-3.5](/models?q=openai/gpt-3.5). Training data up to Sep 2021.

OpenAI
OpenAI

GPT-4o 64k(alpha test version)

An experimental version of GPT-4o with a maximum of 64K output tokens per request. GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

Vision

OpenAI
OpenAI

GPT-3.5 Turbo 16k

The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021. This version has a higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.

OpenAI
OpenAI

GPT-3.5 Turbo (older v0301)

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Updated by OpenAI to point to the [latest version of GPT-3.5](/models?q=openai/gpt-3.5). Training data up to Sep 2021.

OpenAI
OpenAI

GPT-3.5 Turbo (older v0613)

GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks. Updated by OpenAI to point to the [latest version of GPT-3.5](/models?q=openai/gpt-3.5). Training data up to Sep 2021.

Google
Google

Gemma 2B

Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

Open Source

OpenAI
OpenAI

GPT-3.5 Turbo 16k

The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021.

OpenAI
OpenAI

GPT-3.5 Turbo 16k

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up to Sep 2021.

OpenAI
OpenAI

GPT-3.5 Turbo Instruct

Similar capabilities as GPT-3 era models. Compatible with legacy Completions endpoint and not Chat Completions.

OpenAI
OpenAI

GPT-4 Turbo

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the latest version of [GPT-4 Turbo](/models?q=openai/gpt-4-turbo), currently gpt-4-turbo-2024-04-09 (as of April 2024).

OpenAI
OpenAI

GPT-4 Turbo

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the latest version of [GPT-4 Turbo](/models?q=openai/gpt-4-turbo), currently gpt-4-turbo-2024-04-09 (as of April 2024).

OpenAI
OpenAI

GPT-4 Turbo Vision Preview(older v1106)

GPT-4 model with the ability to understand images, in addition to all other GPT-4 Turbo capabilities. This is a preview model, we recommend developers to now use gpt-4-turbo which includes vision capabilities.

Vision

OpenAI
OpenAI

GPT-4 Turbo

The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to Dec 2023. This model is updated by OpenAI to point to the latest version of [GPT-4 Turbo](/models?q=openai/gpt-4-turbo), currently gpt-4-turbo-2024-04-09 (as of April 2024).

Vision

OpenAI
OpenAI

GPT-4 0613

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning capabilities. Training data: up to Sep 2021.

Anthropic
Anthropic

Claude 3 Haiku (20240307)

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal

Vision

Anthropic
Anthropic

Claude 3 Opus(20240229)

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal

Vision

Anthropic
Anthropic

Claude 3 Sonnet(20240229)

Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal

Vision

OpenAI
OpenAI

GPT-3.5 Turbo 16k

This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up to Sep 2021.

Meta
Meta

Llama 3 8B Instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Open Source

Mistral
Mistral

Mistral: Mixtral 8x22B (base)

Mixtral 8x22B is a large-scale language model from Mistral AI. It consists of 8 experts, each 22 billion parameters, with each token using 2 experts at a time. It was released via [X](https://twitter.com/MistralAI/status/1777869263778291896). #moe

Open Source

Mistral
Mistral

Mistral: Mixtral 8x22B Instruct

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding, and reasoning - large context length (64k) - fluency in English, French, Italian, German, and Spanish See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/). #moe

Open Source

Mistral
Mistral

Mixtral 8x7B (base)

A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see [Mixtral 8x7B Instruct](/models/mistralai/mixtral-8x7b-instruct) for an instruct-tuned model. #moe

Open Source

Mistral
Mistral

Mixtral 8x7B Instruct

A pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters. Instruct model fine-tuned by Mistral. #moe

Open Source

OpenAI
OpenAI

GPT-4

OpenAI's flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning capabilities. Training data: up to Sep 2021.

OpenAI
OpenAI

GPT-4 (older v0314)

GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data: up to Sep 2021.

OpenAI
OpenAI

GPT-4 Turbo (older v1106)

The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Apr 2023. **Note:** heavily rate limited by OpenAI while in preview.

OpenAI
OpenAI

GPT-4 32k

GPT-4-32k is an extended version of GPT-4, with the same capabilities but quadrupled context length, allowing for processing up to 40 pages of text in a single pass. This is particularly beneficial for handling longer content like interacting with PDFs without an external vector database. Training data: up to Sep 2021.

OpenAI
OpenAI

GPT-4 32k (older v0314)

GPT-4-32k is an extended version of GPT-4, with the same capabilities but quadrupled context length, allowing for processing up to 40 pages of text in a single pass. This is particularly beneficial for handling longer content like interacting with PDFs without an external vector database. Training data: up to Sep 2021.

Google
Google

Gemini Pro 1.0

Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

Google
Google

Gemini Pro Vision 1.0

Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

Vision

Google
Google

Gemma2 27B Instruct

Gemma2 is a versatile tool used in both machine learning and genetic research. It is part of the PaliGemma family, which includes powerful Vision-Language Models (VLMs) built on open components like the SigLIP vision model and the Gemma language model. In genetics, Gemma2 implements the Genome-wide Efficient Mixed-Model Association (GEMMA) for genome-wide association studies (GWAS) . It is also recognized in the open-source community for its efficiency in handling large models and datasets. Additionally, it provides an implementation of the GEMMA algorithm for statistical analysis of multivariate linear mixed models [4].

Open Source

Google
Google

Gemma2 9B Instruct

Gemma2 is a versatile tool used in both machine learning and genetic research. It is part of the PaliGemma family, which includes powerful Vision-Language Models (VLMs) built on open components like the SigLIP vision model and the Gemma language model. In genetics, Gemma2 implements the Genome-wide Efficient Mixed-Model Association (GEMMA) for genome-wide association studies (GWAS) . It is also recognized in the open-source community for its efficiency in handling large models and datasets. Additionally, it provides an implementation of the GEMMA algorithm for statistical analysis of multivariate linear mixed models [4].

Open Source

Meta
Meta

CodeLlama 34B Instruct

Code Llama is built upon Llama 2 and excels at filling in code, handling extensive input contexts, and folling programming instructions without prior training for various programming tasks.

Open Source

Meta
Meta

Llama v2 13B Chat

A 13 billion parameter language model from Meta, fine tuned for chat completions

Open Source

Meta
Meta

Llama v2 70B Chat

The flagship, 70 billion parameter language model from Meta, fine tuned for chat completions. Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.

Open Source

Meta
Meta

LlamaGuard 2 8B

This safeguard model has 8B parameters and is based on the Llama 3 family. Just like is predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification. LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated. For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Open Source

Mistral
Mistral

Mistral Large

This is Mistral AI's closed-source, flagship model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large/). It is fluent in English, French, Spanish, German, and Italian, with high grammatical accuracy, and its 32K tokens context window allows precise information recall from large documents.

Open Source

Mistral
Mistral

Mistral Medium

This is Mistral AI's closed-source, medium-sided model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. In benchmarks, it compares with many of the flagship models of other companies.

Open Source

Mistral
Mistral

Mistral Small

This model is currently powered by Mixtral-8X7B-v0.1, a sparse mixture of experts model with 12B active parameters. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multiligual, supporting English, French, German, Italian, and Spanish. #moe

Open Source

Mistral
Mistral

Mistral Tiny

This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than [Mistral 7B](/models/mistralai/mistral-7b-instruct), inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.

Open Source

Azure
Microsoft

WizardLM-2 7B

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger opensource leading models It is a finetune of [Mistral 7B Instruct](/models/mistralai/mistral-7b-instruct), using the same technique as [WizardLM-2 8x22B](/models/microsoft/wizardlm-2-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/). #moe

Open Source

Azure
Microsoft

WizardLM-2 8x22B

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models. It is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/). #moe

Open Source

Meta
Meta

Meta: CodeLlama 70B Instruct

Code Llama is a family of large language models for code. This one is based on [Llama 2 70B](/models/meta-llama/llama-2-70b-chat) and provides zero-shot instruction-following ability for programming tasks.

Cohere
Cohere

Cohere: Command

Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).

Open Source

Cohere
Cohere

Cohere: Command R

Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. Read the launch post [here](https://txt.cohere.com/command-r/). Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).

Open Source

Cohere
Cohere

Cohere: Command R+

Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer usecases, and Retrieval Augmented Generation (RAG). It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/). Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).

01.AI
01AI

Yi 34B (base)

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/).

Open Source

01.AI
01AI

Yi 34B Chat

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/). This version is instruct-tuned to work better for chat.

Open Source

01.AI
01AI

Yi 6B (base)

The Yi series models are large language models trained from scratch by developers at [01.AI](https://01.ai/).

Open Source

Tongyi
Qwen

Qwen 1.5 110B Chat

Qwen1.5 110B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 14B Chat

Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 32B Chat

Qwen1.5 32B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 4B Chat

Qwen1.5 4B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 72B Chat

Qwen1.5 72B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 7B Chat

Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

DBRX
Databricks

Databricks: DBRX 132B Instruct

DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and [Mixtral-8x7b](/models/mistralai/mixtral-8x7b) on standard industry benchmarks for language understanding, programming, math, and logic. It uses a fine-grained mixture-of-experts (MoE) architecture. 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. See the launch announcement and benchmark results [here](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm). #moe

Open Source

Fireworks

FireLLaVA 13B

A blazing fast vision-language model, FireLLaVA quickly understands both text and images. It achieves impressive chat skills in tests, and was designed to mimic multimodal GPT-4. The first commercially permissive open source LLaVA model, trained entirely on open source LLM generated instruction following data.

Openchat

OpenChat 3.5

OpenChat is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.

Open Source

Perplexity
Perplexity

Perplexity: Llama3 Sonar 70B

Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/models/perplexity/llama-3-sonar-large-32k-online) of this model has Internet access.

Perplexity
Perplexity

Perplexity: Llama3 Sonar 70B Online

Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is the online version of the [offline chat model](/models/perplexity/llama-3-sonar-large-32k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online

Perplexity
Perplexity

Perplexity: Llama3 Sonar 8B

Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/models/perplexity/llama-3-sonar-small-32k-online) of this model has Internet access.

Perplexity
Perplexity

Perplexity: Llama3 Sonar 8B Online

Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is the online version of the [offline chat model](/models/perplexity/llama-3-sonar-small-32k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online

Phind

Phind: CodeLlama 34B v2

A fine-tune of CodeLlama-34B on an internal dataset that helps it exceed GPT-4 on some benchmarks, including HumanEval.

Open Source

Snowflake

Snowflake: Arctic Instruct

Arctic is a dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP resulting in 480B total and 17B active parameters chosen using a top-2 gating. To read more about this model's release, [click here](https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/).

Azure
Microsoft

Phi-3 Medium Instruct

Phi-3 Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.

Open Source

Mistral
Mistral

Mistral-7B-Instruct-v0.3

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling

Open Source

Tongyi
Qwen

Qwen 1.5 1.8B Chat

Qwen1.5 1.8B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Tongyi
Qwen

Qwen 1.5 110B

Qwen1.5 110B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 14B

Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 32B

Qwen1.5 32B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 4B

Qwen1.5 4B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 72B

Qwen1.5 72B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 7B

Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Open Source

Tongyi
Qwen

Qwen 1.5 1.8B

Qwen1.5 1.8B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previous released Qwen, the improvements include: - Significant performance improvement in human preference for chat models - Multilingual support of both base and chat models - Stable support of 32K context length for models of all sizes For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Mistral
Mistral

Mistral-7B-Instruct-v0.1

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling

Open Source

Mistral
Mistral

Mistral-7B-Instruct-v0.2

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling

Open Source

Mistral
Mistral

Mistral-7B-Instruct-v0.3

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling

Open Source

Mistral
Mistral

Mistral-7B-v0.1

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3. Mistral-7B-v0.3 has the following changes compared to Mistral-7B-v0.2 Extended vocabulary to 32768 Supports v3 Tokenizer Supports function calling

Open Source

Mistral
Mistral

Mistral: Mixtral 8x22B Instruct v0.1

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding, and reasoning - large context length (64k) - fluency in English, French, Italian, German, and Spanish See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/). #moe

Open Source

Mistral
Mistral

Mixtral 8x7B Instruct v0.1

A pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters. Instruct model fine-tuned by Mistral. #moe

Open Source

Meta
Meta

Llama v2 7B Chat

The flagship, 70 billion parameter language model from Meta, fine tuned for chat completions. Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.

Open Source

cognitivecomputations

Dolphin

This model is based on Mixtral-8x7b The base model has 32k context, I finetuned it with 16k. This Dolphin is really good at coding, I trained with a lot of coding data. It is very obedient but it is not DPO tuned - so you still might need to encourage it in the system prompt as I show in the below examples.

Open Source

Google
Google

Gemma 7B Instruct

Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

Open Source

Google
Google

Gemma2 27B

Gemma2 is a versatile tool used in both machine learning and genetic research. It is part of the PaliGemma family, which includes powerful Vision-Language Models (VLMs) built on open components like the SigLIP vision model and the Gemma language model. In genetics, Gemma2 implements the Genome-wide Efficient Mixed-Model Association (GEMMA) for genome-wide association studies (GWAS) . It is also recognized in the open-source community for its efficiency in handling large models and datasets. Additionally, it provides an implementation of the GEMMA algorithm for statistical analysis of multivariate linear mixed models [4].

Open Source

Mistral
Mistral

Codestral Mamba

Codestral Mamba is a newly released language model specialized in code generation, developed by Mistral AI. It boasts linear time inference, allowing it to efficiently handle sequences of infinite length, making it ideal for code productivity tasks. The model was trained with advanced code and reasoning capabilities, performing comparably to state-of-the-art transformer models. It supports extensive in-context retrieval up to 256k tokens. Codestral Mamba is freely available under the Apache 2.0 license and can be deployed via the mistral-inference SDK or TensorRT-LLM. For more details, visit the [original article](https://mistral.ai/news/codestral-mamba/).

Open Source

Tongyi
Qwen

Qwen 2 7B Chat

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 72B Qwen2 model. Compared with the state-of-the-art opensource language models, including the previous released Qwen1.5, Qwen2 has generally surpassed most opensource models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting for language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. Qwen2-72B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs.

Open Source

Google
Google

Gemma 9B

Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

Open Source

Mistral
Mistral

Mistral Nemo

Mistral AI and NVIDIA have collaborated to develop Mistral NeMo, a new 12B language model that represents a significant advancement in AI technology. This model boasts a large context window of up to 128k tokens and delivers state-of-the-art performance in reasoning, world knowledge, and coding accuracy for its size category. Mistral NeMo utilizes a standard architecture, making it easily adaptable and a straightforward replacement for systems currently using Mistral 7B. In a move to promote widespread adoption, both pre-trained base and instruction-tuned checkpoints have been released under the Apache 2.0 license.

Open Source

Mistral
Mistral

Mistral Large 2

Mistral AI's latest offering, Mistral Large 2, represents a significant advancement in language model technology. With 123 billion parameters and a 128k context window, it supports numerous languages and coding languages. The model sets a new benchmark in performance-to-cost ratio, achieving 84.0% accuracy on MMLU. It excels in code generation, reasoning, and multilingual tasks, competing with top-tier models like GPT-4 and Claude 3 Opus. Key improvements include enhanced instruction-following, reduced hallucination, and better handling of multi-turn conversations. The model's multilingual proficiency and advanced function calling capabilities make it particularly suitable for diverse business applications. Mistral Large 2 is designed for single-node inference and long-context applications, balancing performance with practical usability.

Open Source

Google
Google

ShieldGemma 2B

ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories (sexually explicit, dangerous content, hate, and harassment). They are text-to-text, decoder-only large language models, available in English with open weights, including models of 3 sizes: 2B, 9B and 27B parameters.

Open Source

Google
Google

ShieldGemma 9B

ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories (sexually explicit, dangerous content, hate, and harassment). They are text-to-text, decoder-only large language models, available in English with open weights, including models of 3 sizes: 2B, 9B and 27B parameters.

Open Source

Google
Google

ShieldGemma 27B

ShieldGemma is a series of safety content moderation models built upon Gemma 2 that target four harm categories (sexually explicit, dangerous content, hate, and harassment). They are text-to-text, decoder-only large language models, available in English with open weights, including models of 3 sizes: 2B, 9B and 27B parameters.

Open Source

Black Forest Labs

FLUX.1 Dev

FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key Features 1. Cutting-edge output quality, second only to our state-of-the-art model FLUX.1 [pro]. 2. Competitive prompt following, matching the performance of closed source alternatives . Trained using guidance distillation, making FLUX.1 [dev] more efficient. 3. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 4. Generated outputs can be used for personal, scientific, and commercial purposes as described in the [flux-1-dev-non-commercial-license](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).

Black Forest Labs

FLUX.1 Schnell

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. Key Features: 1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives. 2. Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps. 4. Released under the apache-2.0 licence, the model can be used for personal, scientific, and commercial purposes.

Black Forest Labs

FLUX.1 Pro

FLUX.1 [pro] is the best of FLUX.1, offering state-of-the-art performance image generation with top of the line prompt following, visual quality, image detail and output diversity. All FLUX.1 model variants support a diverse range of aspect ratios and resolutions in 0.1 and 2.0 megapixels

OpenAI
OpenAI

Dall-E 3

DALL·E 3 understands significantly more nuance and detail than our previous systems, allowing you to easily translate your ideas into 
exceptionally accurate images.

OpenAI
OpenAI

O1 Perview 2024-09-12

The OpenAI o1 Preview models are designed to spend more time thinking before responding, improving their ability to reason through complex tasks in science, coding, and math. The first model of this series is now available in ChatGPT and the API, with regular updates expected.

OpenAI
OpenAI

O1 Mini 2024-09-12

The OpenAI o1-mini is a newly released smaller version of the o1 model, designed to optimize reasoning tasks, particularly in coding. It provides advanced reasoning capabilities similar to its larger counterpart, making it well-suited for generating and debugging complex code. However, it is 80% cheaper and faster, making it a cost-effective solution for developers who need reasoning power but don’t require broad world knowledge.

Google
Google

Gemini Flash 1.5 0827 (experiment)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.

Vision

Google
Google

Gemini Pro 1.5 0827 (experiment)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*

Vision

Google
Google

Gemini Flash 1.5 0827 (experiment)

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.

Vision

Google
Google

Gemini Pro 1.5 0827 (experiment)

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). *Note: Preview models are offered for testing purposes and should not be used in production apps. This model is **heavily rate limited**.*

Vision