Prompt Cache
When developing AI applications, you may find yourself reusing the same prompts across multiple requests. The Prompt Cache feature can help improve response times and reduce API call costs by caching your frequently used prompts.
Restrictions and Limitations
Cache Limits
- Models have a minimum cache-eligible prompt length. Attempts to cache prompts shorter than this will result in an API error.
- Cached prompts have a lifetime (TTL) of approximately 5 minutes. This duration cannot be modified due to provider limitations.
Supported Models
Prompt Cache is currently in beta and available only for select models. In the table below, a "/" in the Minimum Cache Length column means the model has no minimum length requirement; caching is enabled automatically for those models.
| Model | Minimum Cache Length | Base Input Tokens | Cache Writes | Cache Hits | Output Tokens |
|---|---|---|---|---|---|
| GPT-4o | / | $2.50 / MTok | $0 / MTok | $1.25 / MTok | $10.00 / MTok |
| GPT-4o mini | / | $0.15 / MTok | $0 / MTok | $0.075 / MTok | $0.60 / MTok |
| O1 | / | $15.00 / MTok | $0 / MTok | $7.50 / MTok | $60.00 / MTok |
| DeepSeek Coder | / | $0.14 / MTok | $0 / MTok | $0.02 / MTok | $0.28 / MTok |
| DeepSeek Chat | / | $0.14 / MTok | $0 / MTok | $0.02 / MTok | $0.28 / MTok |
| Claude 3.5 Sonnet | 1024 | $3.00 / MTok | $3.75 / MTok | $0.30 / MTok | $15.00 / MTok |
| Claude 3.5 Haiku | 2048 | $1.00 / MTok | $1.25 / MTok | $0.10 / MTok | $5.00 / MTok |
| Claude 3.0 Haiku | 2048 | $0.25 / MTok | $0.30 / MTok | $0.03 / MTok | $1.25 / MTok |
| Claude 3.0 Opus | 1024 | $15.00 / MTok | $18.75 / MTok | $1.50 / MTok | $75.00 / MTok |
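The pricing columns above imply a simple break-even calculation: a cache write costs more than a base input read, but every subsequent hit within the TTL is far cheaper. A minimal sketch of the arithmetic, with Claude 3.5 Sonnet rates hardcoded from the table (verify against current pricing before relying on it):

```python
def uncached_cost(prompt_mtok: float, reuses: int, base: float) -> float:
    """Cost of sending the same prompt (reuses + 1) times at the base input rate."""
    return prompt_mtok * (reuses + 1) * base

def cached_cost(prompt_mtok: float, reuses: int, write: float, hit: float) -> float:
    """Cost with caching: one cache write, then `reuses` cache hits."""
    return prompt_mtok * (write + reuses * hit)

# Claude 3.5 Sonnet rates from the table ($ / MTok).
BASE, WRITE, HIT = 3.00, 3.75, 0.30

# A 100k-token prompt (0.1 MTok) reused 9 more times within the ~5 minute TTL:
prompt = 0.1
print(round(uncached_cost(prompt, 9, BASE), 3))       # 3.0   (ten full-price reads)
print(round(cached_cost(prompt, 9, WRITE, HIT), 3))   # 0.645 (one write + nine hits)
```

Because a hit costs only 10% of a base read here, caching pays for itself after a single reuse despite the 25% write surcharge.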
Implementation
For models like GPT-4o, O1, and DeepSeek, the cache is automatically enabled. You don't need to add any additional parameters to your request payload.
For Claude models, add a `cache_control` object to your request payload to cache a prompt. Currently, the only supported cache type is `ephemeral`. Once a prompt is cached, subsequent identical requests will use the cached prompt, reducing response time and API call costs.
You can include the `cache_control` object in subsequent requests without refreshing the cache; those requests will read the cached prompt directly.
Example:
```json
{
  "model": "claude-3-5-sonnet",
  "messages": [
    {
      "role": "user",
      "content": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
```
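In application code, the `cache_control` field is usually attached programmatically rather than written by hand. A small sketch of that idea, assuming a chat payload shaped like the JSON above (`with_prompt_cache` is a hypothetical helper, not part of any SDK):

```python
import copy

def with_prompt_cache(payload: dict) -> dict:
    """Return a copy of a chat request payload with its first message marked
    for ephemeral caching, mirroring the JSON example above. (Hypothetical
    helper; the payload shape is an assumption based on this document.)"""
    out = copy.deepcopy(payload)
    out["messages"][0]["cache_control"] = {"type": "ephemeral"}
    return out

request = {
    "model": "claude-3-5-sonnet",
    "messages": [
        {"role": "user",
         "content": "You are an AI assistant tasked with analyzing literary works."}
    ],
}
cached = with_prompt_cache(request)
# POST `cached` on every request; identical prompts sent again within the
# ~5 minute TTL are then served from the cache.
```

Copying the payload keeps the original request untouched, so the same template can be reused with and without caching.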
Monitoring
You can track cache performance in the Analytics dashboard. The dashboard displays metrics for Cache Creation Input Tokens and Cache Read Input Tokens, allowing you to assess the effectiveness of your prompt caching strategy.
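If your provider also reports these two token counts per response in a `usage` object (the field names below follow Anthropic-style usage reporting and are an assumption; check your provider's response schema), you can compute a per-request cache hit rate directly:

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of input tokens served from the cache for one response.
    Assumes Anthropic-style fields: `input_tokens` counts uncached tokens,
    the cache_* fields count tokens written to / read from the cache."""
    read = usage.get("cache_read_input_tokens", 0)
    total = (usage.get("input_tokens", 0)
             + usage.get("cache_creation_input_tokens", 0)
             + read)
    return read / total if total else 0.0

usage = {"input_tokens": 20,
         "cache_creation_input_tokens": 0,
         "cache_read_input_tokens": 980}
print(round(cache_hit_rate(usage), 2))  # 0.98
```

A rate near zero on repeated identical prompts suggests the prompt is below the minimum cache length or the ~5 minute TTL expired between requests.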