Google
Gemini Pro Vision 1.0
google/gemini-pro-vision
Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response.
See the benchmarks and prompting guidelines from Deepmind.
Usage of Gemini is subject to Google's Gemini Terms of Use.
#multimodal
Capability
Vision Support
Context Window
45,875
Max Output Tokens
2,048
Using Gemini Pro Vision 1.0 with Python API
Using Gemini Pro Vision 1.0 with OpenAI compatible API
import openai
client = openai.Client(
api_key= '{your_api_key}',
base_url="https://api.model.box/v1",
)
response = client.chat.completions.create(
model="google/gemini-pro-vision",
messages: [
{
role: 'user',
content:
'introduce your self',
},
]
)
print(response)