Google
Google

Gemini Pro Vision 1.0

google/gemini-pro-vision

Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response.

See the benchmarks and prompting guidelines from Deepmind.

Usage of Gemini is subject to Google's Gemini Terms of Use.

#multimodal

Capability

Vision Support

Context Window

45,875

Max Output Tokens

2,048

Using Gemini Pro Vision 1.0 with Python API

Using Gemini Pro Vision 1.0 with OpenAI compatible API

import openai

client = openai.Client(
  api_key= '{your_api_key}',
  base_url="https://api.model.box/v1",
)
response = client.chat.completions.create(
model="google/gemini-pro-vision",
messages: [
  {
    role: 'user',
    content:
      'introduce your self',
    },
  ]
)
print(response)