# Vertex AI SDK
Pass-through endpoints for Vertex AI - call provider-specific endpoints in their native format (no translation).
| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | supports all models on the `/generateContent` endpoint |
| Logging | ✅ | works across all integrations |
| End-user Tracking | ❌ | Tell us if you need this |
| Streaming | ✅ | |
## Supported Endpoints
LiteLLM supports 3 Vertex AI pass-through routes:

- `/vertex_ai` → routes to `https://{vertex_location}-aiplatform.googleapis.com/`
- `/vertex_ai/discovery` → routes to `https://discoveryengine.googleapis.com` (see the sketch below)
- `/vertex_ai/live` → upgrades to the Vertex AI Live API WebSocket (`google.cloud.aiplatform.v1.LlmBidiService/BidiGenerateContent`)
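For instance, a Discovery Engine search request through the proxy might look like the following sketch. The project, engine ID, and request body are placeholder assumptions; the path after `/vertex_ai/discovery` simply mirrors the native `discoveryengine.googleapis.com` path.

```python
# Hypothetical Discovery Engine search via the /vertex_ai/discovery route.
# "my-project", "my-engine", and the query are placeholders; the path
# mirrors the native discoveryengine.googleapis.com search API.
import requests

LITELLM_PROXY_BASE_URL = "http://localhost:4000"  # assumption: local proxy

url = (
    f"{LITELLM_PROXY_BASE_URL}/vertex_ai/discovery/v1/projects/my-project"
    "/locations/global/collections/default_collection/engines/my-engine"
    "/servingConfigs/default_search:search"
)

resp = requests.post(
    url,
    headers={"x-litellm-api-key": "Bearer sk-1234"},  # your LiteLLM key
    json={"query": "what is litellm?", "pageSize": 5},
)
print(resp.json())
```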
## How to use
Just replace `https://REGION-aiplatform.googleapis.com` with `LITELLM_PROXY_BASE_URL/vertex_ai`.
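For example, here is a minimal sketch of that substitution, assuming the proxy runs at `http://localhost:4000` (the helper function and model path are illustrative, not part of LiteLLM):

```python
# Illustrative helper for the URL substitution described above.
VERTEX_HOST = "https://us-central1-aiplatform.googleapis.com"
LITELLM_PROXY_BASE_URL = "http://localhost:4000"  # assumption: local proxy

def to_proxy_url(native_url: str) -> str:
    """Rewrite a native Vertex AI URL so it goes through the LiteLLM proxy."""
    return native_url.replace(VERTEX_HOST, f"{LITELLM_PROXY_BASE_URL}/vertex_ai")

native = (
    f"{VERTEX_HOST}/v1/projects/my-project/locations/us-central1"
    "/publishers/google/models/gemini-1.5-flash-001:generateContent"
)
print(to_proxy_url(native))
# http://localhost:4000/vertex_ai/v1/projects/my-project/locations/us-central1/publishers/google/models/gemini-1.5-flash-001:generateContent
```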
LiteLLM supports 3 flows for calling Vertex AI endpoints via pass-through:

1. **Specific Credentials**: Admin sets pass-through credentials for a specific project/region.

2. **Default Credentials**: Admin sets default credentials.

3. **Client-Side Credentials**: User can send client-side credentials through to Vertex AI (default behavior - if no default or mapped credentials are found, the request is passed through directly).
### Example Usage
**Specific Project/Region**
```yaml
model_list:
  - model_name: gemini-1.0-pro
    litellm_params:
      model: vertex_ai/gemini-1.0-pro
      vertex_project: adroit-crow-413218
      vertex_region: us-central1
      vertex_credentials: /path/to/credentials.json
      use_in_pass_through: true # 👈 KEY CHANGE
```
**Default Credentials**

Set in `config.yaml`:
```yaml
default_vertex_config:
  vertex_project: adroit-crow-413218
  vertex_region: us-central1
  vertex_credentials: /path/to/credentials.json
```
Or set via environment variables:

```bash
export DEFAULT_VERTEXAI_PROJECT="adroit-crow-413218"
export DEFAULT_VERTEXAI_LOCATION="us-central1"
export DEFAULT_GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
```
**Client Credentials**

Try Gemini 2.0 Flash (curl):
```bash
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="YOUR_PROJECT_ID"

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  "${LITELLM_PROXY_BASE_URL}/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:streamGenerateContent" -d \
  $'{
    "contents": {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "image/jpeg",
            "fileUri": "gs://generativeai-downloads/images/scones.jpg"
          }
        },
        {
          "text": "Describe this picture."
        }
      ]
    }
  }'
```
#### Example Usage
**curl**
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:generateContent \
  -H "Content-Type: application/json" \
  -H "x-litellm-api-key: Bearer sk-1234" \
  -d '{
    "contents":[{
      "role": "user",
      "parts":[{"text": "How are you doing today?"}]
    }]
  }'
```
**Vertex Node.js SDK**

```javascript
const { VertexAI } = require('@google-cloud/vertexai');

const vertexAI = new VertexAI({
    project: 'your-project-id', // enter your vertex project id
    location: 'us-central1', // enter your vertex region
    apiEndpoint: "localhost:4000/vertex_ai" // <proxy-server-url>/vertex_ai (note: do not include 'https://' in the URL)
});

const model = vertexAI.getGenerativeModel({
    model: 'gemini-1.0-pro'
}, {
    customHeaders: {
        "x-litellm-api-key": "sk-1234" // Your LiteLLM Virtual Key
    }
});

async function generateContent() {
    try {
        const prompt = {
            contents: [{
                role: 'user',
                parts: [{ text: 'How are you doing today?' }]
            }]
        };
        const response = await model.generateContent(prompt);
        console.log('Response:', response);
    } catch (error) {
        console.error('Error:', error);
    }
}

generateContent();
```
## Vertex AI Live API WebSocket
LiteLLM can now proxy the Vertex AI Live API to help you experiment with streaming audio/text from Gemini Live models without exposing Google credentials to clients.
- Configure default Vertex credentials via `default_vertex_config` or environment variables (see examples above).
- Connect to `wss://<PROXY_URL>/vertex_ai/live`. LiteLLM will exchange your saved credentials for a short-lived access token and forward messages bidirectionally.
- Optional query params `vertex_project`, `vertex_location`, and `model` let you override defaults for multi-project setups or global-only models (see the sketch after the Python example below).
```python
import asyncio
import json

from websockets.asyncio.client import connect


async def main() -> None:
    headers = {
        "x-litellm-api-key": "Bearer sk-your-litellm-key",
        "Content-Type": "application/json",
    }
    async with connect(
        "ws://localhost:4000/vertex_ai/live",
        additional_headers=headers,
    ) as ws:
        await ws.send(
            json.dumps(
                {
                    "setup": {
                        "model": "projects/your-project/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
                        "generation_config": {"response_modalities": ["TEXT"]},
                    }
                }
            )
        )
        async for message in ws:
            print("server:", message)


if __name__ == "__main__":
    asyncio.run(main())
```
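As referenced above, the defaults can be overridden per-connection with query params. A small sketch, with placeholder project, location, and model values:

```python
# Sketch: overriding the default project/location/model via query params
# (param names from the list above; the values below are placeholders).
live_url = (
    "ws://localhost:4000/vertex_ai/live"
    "?vertex_project=my-other-project"
    "&vertex_location=us-east4"
    "&model=gemini-2.0-flash-live-preview-04-09"
)
# Pass `live_url` to `connect(...)` in place of the URL used above.
```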
## Quick Start
Let's call the Vertex AI `/generateContent` endpoint.
1. Add Vertex AI Credentials to your environment
```bash
export DEFAULT_VERTEXAI_PROJECT="" # "adroit-crow-413218"
export DEFAULT_VERTEXAI_LOCATION="" # "us-central1"
export DEFAULT_GOOGLE_APPLICATION_CREDENTIALS="" # "/Users/Downloads/adroit-crow-413218-a956eef1a2a8.json"
```
2. Start LiteLLM Proxy

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```
3. Test it!

Let's call the Vertex AI `/generateContent` endpoint:
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/gemini-1.0-pro:generateContent \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "contents":[{
      "role": "user",
      "parts":[{"text": "How are you doing today?"}]
    }]
  }'
```
## Supported API Endpoints
- Gemini API
- Embeddings API
- Imagen API
- Code Completion API (see the sketch below)
- Batch prediction API
- Tuning API
- CountTokens API
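As an illustration of the Code Completion API, here is a hedged sketch of a `code-gecko:predict` request through the proxy; the project, region, prompt, and parameter values are placeholder assumptions, and the request shape follows the public Vertex AI codey `predict` format:

```python
# Hedged sketch: a Code Completion (code-gecko) request via the proxy.
# "my-project", the region, and the prompt are placeholders.
import requests

resp = requests.post(
    "http://localhost:4000/vertex_ai/v1/projects/my-project/locations/us-central1"
    "/publishers/google/models/code-gecko:predict",
    headers={"x-litellm-api-key": "Bearer sk-1234"},  # your LiteLLM key
    json={
        "instances": [{"prefix": "def fibonacci(n):"}],
        "parameters": {"temperature": 0.2, "maxOutputTokens": 64},
    },
)
print(resp.json())
```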
## Authentication to Vertex AI
LiteLLM Proxy Server supports two methods of authentication to Vertex AI:

1. Pass Vertex credentials client-side to the proxy server (see the sketch below)

2. Set Vertex AI credentials on the proxy server
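A minimal sketch of method 1, assuming the proxy runs at `http://localhost:4000` and Application Default Credentials are configured on the client (`pip install requests google-auth`); the fallback project and model names are placeholders:

```python
# Sketch of client-side auth: the caller obtains its own Google access token
# and the proxy passes the request through to Vertex AI.
import google.auth
import requests
from google.auth.transport.requests import Request

credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(Request())  # fetch a fresh access token

# "my-project" and the model name are placeholders for illustration.
url = (
    "http://localhost:4000/vertex_ai/v1/projects/"
    f"{project_id or 'my-project'}/locations/us-central1/publishers/google/models/"
    "gemini-1.5-flash-001:generateContent"
)
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json={"contents": [{"role": "user", "parts": [{"text": "hi"}]}]},
)
print(resp.json())
```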
## Usage Examples
### Gemini API (Generate Content)
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/gemini-1.5-flash-001:generateContent \
  -H "Content-Type: application/json" \
  -H "x-litellm-api-key: Bearer sk-1234" \
  -d '{"contents":[{"role": "user", "parts":[{"text": "hi"}]}]}'
```
### Embeddings API
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/textembedding-gecko@001:predict \
  -H "Content-Type: application/json" \
  -H "x-litellm-api-key: Bearer sk-1234" \
  -d '{"instances":[{"content": "gm"}]}'
```
### Imagen API
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/imagen-3.0-generate-001:predict \
  -H "Content-Type: application/json" \
  -H "x-litellm-api-key: Bearer sk-1234" \
  -d '{"instances":[{"prompt": "make an otter"}], "parameters": {"sampleCount": 1}}'
```
### Count Tokens API
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/gemini-1.5-flash-001:countTokens \
  -H "Content-Type: application/json" \
  -H "x-litellm-api-key: Bearer sk-1234" \
  -d '{"contents":[{"role": "user", "parts":[{"text": "hi"}]}]}'
```
### Tuning API

Create Fine Tuning Job:
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/tuningJobs \
      -H "Content-Type: application/json" \
      -H "x-litellm-api-key: Bearer sk-1234" \
      -d '{
  "baseModel": "gemini-1.0-pro-002",
  "supervisedTuningSpec" : {
      "training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl"
  }
}'
```
## Advanced
**Pre-requisites**

Use this to avoid giving developers the raw Vertex AI credentials, but still letting them use Vertex AI endpoints.

### Use with Virtual Keys
1. Setup environment
```bash
export DATABASE_URL=""
export LITELLM_MASTER_KEY=""

# vertex ai credentials
export DEFAULT_VERTEXAI_PROJECT="" # "adroit-crow-413218"
export DEFAULT_VERTEXAI_LOCATION="" # "us-central1"
export DEFAULT_GOOGLE_APPLICATION_CREDENTIALS="" # "/Users/Downloads/adroit-crow-413218-a956eef1a2a8.json"
```

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```
2. Generate virtual key
```bash
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'x-litellm-api-key: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'
```
Expected Response:

```json
{
    ...
    "key": "sk-1234ewknldferwedojwojw"
}
```
3. Test it!
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/gemini-1.0-pro:generateContent \
  -H "Content-Type: application/json" \
  -H "x-litellm-api-key: Bearer sk-1234" \
  -d '{
    "contents":[{
      "role": "user",
      "parts":[{"text": "How are you doing today?"}]
    }]
  }'
```
### Send tags in request headers
Use this if you want tags to be tracked in the LiteLLM DB and on logging callbacks.

Pass tags in request headers as a comma-separated list. In the example below, the following tags will be tracked:

`tags: ["vertex-js-sdk", "pass-through-endpoint"]`
**curl**
```bash
curl http://localhost:4000/vertex_ai/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/gemini-1.0-pro:generateContent \
  -H "Content-Type: application/json" \
  -H "x-litellm-api-key: Bearer sk-1234" \
  -H "tags: vertex-js-sdk,pass-through-endpoint" \
  -d '{
    "contents":[{
      "role": "user",
      "parts":[{"text": "How are you doing today?"}]
    }]
  }'
```
**Vertex Node.js SDK**

```javascript
const { VertexAI } = require('@google-cloud/vertexai');

const vertexAI = new VertexAI({
    project: 'your-project-id', // enter your vertex project id
    location: 'us-central1', // enter your vertex region
    apiEndpoint: "localhost:4000/vertex_ai" // <proxy-server-url>/vertex_ai (note: do not include 'https://' in the URL)
});

const model = vertexAI.getGenerativeModel({
    model: 'gemini-1.0-pro'
}, {
    customHeaders: {
        "x-litellm-api-key": "sk-1234", // Your LiteLLM Virtual Key
        "tags": "vertex-js-sdk,pass-through-endpoint"
    }
});

async function generateContent() {
    try {
        const prompt = {
            contents: [{
                role: 'user',
                parts: [{ text: 'How are you doing today?' }]
            }]
        };
        const response = await model.generateContent(prompt);
        console.log('Response:', response);
    } catch (error) {
        console.error('Error:', error);
    }
}

generateContent();
```