Vertex AI SDK setup

Use aiproxy as a transparent forward proxy for the official Vertex AI Python SDKs. The proxy intercepts every HTTPS call the SDK makes—Google OAuth, Vertex operations, and Anthropic-on-Vertex traffic—so you gain observability without rewriting clients.

Prerequisites

aiproxy running locally (examples assume http://localhost:8080)
~/.aiproxy/aiproxy-ca-cert.pem imported into your OS trust store (see Quick Start → TLS certificate)
Google Cloud credentials seeded with gcloud auth application-default login
Python 3.9+ in a virtual environment where you can install the SDK packages referenced below

Conversation grouping header

aiproxy groups traffic into conversations using the X-Proxy-Chat-Session header. Follow these rules so transcripts appear in the UI:

Generate a unique identifier (for example uuid.uuid4().hex) when a new conversation starts.
Send the header with that identifier on every request that belongs to the conversation.
Reuse the identifier for follow-up turns; rotate it only when you intentionally start a different conversation.

Every HTTP client and gRPC stack offers a way to set custom headers (session-level headers in requests, httpx event hooks, gRPC interceptors, etc.). Use whichever mechanism your SDK exposes to insert X-Proxy-Chat-Session automatically so you do not have to set it manually on each API call.
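
The three rules above can be sketched with the standard library alone. The Conversation class is a hypothetical helper, not part of aiproxy or any SDK; only the header name and the uuid.uuid4().hex identifier come from this guide.

```python
import uuid

# Header name taken from this guide; everything else here is illustrative.
SESSION_HEADER = "X-Proxy-Chat-Session"


class Conversation:
    """Tracks the identifier for one aiproxy conversation (hypothetical helper)."""

    def __init__(self) -> None:
        # Rule 1: generate a unique identifier when the conversation starts.
        self.session_id = uuid.uuid4().hex

    def headers(self) -> dict:
        # Rules 2 and 3: every request in the conversation carries the same value.
        return {SESSION_HEADER: self.session_id}

    def rotate(self) -> str:
        # Call only when intentionally starting a different conversation.
        self.session_id = uuid.uuid4().hex
        return self.session_id
```

Merging conversation.headers() into session-level headers (for example with requests.Session().headers.update(...)) keeps the value attached without touching each call site.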

Choose an integration mode

aiproxy can observe Vertex AI SDKs in two ways:

Forward proxy mode – your code keeps the official Vertex endpoints, while HTTP(S)_PROXY forces every request through aiproxy. This works even when gRPC is required or the SDK refuses custom base URLs.
Router endpoint – set the SDK base URL to http://localhost:8080/vertex. aiproxy receives the request directly, forwards it upstream, and emits the telemetry. This avoids proxy variables, but the router currently speaks only HTTP/REST, so it suits the global location and non-streaming tests best.

Pick whichever path fits your environment. You can swap between them without changing business logic because both approaches simply alter transport settings.

Option A – Forward proxy mode

Step 1: trust the CA inside Python and gRPC

Vertex SDKs ultimately rely on requests, httpx, and gRPC. Point each runtime at the aiproxy certificate so the TLS interception succeeds:


export REQUESTS_CA_BUNDLE=~/.aiproxy/aiproxy-ca-cert.pem
export SSL_CERT_FILE=$REQUESTS_CA_BUNDLE
export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=$REQUESTS_CA_BUNDLE
export GRPC_PYTHON_SSL_ROOTS_CERTIFICATE_PATH=$REQUESTS_CA_BUNDLE

If your app needs to keep the stock certifi bundle, combine it with the aiproxy CA before exporting the variables. The helper below shows one approach:


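import os
import certifi
from pathlib import Path

def ensure_proxy_ca():
    """Point Python and gRPC at a bundle holding certifi's roots plus the aiproxy CA."""
    proxy_ca = Path("~/.aiproxy/aiproxy-ca-cert.pem").expanduser()

    # Nothing to combine if the aiproxy CA file is not present.
    if not proxy_ca.exists():
        return

    # Append the aiproxy CA to the stock certifi bundle.
    combined = Path(certifi.where()).read_bytes() + b"\n" + proxy_ca.read_bytes()
    temp = Path("/tmp/aiproxy-certifi.pem")
    temp.write_bytes(combined)

    # setdefault leaves any value already exported in the shell untouched.
    for key in (
        "REQUESTS_CA_BUNDLE",
        "SSL_CERT_FILE",
        "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH",
        "GRPC_PYTHON_SSL_ROOTS_CERTIFICATE_PATH",
    ):
        os.environ.setdefault(key, str(temp))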
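
To confirm that a combined bundle really includes the aiproxy CA, a check along these lines may help. It is a minimal sketch: pem_blocks and bundle_contains are hypothetical names, and the comparison is purely textual (it does not validate the certificates themselves).

```python
import re
from pathlib import Path

# Matches one PEM-encoded certificate block, markers included.
_PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_blocks(text: str) -> set:
    """Return the certificate blocks found in `text`, with whitespace removed."""
    return {"".join(block.split()) for block in _PEM_BLOCK.findall(text)}


def bundle_contains(bundle_path: Path, ca_path: Path) -> bool:
    """True when every certificate in `ca_path` also appears in `bundle_path`."""
    wanted = pem_blocks(ca_path.read_text(encoding="utf-8", errors="replace"))
    found = pem_blocks(bundle_path.read_text(encoding="utf-8", errors="replace"))
    # An empty CA file counts as a failure rather than a trivial match.
    return bool(wanted) and wanted <= found
```

Calling bundle_contains(Path("/tmp/aiproxy-certifi.pem"), Path("~/.aiproxy/aiproxy-ca-cert.pem").expanduser()) after ensure_proxy_ca() would be one way to use it.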

Step 2: route SDK traffic through aiproxy

Set standard proxy variables before starting your script. Include metadata.google.internal in NO_PROXY so ADC can still reach the metadata server when you run on GCE:


export HTTP_PROXY=http://localhost:8080
export HTTPS_PROXY=$HTTP_PROXY
export NO_PROXY=localhost,127.0.0.1,metadata.google.internal

aiproxy now sees every HTTPS request regardless of whether the SDK uses REST (global location) or gRPC (regional locations).
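
A quick way to see how Python interprets these variables is to ask the standard library. The sketch below uses urllib's environment-only helpers; proxy_plan is a hypothetical name, and requests, httpx, and gRPC each apply their own (similar but not identical) NO_PROXY matching, so treat the result as a sanity check rather than a guarantee.

```python
import urllib.request


def proxy_plan(host: str) -> str:
    """Describe how urllib would route an HTTPS request to `host`."""
    # Reads *_PROXY variables from the environment only (no OS-level settings).
    proxies = urllib.request.getproxies_environment()
    if urllib.request.proxy_bypass_environment(host, proxies):
        return "direct (matched NO_PROXY)"
    return proxies.get("https", "direct (HTTPS_PROXY not set)")
```

With the exports above in place, proxy_plan("aiplatform.googleapis.com") should report the aiproxy address, while proxy_plan("metadata.google.internal") should report a direct connection.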

Step 3: configure environment defaults

Populate a .env file so your scripts inherit sensible defaults:


VERTEX_PROJECT=your-gcp-project
VERTEX_LOCATION=us-central1   # use a regional endpoint to enable gRPC + streaming
VERTEX_MODEL=gemini-1.5-pro-002

Load those variables with python-dotenv or your preferred secrets manager.
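
One way to consume those variables is a small settings object that fails early when the project is missing. VertexSettings is a hypothetical helper; its defaults simply mirror the values used in this guide's samples.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VertexSettings:
    """Values read from the VERTEX_* variables (illustrative helper)."""

    project: str
    location: str
    model: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "VertexSettings":
        # VERTEX_PROJECT has no safe default, so fail with a clear message.
        project = env.get("VERTEX_PROJECT")
        if not project:
            raise RuntimeError("VERTEX_PROJECT is not set")
        return cls(
            project=project,
            location=env.get("VERTEX_LOCATION", "us-central1"),
            model=env.get("VERTEX_MODEL", "gemini-1.5-pro-002"),
        )
```

If you use python-dotenv, calling load_dotenv() before VertexSettings.from_env() makes the .env values visible through os.environ.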

Option B – Router endpoint

When you can change the SDK's base URL, skip proxy plumbing entirely and point the client at http://localhost:8080/vertex. aiproxy receives the request over plain HTTP, forwards it to Google with TLS, and still produces full observability. No CA installation is required because traffic between your SDK and aiproxy stays local.


import os
import vertexai

vertexai.init(
    project=os.environ["VERTEX_PROJECT"],
    location=os.environ.get("VERTEX_LOCATION", "global"),
    api_endpoint="http://localhost:8080/vertex",
)

Under the hood /vertex rewrites the host to the correct aiplatform.googleapis.com endpoint and adds the X-Proxy-Chat-Session headers used throughout this repo.
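
The rewrite described above can be modeled in a few lines. This is not aiproxy's implementation, which this guide does not show; upstream_host and rewrite are hypothetical names, and passing the location and session ID as arguments is an assumption made for illustration. The host pattern itself follows Google's published Vertex AI endpoints (a bare host for global, a location prefix otherwise).

```python
def upstream_host(location: str) -> str:
    """Map a Vertex location to its public API host."""
    if location == "global":
        return "aiplatform.googleapis.com"
    return f"{location}-aiplatform.googleapis.com"


def rewrite(path: str, location: str, session_id: str) -> tuple:
    """Return (upstream URL, extra headers) for a request received on /vertex."""
    if not path.startswith("/vertex/"):
        raise ValueError("not a router path")
    # Drop the /vertex prefix and keep the rest of the path unchanged.
    upstream_path = path[len("/vertex"):]
    url = f"https://{upstream_host(location)}{upstream_path}"
    return url, {"X-Proxy-Chat-Session": session_id}
```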

For the Anthropic SDK, pass the same URL via base_url:


import os
from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id=os.environ["VERTEX_PROJECT"],
    region=os.environ.get("VERTEX_LOCATION", "global"),
    base_url="http://localhost:8080/vertex",
)

Use router mode when:

You only need REST (for example the global location or synchronous calls)
You cannot or do not want to modify proxy variables on the host
You prefer to keep TLS termination inside aiproxy instead of intercepting it locally

Stick with forward proxy mode if you rely on gRPC streaming, since /vertex currently focuses on HTTP transport.

Google Vertex native (Gemini)

Install the standard SDK dependencies:


uv pip install google-cloud-aiplatform python-dotenv httpx

Once the proxy and certificates are in place, the standard SDK works unchanged:


import os
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(
    project=os.environ["VERTEX_PROJECT"],
    location=os.environ.get("VERTEX_LOCATION", "us-central1"),
    api_endpoint=os.environ.get("VERTEX_BASE_URL"),  # router mode: http://localhost:8080/vertex
)

model = GenerativeModel(os.environ.get("VERTEX_MODEL", "gemini-1.5-pro-002"))
chat = model.start_chat()
response = chat.send_message("Summarize how aiproxy monitors SDK traffic.")
print(response.text)

With the proxy variables set, you should see requests to oauth2.googleapis.com and aiplatform.googleapis.com in the aiproxy UI. When you switch the location away from global, the SDK upgrades to gRPC—aiproxy logs every RPC because the interceptor operates at the transport layer.

Optional: debug HTTP traffic

Need to see what the SDK emits before aiproxy intercepts it? Most HTTP clients let you register hooks or middleware that log the method, URL, and headers for each request. Enable those hooks temporarily when diagnosing issues, but keep X-Proxy-Chat-Session injection enabled so conversations continue to thread correctly.
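
A logging hook usually boils down to formatting the method, URL, and headers while keeping credentials out of the log. The sketch below shows that formatting step on its own; describe_request and the SENSITIVE set are hypothetical, and the list of headers to mask is an assumption you may want to extend.

```python
# Header names (lowercase) whose values should never reach a log file.
SENSITIVE = {"authorization", "x-goog-api-key", "cookie"}


def describe_request(method: str, url: str, headers: dict) -> str:
    """Format one request as a log line, masking credential headers."""
    shown = []
    for name, value in sorted(headers.items(), key=lambda kv: kv[0].lower()):
        if name.lower() in SENSITIVE:
            value = "<redacted>"
        shown.append(f"{name}: {value}")
    return f"{method} {url} [{'; '.join(shown)}]"
```

With httpx, for instance, a function registered under event_hooks={"request": [...]} receives a request object whose method, url, and headers attributes can be passed straight to this formatter.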

Anthropic on Vertex

Install the Anthropic extras when you need Claude models:


uv pip install "anthropic[vertex]" python-dotenv httpx

Vertex hosts Anthropic models under the same project/location. Pass any claude-* model name to the AnthropicVertex SDK and reuse the proxy/TLS setup:


import os

from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id=os.environ["VERTEX_PROJECT"],
    region=os.environ.get("VERTEX_LOCATION", "us-east4"),
    base_url=os.environ.get("VERTEX_BASE_URL"),  # router mode
)

message = client.messages.create(
    model="claude-3-5-sonnet@20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": "List the files in this directory."}],
)
print(message.content[0].text)

Because the Anthropic SDK uses httpx under the hood, it honors the same HTTP_PROXY/HTTPS_PROXY variables and CA overrides. Use whichever request hooks the SDK exposes to make sure X-Proxy-Chat-Session rides along with every call.
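
One possible way to attach the header once is at client construction. The sketch below only builds the keyword arguments so it runs without the SDK installed; anthropic_vertex_kwargs is a hypothetical helper, and it assumes the client constructor accepts a default_headers option, which is worth confirming against the SDK version you have installed.

```python
import uuid
from typing import Mapping, Optional


def anthropic_vertex_kwargs(
    env: Mapping[str, str], session_id: Optional[str] = None
) -> dict:
    """Build constructor arguments for AnthropicVertex (illustrative helper)."""
    kwargs = {
        "project_id": env["VERTEX_PROJECT"],
        "region": env.get("VERTEX_LOCATION", "us-east4"),
        # Assumption: default_headers is sent with every request the client makes.
        "default_headers": {"X-Proxy-Chat-Session": session_id or uuid.uuid4().hex},
    }
    # Only set base_url in router mode; forward proxy mode keeps the SDK default.
    base_url = env.get("VERTEX_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


# Usage (requires the anthropic package and Google credentials):
#   client = AnthropicVertex(**anthropic_vertex_kwargs(os.environ))
```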

Verify traffic

Start aiproxy and note the session ID in its terminal UI.
Run the sample scripts above.
Watch the aiproxy logs for requests to oauth2.googleapis.com, aiplatform.googleapis.com, or us-anthropic.googleapis.com.

If nothing appears:

Re-run printenv | grep -E 'PROXY|CA' to ensure the proxy and CA variables are set in the current shell.
Confirm the ~/.aiproxy/aiproxy-ca-cert.pem certificate is trusted by your OS (especially on macOS where Python may bundle its own cert store).
Check that no VPN or corporate proxy overrides HTTP{,S}_PROXY.
Inspect one request inside aiproxy and verify X-Proxy-Chat-Session is present—without it, conversation timelines remain empty.
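
When the script runs from an IDE or a service manager, the shell check may not reflect what the Python process actually sees. A rough in-process equivalent of the printenv filter, with proxy_related as a hypothetical name, could look like this:

```python
import re
from typing import Mapping

# Same case-sensitive pattern as the grep in the checklist above.
_PATTERN = re.compile(r"PROXY|CA")


def proxy_related(env: Mapping[str, str]) -> dict:
    """Return the variables whose NAME=value line matches PROXY or CA."""
    return {
        name: value
        for name, value in sorted(env.items())
        if _PATTERN.search(f"{name}={value}")
    }
```

Printing proxy_related(os.environ) at the top of a script shows the proxy and CA settings in effect for that process.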

Once you see traffic flowing, you can reuse the same pattern for any longer-lived Vertex automation or SDK-driven tooling.
