Use aiproxy as a transparent forward proxy for the official Vertex AI Python SDKs. The proxy intercepts every HTTPS call the SDK makes—Google OAuth, Vertex operations, and Anthropic-on-Vertex traffic—so you gain observability without rewriting clients.
- aiproxy running locally (examples assume http://localhost:8080)
- ~/.aiproxy/aiproxy-ca-cert.pem imported into your OS trust store (see Quick Start → TLS certificate)
- Google Cloud credentials seeded with gcloud auth application-default login
- Python 3.9+ in a virtual environment where you can install the SDK packages referenced below
aiproxy groups traffic into conversations using the X-Proxy-Chat-Session header. Follow these rules so transcripts appear in the UI:
- Generate a unique identifier (for example uuid.uuid4().hex) when a new conversation starts.
- Send the header with that identifier on every request that belongs to the conversation.
- Reuse the identifier for follow-up turns; rotate it only when you intentionally start a different conversation.
Every HTTP client and gRPC stack offers a way to set custom headers (session-level headers in requests, httpx event hooks, gRPC interceptors, etc.). Use whichever mechanism your SDK exposes to insert X-Proxy-Chat-Session automatically so you do not have to set it manually on each API call.
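The rules above can be sketched with a small helper that owns the identifier and hands out the header. This is a minimal illustration using only the standard library; the ChatSession class and its method names are hypothetical and not part of aiproxy or any SDK.

```python
import uuid

SESSION_HEADER = "X-Proxy-Chat-Session"


class ChatSession:
    """Holds one conversation ID and the header that carries it (hypothetical helper)."""

    def __init__(self) -> None:
        # A fresh identifier marks the start of a new conversation.
        self.session_id = uuid.uuid4().hex

    def headers(self) -> dict:
        """Headers to merge into every request that belongs to this conversation."""
        return {SESSION_HEADER: self.session_id}

    def rotate(self) -> str:
        """Switch to a new identifier; call this only for a genuinely new conversation."""
        self.session_id = uuid.uuid4().hex
        return self.session_id


session = ChatSession()
turn_one = session.headers()   # first request of the conversation
turn_two = session.headers()   # follow-up turn reuses the same identifier
```

The returned dictionary can be attached once at the client level, for example with requests.Session().headers.update(session.headers()) or httpx.Client(headers=session.headers()), so individual API calls do not need to set it.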
aiproxy can observe Vertex AI SDKs in two ways:
- Forward proxy mode: your code keeps the official Vertex endpoints, while HTTP(S)_PROXY forces every request through aiproxy. This works even when gRPC is required or the SDK refuses custom base URLs.
- Router endpoint: set the SDK base URL to http://localhost:8080/vertex. aiproxy receives the request directly, forwards it upstream, and emits the telemetry. This avoids proxy variables but currently speaks HTTP/REST (best for the global location or non-streaming tests).
Pick whichever path fits your environment. You can swap between them without changing business logic because both approaches simply alter transport settings.
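One way to keep the choice out of business logic is to resolve it in a single place at startup. The sketch below assumes the localhost address used on this page; the transport_settings and apply_transport helpers and the mode names "forward" and "router" are hypothetical labels for illustration.

```python
import os

PROXY_URL = "http://localhost:8080"   # address assumed throughout this page
ROUTER_URL = f"{PROXY_URL}/vertex"


def transport_settings(mode: str) -> dict:
    """Return the env vars and SDK base URL for a mode (hypothetical helper)."""
    if mode == "forward":
        return {
            "env": {
                "HTTP_PROXY": PROXY_URL,
                "HTTPS_PROXY": PROXY_URL,
                "NO_PROXY": "localhost,127.0.0.1,metadata.google.internal",
            },
            "base_url": None,  # keep the official Vertex endpoint
        }
    if mode == "router":
        # No proxy variables; the SDK talks to aiproxy directly.
        return {"env": {}, "base_url": ROUTER_URL}
    raise ValueError(f"unknown mode: {mode!r}")


def apply_transport(mode: str):
    """Export the env vars for the mode and return the base URL to pass to the SDK."""
    settings = transport_settings(mode)
    os.environ.update(settings["env"])
    return settings["base_url"]
```

The value returned by apply_transport can be passed as api_endpoint or base_url when constructing the SDK client; None leaves the SDK on its default endpoint.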
Vertex SDKs ultimately rely on requests, httpx, and gRPC. Point each runtime at the aiproxy certificate so the TLS interception succeeds:
export REQUESTS_CA_BUNDLE=~/.aiproxy/aiproxy-ca-cert.pem
export SSL_CERT_FILE=$REQUESTS_CA_BUNDLE
export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=$REQUESTS_CA_BUNDLE
export GRPC_PYTHON_SSL_ROOTS_CERTIFICATE_PATH=$REQUESTS_CA_BUNDLE
If your app needs to keep the stock certifi bundle, combine it with the aiproxy CA before exporting the variables. The helper below shows one approach:
import os
import certifi
from pathlib import Path

def ensure_proxy_ca():
    # Append the aiproxy CA to the stock certifi bundle, if the CA exists.
    proxy_ca = Path("~/.aiproxy/aiproxy-ca-cert.pem").expanduser()
    if not proxy_ca.exists():
        return
    combined = Path(certifi.where()).read_bytes() + b"\n" + proxy_ca.read_bytes()
    temp = Path("/tmp/aiproxy-certifi.pem")
    temp.write_bytes(combined)
    # setdefault keeps any value that is already exported in the environment.
    for key in (
        "REQUESTS_CA_BUNDLE",
        "SSL_CERT_FILE",
        "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH",
        "GRPC_PYTHON_SSL_ROOTS_CERTIFICATE_PATH",
    ):
        os.environ.setdefault(key, str(temp))
Set standard proxy variables before starting your script. Include metadata.google.internal in NO_PROXY so ADC can still reach the metadata server when you run on GCE:
export HTTP_PROXY=http://localhost:8080
export HTTPS_PROXY=$HTTP_PROXY
export NO_PROXY=localhost,127.0.0.1,metadata.google.internal
aiproxy now sees every HTTPS request regardless of whether the SDK uses REST (global location) or gRPC (regional locations).
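To sanity-check the variables before running an SDK script, you can ask the Python standard library which proxy it would pick for a given URL. The proxy_for helper below is a hypothetical name. It reflects only how urllib reads the environment; requests, httpx, and gRPC parse these variables themselves and may differ in edge cases.

```python
import urllib.request
from urllib.parse import urlsplit


def proxy_for(url: str):
    """Return the proxy URL urllib would use for `url`, or None if it is bypassed."""
    parts = urlsplit(url)
    proxies = urllib.request.getproxies_environment()
    # NO_PROXY entries win over HTTP_PROXY / HTTPS_PROXY.
    if urllib.request.proxy_bypass_environment(parts.hostname or "", proxies):
        return None
    return proxies.get(parts.scheme)
```

With the exports above in place, a call for an aiplatform.googleapis.com URL should return the aiproxy address, while a metadata.google.internal URL should return None.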
Populate a .env file so your scripts inherit sensible defaults:
VERTEX_PROJECT=your-gcp-project
VERTEX_LOCATION=us-central1 # use a regional endpoint to enable gRPC + streaming
VERTEX_MODEL=gemini-1.5-pro-002
Load those variables with python-dotenv or your preferred secrets manager.
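Once the variables are in the process environment, a small settings object keeps the lookups in one place. This is a sketch: the VertexSettings class is a hypothetical helper, and its fallback values simply mirror the defaults used in the samples on this page.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VertexSettings:
    """Project, location, and model read from the environment (hypothetical helper)."""

    project: str
    location: str
    model: str

    @classmethod
    def from_env(cls, env=None) -> "VertexSettings":
        env = os.environ if env is None else env
        project = env.get("VERTEX_PROJECT")
        if not project:
            # Fail early instead of letting the SDK raise a less obvious error later.
            raise RuntimeError("VERTEX_PROJECT is not set")
        return cls(
            project=project,
            location=env.get("VERTEX_LOCATION", "us-central1"),
            model=env.get("VERTEX_MODEL", "gemini-1.5-pro-002"),
        )
```

Call VertexSettings.from_env() after loading the .env file and pass the fields to the SDK constructor.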
When you can change the SDK's base URL, skip proxy plumbing entirely and point the client at http://localhost:8080/vertex. aiproxy receives the request over plain HTTP, forwards it to Google with TLS, and still produces full observability. No CA installation is required because traffic between your SDK and aiproxy stays local.
vertexai.init(
    project=os.environ["VERTEX_PROJECT"],
    location=os.environ.get("VERTEX_LOCATION", "global"),
    api_endpoint="http://localhost:8080/vertex",
)
Under the hood /vertex rewrites the host to the correct aiplatform.googleapis.com endpoint and adds the X-Proxy-Chat-Session headers used throughout this repo.
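To make the host rewrite concrete, here is an illustrative sketch of how a router path could map to an upstream URL. It is not aiproxy's actual implementation, which this page does not show. It assumes the location can be read from the request path and relies on Vertex AI's public host naming (aiplatform.googleapis.com for global, {location}-aiplatform.googleapis.com for regions); the upstream_url function is hypothetical.

```python
import re

# Vertex REST paths embed the location, e.g. /v1/projects/p/locations/us-central1/...
_LOCATION = re.compile(r"/locations/([^/]+)/")


def upstream_url(router_path: str) -> str:
    """Map a path received under /vertex to a Vertex AI URL (illustrative only)."""
    path = router_path.removeprefix("/vertex")
    match = _LOCATION.search(path)
    location = match.group(1) if match else "global"
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return f"https://{host}{path}"
```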
For the Anthropic SDK, pass the same URL via base_url:
client = AnthropicVertex(
    project_id=os.environ["VERTEX_PROJECT"],
    region=os.environ.get("VERTEX_LOCATION", "global"),
    base_url="http://localhost:8080/vertex",
)
Use router mode when:
- You only need REST (for example the global location or synchronous calls)
- You cannot or do not want to modify proxy variables on the host
- You prefer to keep TLS termination inside aiproxy instead of intercepting it locally
Stick with forward proxy mode if you rely on gRPC streaming, since /vertex currently focuses on HTTP transport.
Install the standard SDK dependencies:
uv pip install google-cloud-aiplatform python-dotenv httpx
Once the proxy and certificates are in place, the standard SDK works unchanged:
import os
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(
    project=os.environ["VERTEX_PROJECT"],
    location=os.environ.get("VERTEX_LOCATION", "us-central1"),
    api_endpoint=os.environ.get("VERTEX_BASE_URL"),  # router mode: http://localhost:8080/vertex
)
model = GenerativeModel(os.environ.get("VERTEX_MODEL", "gemini-1.5-pro-002"))
chat = model.start_chat()
response = chat.send_message("Summarize how aiproxy monitors SDK traffic.")
print(response.text)
With the proxy variables set, you should see requests to oauth2.googleapis.com and aiplatform.googleapis.com in the aiproxy UI. When you switch the location away from global, the SDK upgrades to gRPC. aiproxy still logs every RPC because the interceptor operates at the transport layer.
Need to see what the SDK emits before aiproxy intercepts it? Most HTTP clients let you register hooks or middleware that log the method, URL, and headers for each request. Enable those hooks temporarily when diagnosing issues, but keep X-Proxy-Chat-Session injection enabled so conversations continue to thread correctly.
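A logging hook along those lines might look like the sketch below. The describe_request and log_request names are hypothetical, and the hook assumes the client passes a request object exposing .method, .url, and .headers, as httpx and requests do. Credential headers are redacted so tokens do not end up in logs.

```python
import logging

logger = logging.getLogger("sdk.requests")

# Header names whose values should never be written to logs.
SENSITIVE = {"authorization", "cookie", "x-goog-api-key"}


def describe_request(method, url, headers) -> str:
    """Format method, URL, and headers on one line, redacting credentials."""
    safe = {
        name: ("<redacted>" if name.lower() in SENSITIVE else value)
        for name, value in dict(headers).items()
    }
    return f"{method} {url} {safe}"


def log_request(request) -> None:
    """Request hook: log what the SDK is about to send."""
    logger.info(describe_request(request.method, request.url, request.headers))
```

With httpx, the hook can be registered as httpx.Client(event_hooks={"request": [log_request]}); remove it again once the issue is diagnosed.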
Install the Anthropic extras when you need Claude models:
uv pip install "anthropic[vertex]" python-dotenv httpx
Vertex hosts Anthropic models under the same project/location. Pass any claude-* model name to the AnthropicVertex SDK and reuse the proxy/TLS setup:
import os
from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id=os.environ["VERTEX_PROJECT"],
    region=os.environ.get("VERTEX_LOCATION", "us-east4"),
    base_url=os.environ.get("VERTEX_BASE_URL"),  # router mode
)
message = client.messages.create(
    model="claude-3-5-sonnet@20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": "List the files in this directory."}],
)
print(message.content[0].text)
Because the Anthropic SDK uses httpx under the hood, it honors the same HTTP_PROXY/HTTPS_PROXY variables and CA overrides. Use whichever request hooks the SDK exposes to make sure X-Proxy-Chat-Session rides along with every call.
- Start aiproxy and note the session ID in its terminal UI.
- Run the sample scripts above.
- Watch the aiproxy logs for requests to oauth2.googleapis.com, aiplatform.googleapis.com, or us-anthropic.googleapis.com.
If nothing appears:
- Re-run printenv | grep -E 'PROXY|CA' to ensure the proxy and CA variables are set in the current shell.
- Confirm the ~/.aiproxy/aiproxy-ca-cert.pem certificate is trusted by your OS (especially on macOS, where Python may bundle its own cert store).
- Check that no VPN or corporate proxy overrides HTTP{,S}_PROXY.
- Inspect one request inside aiproxy and verify X-Proxy-Chat-Session is present. Without it, conversation timelines remain empty.
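The first check in the list can also be run from inside the Python process, which catches cases where the shell and the interpreter see different environments. The sketch below uses the variable names from this page; the check_environment function is a hypothetical helper.

```python
import os
from pathlib import Path

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")
CA_VARS = (
    "REQUESTS_CA_BUNDLE",
    "SSL_CERT_FILE",
    "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH",
    "GRPC_PYTHON_SSL_ROOTS_CERTIFICATE_PATH",
)


def check_environment(env=None) -> list:
    """Return human-readable problems with the proxy and CA variables."""
    env = os.environ if env is None else env
    problems = []
    for name in PROXY_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    for name in CA_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).expanduser().is_file():
            # A set-but-dangling path is a common cause of TLS failures.
            problems.append(f"{name} points to a missing file: {value}")
    return problems
```

An empty list means the variables are present and every CA path resolves to a file. It does not prove the bundle contains the aiproxy certificate, and it only applies to forward proxy mode, since router mode needs none of these variables.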
Once you see traffic flowing, you can reuse the same pattern for any longer-lived Vertex automation or SDK-driven tooling.