Route LangGraph or LangChain traffic to Azure AI Inference through aiproxy. There are two ways to connect:

- Reverse proxy – point the Azure inference endpoint at aiproxy (base URL override) and skip the proxy env vars.
- Forward proxy – keep the Azure endpoint unchanged, export `HTTP_PROXY`/`HTTPS_PROXY`, and let aiproxy MITM TLS.
The sample below matches the azure-test project in this repo and keeps requests grouped by X-Proxy-Chat-Session.
- aiproxy installed and running locally (examples use `http://localhost:8080`)
- Azure AI Inference (Azure OpenAI) resource: endpoint, API key, deployment name, and API version
- Python 3.10+ with `pip` or `uv`
Create a folder with these files (works out-of-the-box for reverse proxy).
`requirements.txt`:

```text
langchain-core
langchain-openai
python-dotenv
```
Add langgraph if you want to orchestrate the client inside a state graph. Add certifi only if you plan to use forward-proxy mode (needed to bundle the proxy CA).
main.py — minimal reverse-proxy-ready script.
Imports and env loading
```python
import os, uuid

from dotenv import load_dotenv
from langchain_core.messages import HumanMessage
from langchain_openai import AzureChatOpenAI

load_dotenv()  # pull .env so the client (and optional proxy helper) see env vars
```

Azure client builder — loads env vars and injects `X-Proxy-Chat-Session`.
```python
def build_client(session_id: str) -> AzureChatOpenAI:
    try:
        endpoint = os.environ["AZURE_INFERENCE_ENDPOINT"]
        credential = os.environ["AZURE_INFERENCE_CREDENTIAL"]
    except KeyError as exc:  # pragma: no cover - CLI guard
        missing = exc.args[0]
        raise SystemExit(
            f"Missing {missing}. Set AZURE_INFERENCE_ENDPOINT and AZURE_INFERENCE_CREDENTIAL."
        )
    model = os.environ.get("AZURE_INFERENCE_MODEL", "gpt-4o-mini")
    api_version = os.environ.get("AZURE_OPENAI_API_VERSION", "2024-12-01-preview")
    return AzureChatOpenAI(
        azure_endpoint=endpoint.rstrip("/"),
        api_version=api_version,
        api_key=credential,
        azure_deployment=model,
        model=model,
        default_headers={"X-Proxy-Chat-Session": session_id},
    )
```

Why the session header matters: aiproxy groups conversation telemetry by `X-Proxy-Chat-Session`. Give each conversation a unique UUID, and when you chain prompts and responses in one conversation, reuse the same session id so they stay grouped together in the platform.
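For example, a small multi-turn helper (hypothetical, not part of azure-test) that keeps a whole conversation grouped under one session id:

```python
def run_conversation(prompts: list[str]) -> None:
    # One UUID per conversation: every request below shares the same
    # X-Proxy-Chat-Session value, so aiproxy groups them together.
    session_id = uuid.uuid4().hex
    client = build_client(session_id)
    history: list = []
    for prompt in prompts:
        history.append(HumanMessage(content=prompt))
        reply = client.invoke(history)
        history.append(reply)  # keep the AI turn so later prompts have context
        print(reply.content)


run_conversation(["hey, what's up?", "and what did I just ask you?"])
```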
Send a message — minimal call path; dotenv already loaded .env.
```python
def send_one(message: str = "hey, what's up?", session_id: str | None = None, stream: bool = False):
    session_id = session_id or uuid.uuid4().hex
    client = build_client(session_id)
    messages = [HumanMessage(content=message)]
    if stream:
        for chunk in client.stream(messages):
            text = getattr(chunk, "content", "") or chunk.text
            if text:
                print(text, end="", flush=True)
        print()
    else:
        resp = client.invoke(messages)
        print(resp.content)


if __name__ == "__main__":
    send_one()
```

Both paths rely on the same code and env vars; only the proxy wiring changes. Use a .env file to flip between them.
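Before moving on to the proxy wiring: if you added langgraph to requirements.txt, a minimal sketch of orchestrating the same client inside a state graph could look like this (the node name and graph wiring are illustrative, not taken from the azure-test project):

```python
from langgraph.graph import StateGraph, START, END, MessagesState

SESSION_ID = uuid.uuid4().hex  # one session id for the whole graph run


def chat(state: MessagesState) -> dict:
    # Reuse build_client so the aiproxy session header is attached here too.
    client = build_client(SESSION_ID)
    return {"messages": [client.invoke(state["messages"])]}


builder = StateGraph(MessagesState)
builder.add_node("chat", chat)
builder.add_edge(START, "chat")
builder.add_edge("chat", END)
graph = builder.compile()

result = graph.invoke({"messages": [HumanMessage(content="hey, what's up?")]})
print(result["messages"][-1].content)
```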
Point the Azure inference endpoint at aiproxy and skip the proxy env vars. This avoids proxy CAs entirely.
Start aiproxy (default config). The base URL is your aiproxy address followed by `/azure/<resource-name>`, e.g. `http://localhost:8080/azure/<resource-name>`.
`.env` (reverse proxy mode):

```bash
AZURE_INFERENCE_ENDPOINT="http://localhost:8080/azure/<resource-name>"
AZURE_INFERENCE_CREDENTIAL="<api-key>"
AZURE_INFERENCE_MODEL="gpt-5.1"
AZURE_OPENAI_API_VERSION="2024-12-01-preview"
```
Run without proxy env vars or CA flags:
```bash
uv run python main.py
```

Keep the real Azure endpoint, export proxy variables, and let aiproxy MITM the TLS tunnel. This is where certifi and the `set_proxy_ca` helper come in.
Start aiproxy (default config).

`.env` (forward proxy mode):

```bash
AZURE_INFERENCE_ENDPOINT="https://<resource>.cognitiveservices.azure.com"
AZURE_INFERENCE_CREDENTIAL="<api-key>"
AZURE_INFERENCE_MODEL="gpt-5.1"                 # your deployment name
AZURE_OPENAI_API_VERSION="2024-12-01-preview"   # optional override
HTTP_PROXY="http://127.0.0.1:8080"
HTTPS_PROXY="http://127.0.0.1:8080"
PROXY_CA="$HOME/.aiproxy/aiproxy-ca-cert.pem"   # trust the MITM CA
```
Add the following helper (and import tempfile, Path, and certifi) to trust the proxy CA without replacing system roots. It reads PROXY_CA and PROXY_USE_SYSTEM that were loaded by load_dotenv().
```python
import tempfile
from pathlib import Path

import certifi


def set_proxy_ca() -> None:
    ca_vars = ("SSL_CERT_FILE", "REQUESTS_CA_BUNDLE", "CERTIFI_BUNDLE", "OPENAI_CA_BUNDLE")
    use_system = os.getenv("PROXY_USE_SYSTEM", "1").lower() not in ("0", "false")
    if use_system:
        for var in ca_vars:
            os.environ.pop(var, None)
        return
    ca_path = os.getenv("PROXY_CA")
    if not ca_path:
        return
    proxy_ca = Path(os.path.expanduser(os.path.expandvars(ca_path)))
    if not proxy_ca.exists():
        return
    base = Path(certifi.where()).read_bytes()
    bundle = tempfile.NamedTemporaryFile(prefix="proxy-ca-", suffix=".pem", delete=False)
    bundle.write(base)
    if not base.endswith(b"\n"):
        bundle.write(b"\n")
    bundle.write(proxy_ca.read_bytes())
    bundle.flush()
    bundle.close()
    for var in ca_vars:
        os.environ[var] = bundle.name
```

Call `set_proxy_ca()` once before building the client when you are in forward-proxy mode.
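A quick way to check what the helper did (a sketch; per the code above, the combined bundle is only written when PROXY_USE_SYSTEM is set to 0/false and PROXY_CA points at an existing file — otherwise the helper just clears the CA variables):

```python
set_proxy_ca()
print(os.environ.get("SSL_CERT_FILE"))       # e.g. /tmp/proxy-ca-xxxx.pem (certifi roots + aiproxy CA appended)
print(os.environ.get("REQUESTS_CA_BUNDLE"))  # same combined bundle
```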
Example forward-proxy entrypoint:
```python
if __name__ == "__main__":
    set_proxy_ca()
    send_one()
```

Run the sample (after calling `set_proxy_ca()` once, before `build_client`):
```bash
uv run python main.py
```

Every request carries `X-Proxy-Chat-Session` (a UUID) so you can filter the proxy UI/logs by that id.
- Missing env vars: ensure `AZURE_INFERENCE_ENDPOINT` and `AZURE_INFERENCE_CREDENTIAL` are set before running.
- TLS errors in forward-proxy mode: double-check `PROXY_CA` points to the aiproxy root (`~/.aiproxy/aiproxy-ca-cert.pem`) and pass it to `--proxy-ca`.
- Switching modes: clear `HTTP_PROXY`/`HTTPS_PROXY` when using reverse proxy so traffic doesn't double-proxy (see the sketch below).
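For the last point, a small guard you could drop into main.py for reverse-proxy runs (hypothetical, not part of azure-test):

```python
# Reverse-proxy mode: make sure stray proxy env vars don't double-proxy traffic.
for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
    os.environ.pop(var, None)
```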