Calling OpenAI from Python, Patterns and Pitfalls
TL;DR:
pip install openai. Use an environment variable for the API key. Wrap calls with tenacity retry. For JSON outputs, prompt explicitly and validate with Pydantic. Async via openai.Completion.acreate and asyncio. Token counts via tiktoken. Production: rate limit, timeout, cost tracking.
After the why of integrating LLMs, here is the actual Python code. As of January 2023 the official openai library is at 0.26.x. The ChatGPT API doesn’t exist yet, so you use the completion endpoint with text-davinci-003.
Setup
pip install openai==0.26.5 tiktoken pydantic tenacity
Configure:
import os
import openai
openai.api_key = os.environ["OPENAI_API_KEY"]
Never hardcode the key. Use environment variables; load via python-dotenv for local dev.
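A minimal sketch of failing fast when the key is missing, so a misconfigured deploy breaks at startup instead of on the first request. The helper name load_api_key is hypothetical, and the python-dotenv call is shown only as a comment so the snippet runs without that package installed.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    load_api_key is a hypothetical helper, not part of the openai library.
    """
    # For local dev, python-dotenv can populate os.environ from a .env file:
    #   from dotenv import load_dotenv; load_dotenv()
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it or add it to a git-ignored .env file"
        )
    return key
```

Usage would be openai.api_key = load_api_key() in place of the direct os.environ lookup.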
The basic call
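def complete(prompt: str, max_tokens: int = 300, temperature: float = 0.0) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        # In openai 0.x, request_timeout bounds the HTTP request itself;
        # timeout= only bounds the wait for a model that is still warming up.
        request_timeout=30,
    )
    return response.choices[0].text.strip()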
temperature=0.0 gives near-deterministic outputs (classification, extraction). Bump to 0.7-1.0 for creative generation.
max_tokens caps the response length AND your cost. Always set explicitly.
JSON output pattern
The biggest improvement over the GPT-3 era: reliable structured output.
import json
from pydantic import BaseModel

class TicketClass(BaseModel):
    category: str
    priority: str
    confidence: float

def classify_ticket(text: str) -> TicketClass:
    prompt = f"""
Classify this support ticket as JSON.
Schema:
{{
"category": "billing|technical|account|other",
"priority": "low|medium|high",
"confidence": 0.0 to 1.0
}}
Ticket: {text}
JSON:
""".strip()
    raw = complete(prompt, max_tokens=100)
    data = json.loads(raw)
    return TicketClass(**data)
Three things doing work:
- Explicit schema in the prompt
- “JSON:” trailing marker pulls the model into JSON mode
- Pydantic validates the structure on the way back
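If the JSON is malformed, json.loads raises JSONDecodeError; if it parses but doesn’t match the model, Pydantic raises ValidationError. Either way, you retry or fall back to a default.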
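The fall-back-to-a-default branch can be sketched as below. The names parse_ticket, validate_ticket and DEFAULT_TICKET are hypothetical, and a hand-written dict check stands in for TicketClass so the snippet runs with the standard library alone. With Pydantic in place of validate_ticket, the same except clause still works, because ValidationError is a ValueError subclass.

```python
import json

# Hypothetical fallback used when the model's answer cannot be trusted.
DEFAULT_TICKET = {"category": "other", "priority": "medium", "confidence": 0.0}

# Allowed values, taken from the schema in the prompt.
ALLOWED = {
    "category": ("billing", "technical", "account", "other"),
    "priority": ("low", "medium", "high"),
}


def validate_ticket(data) -> dict:
    """Stand-in for TicketClass(**data): raise ValueError on a bad structure."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, allowed in ALLOWED.items():
        if data.get(field) not in allowed:
            raise ValueError(f"invalid {field}: {data.get(field)!r}")
    confidence = data.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return data


def parse_ticket(raw: str) -> dict:
    """Parse and validate model output; fall back to DEFAULT_TICKET on failure."""
    try:
        # json.JSONDecodeError is a ValueError subclass, so one except clause
        # covers both broken syntax and a failed structure check.
        return validate_ticket(json.loads(raw))
    except ValueError:
        return dict(DEFAULT_TICKET)
```

Returning a low-confidence default keeps the pipeline moving; whether that is better than retrying depends on how costly a wrong category is downstream.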
Retry with tenacity
OpenAI’s API has transient failures (5xx, rate limits, timeouts). Always retry.
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
import openai.error

@retry(
    retry=retry_if_exception_type((
        openai.error.APIError,
        openai.error.RateLimitError,
        openai.error.ServiceUnavailableError,
        openai.error.Timeout,
    )),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(5),
)
def complete_with_retry(prompt: str, **kwargs) -> str:
    return complete(prompt, **kwargs)
Retries on transient errors with exponential backoff. Doesn’t retry on InvalidRequestError (your prompt is wrong; retrying won’t help).
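To make the behavior concrete, here is a hand-rolled sketch of the same idea: retry transient errors with capped exponential backoff, let permanent errors through immediately. This is not tenacity's implementation, and its delays are not guaranteed to match the decorator's exact schedule. TransientError, PermanentError and call_with_backoff are hypothetical names standing in for the openai.error classes and the decorator.

```python
import time


class TransientError(Exception):
    """Stand-in for retryable errors such as rate limits or 5xx responses."""


class PermanentError(Exception):
    """Stand-in for errors like InvalidRequestError that retrying cannot fix."""


def call_with_backoff(fn, attempts=5, base=2.0, cap=30.0, sleep=time.sleep):
    """Call fn(); retry on TransientError with capped exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base, 2*base, 4*base, ... up to cap.
            sleep(min(cap, base * 2 ** (attempt - 1)))
        # PermanentError is not caught, so it propagates on the first failure.
```

The sleep parameter is injectable so the logic can be exercised in tests without actually waiting.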
Async pattern
For high-throughput backends:
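import asyncio

async def acomplete(prompt: str, max_tokens: int = 300) -> str:
    response = await openai.Completion.acreate(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0.0,
    )
    return response.choices[0].text.strip()

async def classify_async(text: str) -> TicketClass:
    # Async counterpart of classify_ticket. Assumes the CLASSIFY_TICKET
    # template and a SCHEMA string as used in the prompt template section.
    prompt = CLASSIFY_TICKET.format(schema=SCHEMA, ticket=text)
    raw = await acomplete(prompt, max_tokens=100)
    return TicketClass(**json.loads(raw))

async def batch_classify(tickets: list[str]) -> list[TicketClass]:
    tasks = [classify_async(t) for t in tickets]
    return await asyncio.gather(*tasks)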
asyncio.gather runs concurrent requests. Watch your rate limit (60 req/min for new accounts).
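One caveat with plain asyncio.gather: by default the first exception propagates to the caller and the other results are lost to it. A minimal sketch of collecting successes and failures separately, using the return_exceptions flag; gather_partitioned is a hypothetical helper name.

```python
import asyncio


async def gather_partitioned(coros):
    """Run coroutines concurrently; return (successes, failures).

    With return_exceptions=True, asyncio.gather returns raised exceptions as
    values instead of propagating the first one, so one bad ticket does not
    discard the rest of the batch.
    """
    results = await asyncio.gather(*coros, return_exceptions=True)
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = [r for r in results if isinstance(r, BaseException)]
    return successes, failures
```

Failures can then be logged or retried individually while the successful classifications are used right away.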
For batches of 100+, throttle:
import asyncio
from asyncio import Semaphore

async def batch_with_limit(items, fn, concurrency=10):
    sem = Semaphore(concurrency)

    async def wrapped(x):
        async with sem:
            return await fn(x)

    return await asyncio.gather(*[wrapped(i) for i in items])
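With 10 concurrent requests at about 1 second each, throughput is roughly 10 req/sec, or 600/min. That fits higher-tier limits but is 10x the 60 req/min new-account limit mentioned above, so size concurrency to your own limit (concurrency=1 gives about 60 req/min at 1 second per call).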
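Note that the semaphore caps requests in flight, not requests per minute; the resulting rate depends on latency. A small sketch that checks the cap with a fake call instead of the API. batch_with_limit is repeated from above so the snippet is self-contained; measure_peak and fake_call are hypothetical names.

```python
import asyncio


async def batch_with_limit(items, fn, concurrency=10):
    sem = asyncio.Semaphore(concurrency)

    async def wrapped(x):
        async with sem:
            return await fn(x)

    return await asyncio.gather(*[wrapped(i) for i in items])


async def measure_peak(n_items=50, concurrency=10):
    """Return (peak, results): the most fake calls in flight at once, and their outputs."""
    in_flight = 0
    peak = 0

    async def fake_call(x):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for network latency
        in_flight -= 1
        return x

    results = await batch_with_limit(range(n_items), fake_call, concurrency)
    return peak, results
```

Running asyncio.run(measure_peak(50, 10)) should never report a peak above the concurrency argument, whatever the batch size.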
Token counting with tiktoken
To estimate cost or check prompt size:
import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")

def token_count(text: str) -> int:
    return len(enc.encode(text))

def estimate_cost(prompt: str, max_response_tokens: int) -> float:
    total = token_count(prompt) + max_response_tokens
    return total * 0.02 / 1000  # text-davinci-003 pricing
Run before sending. Reject prompts that exceed your budget per request.
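A minimal sketch of that pre-flight check. It takes token counts as plain integers so it runs without tiktoken; check_request and the budget value are hypothetical, while the price and the 4097-token context limit are the text-davinci-003 figures used in this article.

```python
PRICE_PER_1K_USD = 0.02  # text-davinci-003 price used in estimate_cost
CONTEXT_LIMIT = 4097     # text-davinci-003 context size, prompt plus completion


def check_request(prompt_tokens: int, max_response_tokens: int, budget_usd: float) -> float:
    """Return the worst-case cost, or raise ValueError if the request should not be sent."""
    total = prompt_tokens + max_response_tokens
    if total > CONTEXT_LIMIT:
        raise ValueError(f"{total} tokens exceeds the {CONTEXT_LIMIT}-token context limit")
    cost = total * PRICE_PER_1K_USD / 1000
    if cost > budget_usd:
        raise ValueError(f"worst-case cost ${cost:.4f} exceeds budget ${budget_usd:.4f}")
    return cost
```

In the real pipeline the first argument would be token_count(prompt), e.g. check_request(token_count(prompt), 300, budget_usd=0.05).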
Prompt template pattern
Keep prompts in a separate module:
# prompts.py
CLASSIFY_TICKET = """
Classify this support ticket as JSON.
Schema:
{schema}
Ticket: {ticket}
JSON:
""".strip()
SUMMARIZE_THREAD = """
Summarize this email thread in 3 bullet points. Focus on decisions and next steps.
Thread:
{thread}
Summary:
""".strip()
Usage:
from prompts import CLASSIFY_TICKET
prompt = CLASSIFY_TICKET.format(schema=SCHEMA, ticket=text)
Version-control prompts. Treat them like code. A/B test them. Diff them across changes.
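Two small helpers make that easier in practice: a fingerprint to log alongside each call, so results can be grouped by prompt version in an A/B comparison, and a diff between two versions. Both function names are hypothetical; the sketch uses only hashlib and difflib from the standard library.

```python
import difflib
import hashlib


def prompt_version(template: str) -> str:
    """Short, stable fingerprint of a prompt template for logs and A/B reports."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def prompt_diff(old: str, new: str) -> str:
    """Unified diff between two template versions (empty string if identical)."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="old",
        tofile="new",
        lineterm="",
    )
    return "\n".join(lines)
```

Fingerprinting the template rather than the filled-in prompt means the value stays the same across tickets and changes only when the wording does.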
What I monitor in production
import time
import logging

def complete_observed(prompt: str, **kwargs):
    start = time.time()
    try:
        result = complete_with_retry(prompt, **kwargs)
        cost = estimate_cost(prompt, kwargs.get("max_tokens", 300))
        logging.info("openai_call", extra={
            "duration_s": time.time() - start,
            "tokens_in": token_count(prompt),
            "tokens_out": token_count(result),
            "cost_usd": cost,
            "status": "success",
        })
        return result
    except Exception as e:
        logging.error("openai_call_failed", extra={
            "duration_s": time.time() - start,
            "error": str(e),
        })
        raise
Push these metrics to Prometheus or similar. Track cost daily; spikes catch bugs (runaway loops).
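A rough sketch of the daily spike check, assuming an in-memory tracker in place of a real metrics backend. The DailyCost class and the 3x threshold are hypothetical choices for illustration, not a recommendation.

```python
from collections import defaultdict
from datetime import date


class DailyCost:
    """In-memory daily cost totals; a hypothetical stand-in for a metrics backend."""

    def __init__(self, spike_factor: float = 3.0):
        self.spike_factor = spike_factor
        self.totals = defaultdict(float)

    def record(self, day: date, cost_usd: float) -> None:
        """Add one call's cost to the total for its day."""
        self.totals[day] += cost_usd

    def is_spike(self, day: date) -> bool:
        """True if `day` cost more than spike_factor times the average of earlier days."""
        earlier = [total for d, total in self.totals.items() if d < day]
        if not earlier:
            return False  # no baseline yet
        baseline = sum(earlier) / len(earlier)
        return self.totals.get(day, 0.0) > self.spike_factor * baseline
```

Each call would feed record() with the cost_usd value already logged by complete_observed; in Prometheus the same check would be an alert rule rather than application code.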
Common Pitfalls
Hardcoded API key. Pushed to git once = compromised forever. Use env vars.
No retry. Production breaks during OpenAI’s regular outages.
No timeout. A 60-second hung request blocks your handler.
JSON parsed without validation. Model returns malformed JSON sometimes. json.loads catches broken syntax; Pydantic catches wrong structure.
Token-unaware long prompts. Hit context limit (4097 tokens for text-davinci-003); request fails.
Async without semaphore. Burst → rate-limited → cascading failures.
Logging full prompts to console. May contain PII / secrets. Log structured metadata; redact bodies.
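For the last pitfall, one option is to log a fingerprint and size instead of the prompt body. A minimal sketch; prompt_log_fields is a hypothetical helper and the field names are arbitrary.

```python
import hashlib


def prompt_log_fields(prompt: str) -> dict:
    """Metadata that identifies a prompt in logs without storing its text."""
    return {
        # Truncated hash: enough to correlate repeated prompts across log lines.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16],
        "prompt_chars": len(prompt),
    }
```

These fields can be merged into the extra dict of the logging calls shown earlier, so identical prompts can still be correlated without exposing their contents.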
Wrapping Up
Python + openai SDK + tenacity + Pydantic + tiktoken = the working stack for OpenAI integration in early 2023. Tuesday: Node.js variant.