Calling OpenAI from Python, Patterns and Pitfalls
January 6, 2023 · 4 min read · by Muhammad Amal

TL;DR: pip install openai. Use an environment variable for the API key. Wrap calls with tenacity retries. For JSON outputs, prompt explicitly and validate with Pydantic. Go async with Completion.acreate and asyncio. Count tokens with tiktoken. In production: rate limit, set timeouts, track cost.

The previous post covered why to integrate LLMs; this one covers the actual Python code. As of January 2023 the official openai library is at 0.26.x. A ChatGPT API doesn’t exist yet, so you use the completion endpoint with text-davinci-003.

Setup

pip install openai==0.26.5 tiktoken pydantic tenacity

Configure:

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

Never hardcode the key. Use environment variables; load via python-dotenv for local dev.
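
A minimal sketch of failing fast when the key is missing, so a misconfigured deploy stops at startup with a readable message rather than a bare KeyError. The helper name require_env is hypothetical; it is not part of the openai library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Hypothetical helper: stop early instead of sending an empty key to the API.
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return value
```

Usage would look like openai.api_key = require_env("OPENAI_API_KEY"), called once at import time.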

The basic call

def complete(prompt: str, max_tokens: int = 300, temperature: float = 0.0) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        request_timeout=30,  # client-side HTTP timeout, in seconds
    )
    return response.choices[0].text.strip()

temperature=0.0 gives near-deterministic outputs, which suits classification and extraction. Bump to 0.7-1.0 for creative generation.

max_tokens caps the response length AND your cost. Always set explicitly.

JSON output pattern

The biggest improvement over earlier GPT-3 models: getting structured output reliably.

import json
from pydantic import BaseModel

class TicketClass(BaseModel):
    category: str
    priority: str
    confidence: float

def classify_ticket(text: str) -> TicketClass:
    prompt = f"""
Classify this support ticket as JSON.

Schema:
{{
  "category": "billing|technical|account|other",
  "priority": "low|medium|high",
  "confidence": 0.0 to 1.0
}}

Ticket: {text}

JSON:
""".strip()

    raw = complete(prompt, max_tokens=100)
    data = json.loads(raw)
    return TicketClass(**data)

Three things doing work:

  1. Explicit schema in the prompt
  2. “JSON:” trailing marker pulls the model into JSON mode
  3. Pydantic validates the structure on the way back

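If the JSON is malformed, json.loads raises json.JSONDecodeError; if it parses but has the wrong shape, Pydantic raises ValidationError. Either way, you retry or fall back to a default.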
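
One way the "retry or fall back" step might look, as a sketch. StrictTicket and parse_ticket are illustrative names, not from the post's code. The Literal fields and the 0.0 to 1.0 bound are an added assumption: they make Pydantic reject values the prompt never offered, which plain str fields would accept.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class StrictTicket(BaseModel):
    # Literal types reject categories the prompt did not offer.
    category: Literal["billing", "technical", "account", "other"]
    priority: Literal["low", "medium", "high"]
    confidence: float = Field(ge=0.0, le=1.0)


def parse_ticket(raw: str) -> Optional[StrictTicket]:
    """Return a validated ticket, or None so the caller can retry or fall back."""
    try:
        data = json.loads(raw)        # malformed JSON -> json.JSONDecodeError
        return StrictTicket(**data)   # wrong shape or values -> ValidationError
    except (json.JSONDecodeError, ValidationError, TypeError):
        # TypeError covers valid JSON that is not an object, e.g. a list.
        return None
```

Returning None keeps the decision with the caller: re-prompt once, route the ticket to a human, or assign a default category.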

Retry with tenacity

OpenAI’s API has transient failures (5xx, rate limits, timeouts). Always retry.

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
import openai.error

@retry(
    retry=retry_if_exception_type((
        openai.error.APIError,
        openai.error.RateLimitError,
        openai.error.ServiceUnavailableError,
        openai.error.Timeout,
    )),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(5),
)
def complete_with_retry(prompt: str, **kwargs) -> str:
    return complete(prompt, **kwargs)

Retries on transient errors with exponential backoff. Doesn’t retry on InvalidRequestError (your prompt is wrong; retrying won’t help).
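
To make the mechanics concrete, here is a hand-rolled sketch of the same idea in plain Python: retry only on listed exception types, wait longer each time, give up after a fixed number of attempts. It is an illustration, not tenacity's implementation, and its delay schedule is an assumption that may not match tenacity's exact arithmetic. The name call_with_backoff and the injectable sleep parameter are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    min_wait: float = 2.0,
    max_wait: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only when it raises one of the retry_on types."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Illustrative schedule: min_wait, then doubling, capped at max_wait.
            sleep(min(min_wait * 2 ** (attempt - 1), max_wait))
    raise AssertionError("unreachable")
```

Any exception not listed in retry_on propagates immediately, which is the behavior you want for a bad prompt.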

Async pattern

For high-throughput backends:

import asyncio

async def acomplete(prompt: str, max_tokens: int = 300) -> str:
    response = await openai.Completion.acreate(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0.0,
    )
    return response.choices[0].text.strip()

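async def classify_async(text: str) -> TicketClass:
    # Async twin of classify_ticket. ticket_prompt is an assumed helper that
    # returns the same prompt string classify_ticket builds inline.
    raw = await acomplete(ticket_prompt(text), max_tokens=100)
    return TicketClass(**json.loads(raw))

async def batch_classify(tickets: list[str]) -> list[TicketClass]:
    tasks = [classify_async(t) for t in tickets]
    return await asyncio.gather(*tasks)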

asyncio.gather runs concurrent requests. Watch your rate limit (60 req/min for new accounts).

For batches of 100+, throttle:

import asyncio
from asyncio import Semaphore

async def batch_with_limit(items, fn, concurrency=10):
    sem = Semaphore(concurrency)
    async def wrapped(x):
        async with sem:
            return await fn(x)
    return await asyncio.gather(*[wrapped(i) for i in items])

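At about 1 second per request, 10 concurrent requests works out to roughly 10 req/sec, or 600/min. That is ten times the 60 req/min limit for new accounts, so lower the concurrency or pace the requests unless your account’s limit is higher.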
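
A semaphore caps how many requests are in flight, not how many start per minute. A sketch of adding pacing on top, assuming a simple "start at most one request every 60 / per_minute seconds" rule; paced_gather and its parameters are hypothetical names, and a production limiter would likely also handle 429 responses.

```python
import asyncio


async def paced_gather(items, fn, per_minute=60, concurrency=10):
    """Run fn over items, spacing request starts to respect a per-minute limit."""
    interval = 60.0 / per_minute          # seconds between request starts
    sem = asyncio.Semaphore(concurrency)  # still cap in-flight requests

    async def wrapped(x):
        async with sem:
            return await fn(x)

    tasks = []
    for i, item in enumerate(items):
        if i:
            await asyncio.sleep(interval)  # pace every start after the first
        tasks.append(asyncio.create_task(wrapped(item)))
    # gather returns results in the same order as items
    return await asyncio.gather(*tasks)
```

With per_minute=60 this starts one request per second regardless of how fast each one returns.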

Token counting with tiktoken

To estimate cost or check prompt size:

import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")

def token_count(text: str) -> int:
    return len(enc.encode(text))

def estimate_cost(prompt: str, max_response_tokens: int) -> float:
    total = token_count(prompt) + max_response_tokens
    return total * 0.02 / 1000  # text-davinci-003 pricing

Run before sending. Reject prompts that exceed your budget per request.
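
A sketch of that pre-flight check as pure arithmetic. It takes a token count as input, so it works with whatever counter you use. check_request and the default budget of $0.05 are hypothetical; the 4097-token limit and the $0.02 per 1K price are the text-davinci-003 figures used in this post.

```python
CONTEXT_LIMIT = 4097      # text-davinci-003: prompt + completion tokens
PRICE_PER_1K_USD = 0.02   # text-davinci-003 pricing used in this post


def check_request(prompt_tokens: int, max_response_tokens: int,
                  budget_usd: float = 0.05) -> float:
    """Return the worst-case cost, or raise if the request should not be sent."""
    total = prompt_tokens + max_response_tokens
    if total > CONTEXT_LIMIT:
        raise ValueError(
            f"{total} tokens exceeds the {CONTEXT_LIMIT}-token context window")
    cost = total * PRICE_PER_1K_USD / 1000
    if cost > budget_usd:
        raise ValueError(
            f"worst-case cost ${cost:.4f} exceeds budget ${budget_usd:.4f}")
    return cost
```

The cost is a worst case because it assumes the model uses every token allowed by max_tokens.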

Prompt template pattern

Keep prompts in a separate module:

# prompts.py
CLASSIFY_TICKET = """
Classify this support ticket as JSON.

Schema:
{schema}

Ticket: {ticket}

JSON:
""".strip()

SUMMARIZE_THREAD = """
Summarize this email thread in 3 bullet points. Focus on decisions and next steps.

Thread:
{thread}

Summary:
""".strip()

Usage:

from prompts import CLASSIFY_TICKET

# SCHEMA is the JSON schema text from the classify_ticket prompt; text is the ticket body
prompt = CLASSIFY_TICKET.format(schema=SCHEMA, ticket=text)

Version-control prompts. Treat them like code. A/B test them. Diff them across changes.
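
Treating prompts like code can include unit tests. A small sketch that checks a template still expects the placeholders its callers pass, so a renamed field fails in CI instead of in production. The placeholders helper is hypothetical, and the template here is a shortened stand-in for the one in prompts.py.

```python
from string import Formatter


def placeholders(template: str) -> set:
    """Return the named fields a str.format template expects."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# Shortened stand-in for prompts.CLASSIFY_TICKET
CLASSIFY_TICKET = "Schema:\n{schema}\n\nTicket: {ticket}\n\nJSON:"

# A check like this can live in the test suite next to prompts.py.
assert placeholders(CLASSIFY_TICKET) == {"schema", "ticket"}
```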

What I monitor in production

import time
import logging

def complete_observed(prompt: str, **kwargs):
    start = time.time()
    try:
        result = complete_with_retry(prompt, **kwargs)
        tokens_in = token_count(prompt)
        tokens_out = token_count(result)
        logging.info("openai_call", extra={
            "duration_s": time.time() - start,
            "tokens_in": tokens_in,
            "tokens_out": tokens_out,
            # approximate actual spend, not the worst case from estimate_cost
            "cost_usd": (tokens_in + tokens_out) * 0.02 / 1000,
            "status": "success",
        })
        return result
    except Exception as e:
        logging.error("openai_call_failed", extra={
            "duration_s": time.time() - start,
            "error": str(e),
        })
        raise

Push these metrics to Prometheus or similar. Track cost daily; spikes catch bugs (runaway loops).

Common Pitfalls

Hardcoded API key. Pushed to git once = compromised forever. Use env vars.

No retry. Production breaks during OpenAI’s regular outages.

No timeout. A 60-second hung request blocks your handler.

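JSON parsed without validation. The model sometimes returns malformed JSON, or valid JSON with the wrong fields. json.loads catches the first; Pydantic catches the second.

Token-unaware long prompts. Prompt plus completion must fit the context limit (4097 tokens for text-davinci-003); go over and the request fails.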

Async without semaphore. Burst → rate-limited → cascading failures.

Logging full prompts to console. May contain PII / secrets. Log structured metadata; redact bodies.
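
For the last pitfall, a sketch of logging metadata about a prompt in place of its text. prompt_metadata is a hypothetical helper. The truncated hash is only a fingerprint for spotting repeated prompts; treat it as an assumption that this is enough for your compliance needs, since hashes of very short inputs can be guessed.

```python
import hashlib


def prompt_metadata(prompt: str) -> dict:
    """Describe a prompt for logs without including its text."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return {
        "prompt_sha256": digest[:12],  # short fingerprint to correlate repeats
        "prompt_chars": len(prompt),
    }
```

The returned dict can be merged into the extra= payload of the logging calls above.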

Wrapping Up

Python + openai SDK + tenacity + Pydantic + tiktoken = the working stack for OpenAI integration in early 2023. Tuesday: Node.js variant.