Calling OpenAI from Python, Patterns and Pitfalls


January 6, 2023 · 4 min read · by Muhammad Amal ai

TL;DR: pip install openai. Keep the API key in an environment variable. Wrap calls in a tenacity retry. For JSON output, prompt with an explicit schema and validate with Pydantic. Go async with openai.Completion.acreate and asyncio. Count tokens with tiktoken. In production: rate limiting, timeouts, cost tracking.

After why integrate LLMs, here is the actual Python code. As of January 2023 the official openai library is at 0.26.x. The ChatGPT API doesn’t exist yet, so you use the completion endpoint with text-davinci-003.

Setup

pip install openai==0.26.5 tiktoken pydantic tenacity

Configure:

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

Never hardcode the key. Use environment variables; load via python-dotenv for local dev.
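A minimal sketch of failing fast when the key is missing, so a misconfigured deploy stops at startup rather than on the first request. The helper name `load_api_key` is hypothetical and not part of the openai library; it accepts the environment mapping as a parameter only so it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a readable error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it or add it to your local .env file"
        )
    return key
```

With this in place, the configuration line would read `openai.api_key = load_api_key()`.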

The basic call

def complete(prompt: str, max_tokens: int = 300, temperature: float = 0.0) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        request_timeout=30,  # client-side HTTP timeout in seconds (openai 0.x)
    )
    return response.choices[0].text.strip()

temperature=0.0 gives near-deterministic outputs, which suits classification and extraction. Bump to 0.7-1.0 for creative generation.

max_tokens caps the response length AND your cost. Always set explicitly.

JSON output pattern

The biggest improvement over the GPT-3 era: getting structured output.

import json
from pydantic import BaseModel

class TicketClass(BaseModel):
    category: str
    priority: str
    confidence: float

def classify_ticket(text: str) -> TicketClass:
    prompt = f"""
Classify this support ticket as JSON.

Schema:
{{
  "category": "billing|technical|account|other",
  "priority": "low|medium|high",
  "confidence": 0.0 to 1.0
}}

Ticket: {text}

JSON:
""".strip()

    raw = complete(prompt, max_tokens=100)
    data = json.loads(raw)
    return TicketClass(**data)

Three things doing work:

Explicit schema in the prompt
“JSON:” trailing marker pulls the model into JSON mode
Pydantic validates the structure on the way back

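If the JSON is malformed, json.loads raises json.JSONDecodeError; if it parses but doesn’t match the schema, Pydantic raises ValidationError. Either way, you retry or fall back to a default.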
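The retry-or-fall-back step could look something like the sketch below. It is stdlib-only so it runs anywhere: a plain key check stands in for the Pydantic model, and `call_model` is a hypothetical callable that takes the ticket text and returns the raw model reply. The `FALLBACK` values are an assumption; pick whatever default is safe for your queue.

```python
import json
from typing import Callable, Optional

REQUIRED_KEYS = {"category", "priority", "confidence"}
# Assumed default for tickets the model could not classify cleanly.
FALLBACK = {"category": "other", "priority": "medium", "confidence": 0.0}


def parse_ticket_json(raw: str) -> Optional[dict]:
    """Return the parsed object, or None if it is not a JSON object with the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def classify_with_fallback(text: str,
                           call_model: Callable[[str], str],
                           attempts: int = 2) -> dict:
    """Ask the model up to `attempts` times; return a copy of FALLBACK if no reply parses."""
    for _ in range(attempts):
        data = parse_ticket_json(call_model(text))
        if data is not None:
            return data
    return dict(FALLBACK)
```

In the real pipeline, the key check would be replaced by constructing TicketClass and catching its validation error alongside the JSON one.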

Retry with tenacity

OpenAI’s API has transient failures (5xx, rate limits, timeouts). Always retry.

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
import openai.error

@retry(
    retry=retry_if_exception_type((openai.error.APIError, openai.error.RateLimitError, openai.error.ServiceUnavailableError, openai.error.Timeout)),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(5),
)
def complete_with_retry(prompt: str, **kwargs) -> str:
    return complete(prompt, **kwargs)

Retries on transient errors with exponential backoff. Doesn’t retry on InvalidRequestError (your prompt is wrong; retrying won’t help).
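To make the retry behaviour concrete, here is a stdlib-only sketch of the same idea: retry only on a chosen set of exception types, with exponentially growing waits. This is not tenacity's implementation, and the delay formula (2, 4, 8, ... seconds, capped at 30) is an assumption for illustration; tenacity's exact schedule may differ. The names `backoff_delays` and `call_with_backoff` are hypothetical.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 2.0, cap: float = 30.0) -> List[float]:
    """Delays slept between attempts: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts - 1)]


def call_with_backoff(fn: Callable[[], T],
                      retryable: Tuple[Type[BaseException], ...],
                      attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on a retryable exception, wait and try again, up to `attempts` calls."""
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise RuntimeError("attempts must be at least 1")
```

Exceptions outside `retryable` propagate on the first failure, which mirrors the point about InvalidRequestError: a bad prompt fails fast instead of burning five attempts.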

Async pattern

For high-throughput backends:

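import asyncio
import json

async def acomplete(prompt: str, max_tokens: int = 300) -> str:
    response = await openai.Completion.acreate(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0.0,
    )
    return response.choices[0].text.strip()

async def classify_async(text: str) -> TicketClass:
    # Async counterpart of classify_ticket (prompt abbreviated here)
    prompt = f"Classify this support ticket as JSON with keys category, priority, confidence.\n\nTicket: {text}\n\nJSON:"
    raw = await acomplete(prompt, max_tokens=100)
    return TicketClass(**json.loads(raw))

async def batch_classify(tickets: list[str]) -> list[TicketClass]:
    tasks = [classify_async(t) for t in tickets]
    return await asyncio.gather(*tasks)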

asyncio.gather runs concurrent requests. Watch your rate limit (60 req/min for new accounts).

For batches of 100+, throttle:

import asyncio
from asyncio import Semaphore

async def batch_with_limit(items, fn, concurrency=10):
    sem = Semaphore(concurrency)
    async def wrapped(x):
        async with sem:
            return await fn(x)
    return await asyncio.gather(*[wrapped(i) for i in items])

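10 concurrent requests at about 1 sec each = 10 req/sec = 600/min. That fits higher-tier limits but is ten times the 60 req/min quoted above for new accounts, so size concurrency from your actual limit and average latency.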
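The same arithmetic can be run in reverse to pick a concurrency from a known limit. A small sketch, assuming throughput is roughly concurrency divided by average latency; `max_concurrency` is a hypothetical helper, and real limits are often also expressed in tokens per minute, which this ignores.

```python
import math


def max_concurrency(limit_per_min: int, avg_latency_s: float) -> int:
    """Estimate how many requests can be in flight without exceeding the limit.

    Throughput is roughly concurrency / latency requests per second, so
    concurrency <= (limit_per_min / 60) * avg_latency_s.

    The result is floored at 1. When latency is shorter than 60 / limit_per_min
    seconds, even a single worker can exceed the limit, and an explicit delay
    between requests is needed as well.
    """
    return max(1, math.floor(limit_per_min / 60 * avg_latency_s))
```

For the 60 req/min figure and 1-second calls this gives a concurrency of 1; a 600 req/min limit gives the 10 used above.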

Token counting with tiktoken

To estimate cost or check prompt size:

import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")

def token_count(text: str) -> int:
    return len(enc.encode(text))

def estimate_cost(prompt: str, max_response_tokens: int) -> float:
    total = token_count(prompt) + max_response_tokens
    return total * 0.02 / 1000  # text-davinci-003 pricing

Run before sending. Reject prompts that exceed your budget per request.
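A pre-flight check along these lines is one way to do the rejecting. It is a sketch: `check_request` is a hypothetical name, it takes an already-computed token count (for example from token_count) so it stays free of third-party imports, and the 4097-token window and $0.02 per 1K tokens are the text-davinci-003 figures used in this post.

```python
def check_request(prompt_tokens: int,
                  max_response_tokens: int,
                  max_cost_usd: float,
                  context_limit: int = 4097,
                  usd_per_1k_tokens: float = 0.02) -> float:
    """Return the worst-case cost in USD, or raise ValueError if the request should not be sent."""
    total = prompt_tokens + max_response_tokens
    if total > context_limit:
        raise ValueError(
            f"{total} tokens exceeds the {context_limit}-token context window"
        )
    cost = total * usd_per_1k_tokens / 1000
    if cost > max_cost_usd:
        raise ValueError(
            f"worst-case cost ${cost:.4f} exceeds the ${max_cost_usd:.4f} budget"
        )
    return cost
```

The cost is worst-case because it assumes the model uses the full response allowance; actual spend is usually lower.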

Prompt template pattern

Keep prompts in a separate module:

# prompts.py
CLASSIFY_TICKET = """
Classify this support ticket as JSON.

Schema:
{schema}

Ticket: {ticket}

JSON:
""".strip()

SUMMARIZE_THREAD = """
Summarize this email thread in 3 bullet points. Focus on decisions and next steps.

Thread:
{thread}

Summary:
""".strip()

Usage:

from prompts import CLASSIFY_TICKET

prompt = CLASSIFY_TICKET.format(schema=SCHEMA, ticket=text)

Version-control prompts. Treat them like code. A/B test them. Diff them across changes.
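The usage line refers to a SCHEMA constant that the post does not show. One plausible definition, reusing the schema text from the classification prompt earlier, is sketched below; both SCHEMA and the `render_classify` helper are assumptions for illustration.

```python
# prompts.py (sketch): SCHEMA is an assumed definition, not shown in the post.
SCHEMA = """
{
  "category": "billing|technical|account|other",
  "priority": "low|medium|high",
  "confidence": 0.0 to 1.0
}
""".strip()

CLASSIFY_TICKET = """
Classify this support ticket as JSON.

Schema:
{schema}

Ticket: {ticket}

JSON:
""".strip()


def render_classify(ticket: str) -> str:
    # str.format only interprets braces in the template itself, so the literal
    # braces inside SCHEMA or the ticket text (passed as values) need no escaping.
    return CLASSIFY_TICKET.format(schema=SCHEMA, ticket=ticket)
```

Keeping the schema as a value rather than inlining it also avoids the doubled braces that the f-string version of this prompt needs.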

What I monitor in production

import time
import logging

def complete_observed(prompt: str, **kwargs):
    start = time.time()
    try:
        result = complete_with_retry(prompt, **kwargs)
        cost = estimate_cost(prompt, kwargs.get("max_tokens", 300))
        logging.info("openai_call", extra={
            "duration_s": time.time() - start,
            "tokens_in": token_count(prompt),
            "tokens_out": token_count(result),
            "cost_usd": cost,
            "status": "success",
        })
        return result
    except Exception as e:
        logging.error("openai_call_failed", extra={
            "duration_s": time.time() - start,
            "error": str(e),
        })
        raise

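Push these metrics to Prometheus or similar. Note that cost_usd is an upper bound, since estimate_cost assumes the full max_tokens was generated. Track cost daily; spikes catch bugs (runaway loops).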
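Spike detection does not need to be elaborate. A rough sketch, assuming you already have daily totals in USD: flag today when it exceeds a multiple of the recent median. The function name, the choice of median, and the factor of 3 are all assumptions to tune against your own traffic.

```python
from statistics import median
from typing import Sequence


def is_cost_spike(previous_days_usd: Sequence[float],
                  today_usd: float,
                  factor: float = 3.0) -> bool:
    """True when today's spend exceeds `factor` times the median of previous days."""
    if not previous_days_usd:
        return False  # no baseline yet, so nothing to compare against
    return today_usd > factor * median(previous_days_usd)
```

The median is used rather than the mean so that one earlier bad day does not raise the baseline and hide the next one.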

Common Pitfalls

Hardcoded API key. Pushed to git once = compromised forever. Use env vars.

No retry. Production breaks during OpenAI’s regular outages.

No timeout. A 60-second hung request blocks your handler.

JSON parsed without validation. Model returns malformed or off-schema JSON sometimes. json.loads catches the former, Pydantic the latter.

Token-unaware long prompts. Hit context limit (4097 tokens for text-davinci-003); request fails.

Async without semaphore. Burst → rate-limited → cascading failures.

Logging full prompts to console. May contain PII / secrets. Log structured metadata; redact bodies.
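For the logging pitfall, one option is to log a summary of the prompt instead of its body. A minimal sketch; `prompt_metadata` is a hypothetical helper and the field names are assumptions.

```python
import hashlib


def prompt_metadata(prompt: str) -> dict:
    """Loggable summary of a prompt that does not include the body text."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    # A truncated hash lets you spot repeated prompts in the logs. It is not a
    # secrecy guarantee for short, guessable inputs.
    return {"prompt_chars": len(prompt), "prompt_sha256": digest[:12]}
```

The returned dict can be passed as the `extra` argument of a logging call in place of the raw prompt.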

Wrapping Up

Python + openai SDK + tenacity + Pydantic + tiktoken = the working stack for OpenAI integration in early 2023. Tuesday: the Node.js variant.
