Input Validation and the OWASP Top 10
November 25, 2022 · 5 min read · by Muhammad Amal programming

TL;DR — Validate input at boundaries (HTTP, queue consumers, file inputs). Allow-list what you accept; reject anything else. Never concatenate user input into SQL, HTML, OS commands, or HTTP URLs. Parameterized queries; output encoding; allow-listed URLs; argument arrays — one pattern per output context.

After CSRF, the deeper layer. Most “we got hacked” stories trace to bad input handling. The OWASP Top 10 is the standard reference; this post is how the top items map to discipline.

OWASP Top 10 2021 — the relevant ones

  1. Broken Access Control. Most common.
  2. Cryptographic Failures. Weak crypto, leaked secrets.
  3. Injection. SQL, NoSQL, OS command, LDAP; XSS was folded into this category in 2021.
  4. Insecure Design. Architecture-level issues.
  5. Security Misconfiguration. Defaults left on.
  6. Vulnerable Components. Dependencies with CVEs.
  7. Identification and Authentication Failures. Bad password handling, session bugs.
  8. Software and Data Integrity Failures. Supply chain.
  9. Security Logging and Monitoring Failures. Won’t notice you’ve been breached.
  10. SSRF. Server fetches an attacker-controlled URL.

This post focuses on the injection family (#3, plus SSRF at #10), which is where input validation lives.

SQL injection — solved by parameterized queries

# BAD
cursor.execute(f"SELECT * FROM users WHERE email = '{email}'")

# GOOD
cursor.execute("SELECT * FROM users WHERE email = %s", (email,))

Parameterized queries pass values separately; the DB driver handles quoting. There’s no way for email = "'; DROP TABLE users; --" to execute as SQL.

Every modern DB library supports parameterization. Use it always.
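To watch the payload from above get treated as plain data, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for whatever driver you run. Note that sqlite3 uses ? placeholders where the snippet above uses %s; the users table and the find_user helper are made up for illustration.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, email: str) -> list:
    # The value travels separately from the SQL text; sqlite3 uses ? placeholders.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()

# Throwaway in-memory database with one user (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))

payload = "'; DROP TABLE users; --"
rows_for_payload = find_user(conn, payload)        # compared as a literal string
rows_for_real = find_user(conn, "a@example.com")
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The payload matches no row and the table is still there afterwards, because the driver never parses the value as SQL.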

The few cases that “require” string concat (dynamic table names, dynamic ORDER BY columns): allow-list to a fixed set:

SORT_COLUMNS = {"created_at", "updated_at", "id"}
if sort not in SORT_COLUMNS:
    raise ValueError("bad sort column")
query = f"SELECT * FROM orders ORDER BY {sort}"

Never trust user input for an identifier; always validate against a known list.
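The same idea extends to sort direction, which is also an identifier-like fragment that cannot be parameterized. A small sketch, assuming a hypothetical build_orders_query helper and the orders table from the snippet above:

```python
SORT_COLUMNS = {"created_at", "updated_at", "id"}
# Map accepted input to the exact SQL keyword, so user text never reaches the query.
SORT_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}

def build_orders_query(sort: str, direction: str = "asc") -> str:
    if sort not in SORT_COLUMNS:
        raise ValueError("bad sort column")
    try:
        sql_direction = SORT_DIRECTIONS[direction.lower()]
    except KeyError:
        raise ValueError("bad sort direction") from None
    # Both fragments now come from fixed sets defined in code.
    return f"SELECT * FROM orders ORDER BY {sort} {sql_direction}"
```

Values (WHERE clauses, LIMIT) should still go through placeholders; only the identifiers take this route.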

NoSQL injection — same pattern

MongoDB:

// BAD — user controls operator
db.users.find({ email: req.body.email });
// req.body.email could be { $ne: null } — matches all users

// GOOD — coerce to string
db.users.find({ email: String(req.body.email) });

For arbitrary nested JSON inputs, more discipline needed. Define schemas (Zod, Joi, Pydantic, AJV) and validate before passing to the DB.
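In Python the equivalent check might look like the sketch below. It rejects non-string values instead of coercing them, which is a design choice on my part rather than the only option; email_filter is a hypothetical helper whose result you would hand to your driver's query call.

```python
def email_filter(body: dict) -> dict:
    """Build a query filter from a parsed JSON body, refusing operator objects."""
    email = body.get("email")
    # A dict such as {"$ne": None} is how operators get smuggled in; only str passes.
    if not isinstance(email, str):
        raise ValueError("email must be a string")
    if not 3 <= len(email) <= 254:
        raise ValueError("email length out of range")
    return {"email": email}
```

Rejecting gives the caller a clear error, whereas coercion silently turns an attack into a query for a string nobody has.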

Command injection — argument arrays

# BAD
os.system(f"convert {filename} -resize 100x100 thumb.jpg")

# GOOD — argument array, no shell
subprocess.run(["convert", filename, "-resize", "100x100", "thumb.jpg"], check=True)

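Argument arrays bypass the shell entirely. No quoting, no escaping, no shell metacharacters. The filename "; rm -rf /; #" is treated as one argument (which probably fails because no file with that name exists). One gap remains: a filename that starts with - can still be read as an option by the program itself, so validate the name as well.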

In Go:

// BAD
exec.Command("sh", "-c", fmt.Sprintf("convert %s -resize 100x100 thumb.jpg", filename))

// GOOD
exec.Command("convert", filename, "-resize", "100x100", "thumb.jpg")

Never use sh -c. Never use string formatting with user input in commands. Always argument array.
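If you want to convince yourself that the payload really arrives as a single argument, a quick experiment helps. The sketch below swaps convert for a tiny Python child process that prints its own arguments, so it runs anywhere; argv_seen_by_child is a made-up helper for the demo.

```python
import subprocess
import sys

def argv_seen_by_child(filename: str) -> list[str]:
    # The child is a one-line Python program that prints each argument it received.
    child = "import sys; print('\\n'.join(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", child, filename, "-resize", "100x100"],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits non-zero
    )
    return result.stdout.splitlines()

seen = argv_seen_by_child("; rm -rf /; #")
```

The first element of seen is the whole payload, semicolons and all. No shell ever looked at it.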

XSS — output encoding by context

<!-- BAD only if the template engine does not escape output -->
<div>Welcome, {{ user.name }}</div>

<!-- Unescaped, user.name = <script>alert(1)</script> runs as script -->

Modern templating engines (React, Vue, Angular, Django, Rails) auto-escape HTML by default. The output of {{ user.name }} becomes &lt;script&gt;alert(1)&lt;/script&gt;. Safe.

The way to break it:

// BAD — explicit unescaping
<div dangerouslySetInnerHTML={{ __html: user.name }} />

// Or in Django:
{{ user.name|safe }}

safe / dangerouslySetInnerHTML = explicit “I trust this.” Never trust user input for these.

For DOM contexts beyond HTML (<script>, <style>, attribute values), different escaping rules apply. Stick to the framework’s defaults; reach for explicit unescaping only with sanitized HTML (DOMPurify).
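When you are outside a framework (building an email body or a one-off report by hand), the context rule looks roughly like this with only the standard library. The helper names for_html and for_script are mine, and the script-context escaping is a common pattern, not a full replacement for a template engine.

```python
import html
import json

def for_html(value: str) -> str:
    # HTML body text and quoted attribute values: escape & < > " '
    return html.escape(value, quote=True)

def for_script(value) -> str:
    # JSON literal for an inline <script> block. Escaping < > & keeps a value
    # like "</script>" from closing the block early.
    return (
        json.dumps(value)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )

greeting = "<div>Welcome, " + for_html("<script>alert(1)</script>") + "</div>"
```

The same string needs different treatment in each context, which is why one global "sanitize" function never works.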

SSRF — allow-list URLs

# BAD — fetch whatever URL user provides
def proxy_image(url: str):
    return requests.get(url).content

A user passes http://169.254.169.254/latest/meta-data/iam (the AWS instance metadata service), and that path leads to the instance role's IAM credentials. Or http://localhost:6379 to reach Redis. SSRF turns your server into a proxy for talking to internal services.

Mitigations:

Allow-list domains:

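from urllib.parse import urlparse

import requests

ALLOWED = {"images.example.com", "cdn.example.com"}

def proxy_image(url: str):
    parsed = urlparse(url)
    # Compare the parsed hostname, never a substring of the raw URL
    if parsed.scheme not in {"http", "https"} or parsed.hostname not in ALLOWED:
        raise ValueError("not allowed")
    # Don't follow redirects: an allowed host could redirect to an internal one
    return requests.get(url, timeout=5, allow_redirects=False).content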

Block private IPs:

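import ipaddress
import socket

def is_public_ip(host: str) -> bool:
    try:
        # gethostbyname resolves IPv4 only; use getaddrinfo to cover IPv6 too
        ip = ipaddress.ip_address(socket.gethostbyname(host))
    except (OSError, ValueError):
        # Resolution failed or the result did not parse: treat as not public
        return False
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved)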

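Use both. Belt and suspenders. One caveat: the check resolves the hostname once and the HTTP client resolves it again, so a DNS answer that changes in between (DNS rebinding) can slip past. Connecting to the IP you already validated closes that gap.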

Block cloud metadata endpoints explicitly: 169.254.169.254 (used by AWS, GCP, and Azure) and metadata.google.internal (GCP). Blocking the whole link-local range, 169.254.0.0/16, covers all of them.
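Put together, the checks might look like the sketch below. It is one way to combine them, not a complete SSRF defense: check_url, resolve_all, and is_public are names I made up, and the resolver is a parameter only so the logic can be exercised without touching the network.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_HOSTS = {"images.example.com", "cdn.example.com"}

def resolve_all(host: str) -> list[str]:
    # getaddrinfo returns IPv4 and IPv6 results; the address is at [4][0].
    return [info[4][0] for info in socket.getaddrinfo(host, None)]

def is_public(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)

def check_url(url: str, resolve=resolve_all) -> str:
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        raise ValueError("scheme not allowed")
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError("host not allowed")
    addresses = resolve(parsed.hostname)
    # Every resolved address must be public, not just the first one.
    if not addresses or not all(is_public(a) for a in addresses):
        raise ValueError("host resolves to a non-public address")
    return url
```

Checking every resolved address matters because a hostname can return a public and a private record side by side. A URL like https://images.example.com@evil.test/ is also rejected, since the parsed hostname there is evil.test.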

Validate at boundaries

The “validate input” discipline:

from pydantic import BaseModel, EmailStr, conint, validator  # Pydantic v1 API; v2 renames validator to field_validator

class CreateUserRequest(BaseModel):
    email: EmailStr
    age: conint(ge=13, le=130)
    role: str

    @validator("role")
    def role_in_set(cls, v):
        if v not in {"admin", "user", "guest"}:
            raise ValueError("invalid role")
        return v

@app.post("/users")
def create_user(req: CreateUserRequest):
    # req is validated; types are correct; constraints checked
    ...

Validation at the boundary. Inside the handler, types are trustworthy.

Equivalents:

  • Go: validator library or hand-rolled
  • JavaScript: Zod, Yup, AJV
  • Python: Pydantic
  • Rust: serde + custom Validate

Use one. Validate everything coming in.

Allow-list vs deny-list

Always prefer allow-list:

import re

# Deny-list: fragile
def safe_filename(name: str):
    if ".." in name or "/" in name or "\\" in name:
        raise ValueError("bad filename")
    return name

# Allow-list: robust
def safe_filename(name: str):
    # fullmatch, not match with "$": "$" also matches before a trailing newline
    if not re.fullmatch(r"[a-zA-Z0-9_-]{1,200}\.[a-z]{1,5}", name):
        raise ValueError("bad filename")
    return name

Deny-lists miss new attacks. Allow-lists fail closed: anything you did not explicitly approve is rejected.
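A few inputs make the difference concrete. The sketch below restates both checks as boolean helpers (deny_list_accepts and allow_list_accepts are my names) and runs them over names the deny-list author never thought about.

```python
import re

def deny_list_accepts(name: str) -> bool:
    # Accept anything that avoids the three known-bad substrings.
    return not (".." in name or "/" in name or "\\" in name)

FILENAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,200}\.[a-z]{1,5}")

def allow_list_accepts(name: str) -> bool:
    # Accept only names that match the approved shape in full.
    return FILENAME_RE.fullmatch(name) is not None

# Empty name, option-like name, embedded NUL, shell metacharacters, trailing newline.
suspicious = ["", "-rf", "report.pdf\x00.png", "a b;c.sh", "notes.txt\n"]

passed_deny_list = [n for n in suspicious if deny_list_accepts(n)]
passed_allow_list = [n for n in suspicious if allow_list_accepts(n)]
```

Every suspicious name passes the deny-list and none passes the allow-list, while an ordinary name like report-2022.pdf is accepted by both.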

Common Pitfalls

String concatenation into SQL/HTML/commands/URLs. The pattern that creates every injection vulnerability.

Trust based on Content-Type. “It’s JSON; can’t be SQL injection.” JSON body fields still go into SQL queries.

Validation only at the controller, while the service layer trusts its callers. Internal callers can have bugs too. Validate at every boundary.

Schema validation that accepts excess fields. Use strict mode; reject unknown fields. Catches typos and unauthorized fields.

Storing input verbatim and relying only on escaping at output. It works until one output path forgets to escape. Easier: validate and normalize on input, so what you store is already in a known shape.

No length limits. An email field that accepts a 10 MB string invites DoS via memory exhaustion. Always bound input size.
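Two of these pitfalls (excess fields and missing length limits) are easy to show in code. Here is a hand-rolled sketch using only the standard library; the field names mirror the Pydantic model earlier in the post, and validate_create_user plus the 254-character cap are my assumptions. With Pydantic v1 you would get the unknown-field behavior from extra = "forbid" in the model's Config.

```python
ALLOWED_FIELDS = {"email", "age", "role"}
ROLES = {"admin", "user", "guest"}
MAX_EMAIL_LENGTH = 254  # assumed cap; pick whatever your system needs

def validate_create_user(payload: dict) -> dict:
    # Strict mode: unknown fields are an error, not silently ignored.
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = ALLOWED_FIELDS - set(payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    email = payload["email"]
    if (not isinstance(email, str) or "@" not in email
            or not 3 <= len(email) <= MAX_EMAIL_LENGTH):
        raise ValueError("invalid email")

    age = payload["age"]
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or not 13 <= age <= 130:
        raise ValueError("invalid age")

    role = payload["role"]
    if not isinstance(role, str) or role not in ROLES:
        raise ValueError("invalid role")

    # Return a fresh dict containing only validated values.
    return {"email": email, "age": age, "role": role}
```

The "@" check is deliberately crude; a schema library's email type does this better. The point is the shape: reject unknown keys, bound every string, and return only what passed.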

Wrapping Up

Parameterized queries + argument arrays + output encoding + URL allow-lists + schema validation at boundaries. Each layer prevents a different injection. Monday: audit logging.