Input Validation and the OWASP Top 10
TL;DR — Validate input at boundaries (HTTP, queue consumers, file inputs). Allow-list what you accept; reject anything else. Never concatenate user input into SQL, HTML, OS commands, or HTTP URLs. Parameterized queries; output encoding; allow-listed URLs; argument arrays — one pattern per output context.
After CSRF, the next layer down is input handling. Many “we got hacked” stories trace back to it. The OWASP Top 10 is the standard reference; this post maps its input-related items to day-to-day discipline.
OWASP Top 10 2021 — the relevant ones
- Broken Access Control. Most common.
- Cryptographic Failures. Weak crypto, leaked secrets.
- Injection. SQL, NoSQL, OS command, LDAP. XSS is folded into this category as of 2021.
- Insecure Design. Architecture-level issues.
- Security Misconfiguration. Defaults left on.
- Vulnerable and Outdated Components. Dependencies with CVEs.
- Identification and Authentication Failures. Bad password handling, session bugs.
- Software and Data Integrity Failures. Supply chain.
- Security Logging and Monitoring Failures. Won’t notice you’ve been breached.
- SSRF. Server fetches an attacker-controlled URL.
This post focuses on the injection family (Injection, #3 on the list, plus SSRF, #10), which is where input validation lives.
SQL injection — solved by parameterized queries
# BAD
cursor.execute(f"SELECT * FROM users WHERE email = '{email}'")
# GOOD
cursor.execute("SELECT * FROM users WHERE email = %s", (email,))
Parameterized queries pass values separately; the DB driver handles quoting. There’s no way for email = "'; DROP TABLE users; --" to execute as SQL.
Every modern DB library supports parameterization. Use it always.
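To see the hostile value stay inert, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and the find_user helper are made up for illustration. Note that placeholder syntax depends on the driver: sqlite3 uses ?, while the example above uses %s.

```python
import sqlite3

# In-memory database so the sketch is self-contained (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))


def find_user(conn, email):
    # The SQL text is fixed; the value is bound separately as data.
    cur = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cur.fetchall()


hostile = "'; DROP TABLE users; --"
rows_hostile = find_user(conn, hostile)            # compared as a plain string
rows_alice = find_user(conn, "alice@example.com")  # normal lookup still works
table_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The hostile string matches no row and the users table is still there afterwards.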
The few cases that “require” string concat (dynamic table names, dynamic ORDER BY columns): allow-list to a fixed set:
SORT_COLUMNS = {"created_at", "updated_at", "id"}
if sort not in SORT_COLUMNS:
    raise ValueError("bad sort column")
query = f"SELECT * FROM orders ORDER BY {sort}"
Never trust user input for an identifier; always validate against a known list.
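The same idea extends to sort direction. A small sketch, assuming a hypothetical build_orders_query helper: the user's value is only ever used to pick from fixed sets, and the direction keyword is taken from the mapping rather than from the input string.

```python
SORT_COLUMNS = {"created_at", "updated_at", "id"}
SORT_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def build_orders_query(sort: str, direction: str = "asc") -> str:
    # Identifiers cannot be bound as parameters, so check them against a fixed set.
    if sort not in SORT_COLUMNS:
        raise ValueError("bad sort column")
    key = direction.lower()
    if key not in SORT_DIRECTIONS:
        raise ValueError("bad sort direction")
    # The keyword comes from our own mapping, never from the caller's text.
    return f"SELECT * FROM orders ORDER BY {sort} {SORT_DIRECTIONS[key]}"
```

Anything outside the two sets raises before a query string is ever built.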
NoSQL injection — same pattern
MongoDB:
// BAD — user controls operator
db.users.find({ email: req.body.email });
// req.body.email could be { $ne: null } — matches all users
// GOOD — coerce to string
db.users.find({ email: String(req.body.email) });
For arbitrary nested JSON inputs, more discipline is needed. Define schemas (Zod, Joi, Pydantic, AJV) and validate before passing to the DB.
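The same check can be sketched in Python without any library. The helper names here (require_str, build_email_filter) are hypothetical, and the length cap is an assumed value; the point is that a parsed JSON body can carry a dict where a string was expected, so the type is checked before the value reaches a query filter.

```python
def require_str(value, field: str, max_len: int = 254) -> str:
    # Reject anything that is not a plain string, e.g. {"$ne": None}.
    if not isinstance(value, str):
        raise ValueError(f"{field} must be a string")
    if len(value) > max_len:
        raise ValueError(f"{field} is too long")
    return value


def build_email_filter(body: dict) -> dict:
    # Only a validated string ever becomes part of the query filter.
    return {"email": require_str(body.get("email"), "email")}
```

Rejecting the wrong type is stricter than coercing it: the request fails loudly instead of silently searching for an odd string.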
Command injection — argument arrays
# BAD
os.system(f"convert {filename} -resize 100x100 thumb.jpg")
# GOOD — argument array, no shell
subprocess.run(["convert", filename, "-resize", "100x100", "thumb.jpg"], check=True)
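Argument arrays bypass the shell entirely. No quoting, no escaping, no shell metacharacters. The filename "; rm -rf /; #" is treated as one argument (which probably fails because no file with that name exists). One caveat: the program itself may still read a filename that starts with - as an option, so validate the name as well.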
In Go:
// BAD
exec.Command("sh", "-c", fmt.Sprintf("convert %s -resize 100x100 thumb.jpg", filename))
// GOOD
exec.Command("convert", filename, "-resize", "100x100", "thumb.jpg")
Never use sh -c with user input. Never build command strings with string formatting. Always pass an argument array.
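A quick way to convince yourself that no shell is involved: pass the hostile string to a child process and have it echo its first argument back. This sketch uses a child Python interpreter as a stand-in for a real tool like convert; echo_first_argument is a made-up helper.

```python
import subprocess
import sys


def echo_first_argument(value: str) -> str:
    # The child process writes argv[1] back unchanged; no shell parses the value.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


hostile = "; rm -rf /; #"
echoed = echo_first_argument(hostile)
```

The child receives the whole string as a single argument, semicolons and all.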
XSS — output encoding by context
<!-- BAD if the template does not escape output -->
<div>Welcome, {{ user.name }}</div>
<!-- If user.name = <script>alert(1)</script>, that script runs -->
Modern frameworks and template engines (React, Vue, Angular, Django, Rails) auto-escape HTML by default. The output of {{ user.name }} becomes &lt;script&gt;alert(1)&lt;/script&gt;, which the browser renders as text. Safe.
The way to break it:
// BAD — explicit unescaping
<div dangerouslySetInnerHTML={{ __html: user.name }} />
// Or in Django:
{{ user.name|safe }}
safe / dangerouslySetInnerHTML = explicit “I trust this.” Never trust user input for these.
For DOM contexts beyond HTML (<script>, <style>, attribute values), different escaping rules apply. Stick to the framework’s defaults; reach for explicit unescaping only with sanitized HTML (DOMPurify).
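For a feel of what auto-escaping does under the hood, here is a minimal sketch with Python's standard html.escape. The render_welcome function is hypothetical, and this covers only the HTML body and quoted-attribute contexts, not script, style, or URL contexts.

```python
import html


def render_welcome(name: str) -> str:
    # html.escape converts &, <, > and (by default) both quote characters.
    return f"<div>Welcome, {html.escape(name)}</div>"


rendered = render_welcome("<script>alert(1)</script>")
```

The result contains &lt;script&gt; as text, so the browser displays the tag instead of running it. In a real application, let the template engine do this rather than calling it by hand.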
SSRF — allow-list URLs
# BAD — fetch whatever URL user provides
def proxy_image(url: str):
    return requests.get(url).content
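User passes http://169.254.169.254/latest/meta-data/iam/security-credentials/ (the AWS instance metadata service) and, where IMDSv1 is still enabled, can walk from there to the instance's IAM credentials. Or http://localhost:6379 to reach Redis. SSRF is abused to talk to internal services.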
Mitigations:
Allow-list domains:
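from urllib.parse import urlparse
import requests
ALLOWED = {"images.example.com", "cdn.example.com"}
def proxy_image(url: str):
    parsed = urlparse(url)
    # Only web schemes, and only hosts on the list.
    if parsed.scheme not in {"http", "https"} or parsed.hostname not in ALLOWED:
        raise ValueError("not allowed")
    # Do not follow redirects: an allowed host could redirect to an internal one.
    return requests.get(url, allow_redirects=False, timeout=5).content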
Block private IPs:
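import ipaddress
import socket
def is_public_ip(host: str) -> bool:
    try:
        # gethostbyname returns a single IPv4 address for the host
        ip = ipaddress.ip_address(socket.gethostbyname(host))
    except (OSError, ValueError):
        # resolution failed or the result was not an IP: treat as not public
        return False
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved)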
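Use both. Belt and suspenders. Resolve the hostname once and connect to the address you checked; otherwise the DNS answer can change between the check and the fetch (DNS rebinding).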
Block the cloud metadata endpoints explicitly: 169.254.169.254 (used by AWS, GCP, and Azure) and metadata.google.internal (GCP).
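Putting the checks together, here is a sketch that needs no network access. The names check_url and is_public_address are hypothetical. It validates scheme and host against the allow-list and classifies IP literals; a real fetcher would also resolve the hostname, run every resolved address through the IP check, and connect to the address it checked.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_HOSTS = {"images.example.com", "cdn.example.com"}


def is_public_address(address: str) -> bool:
    # Classify an already-resolved IP literal; anything unparsable is rejected.
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return not (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def check_url(url: str) -> str:
    # Return the hostname if the URL passes the allow-list, else raise.
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        raise ValueError("scheme not allowed")
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError("host not allowed")
    return parsed.hostname
```

Note that check_url compares the parsed hostname, so a URL like https://images.example.com@evil.test/ is rejected: the part before @ is userinfo, not the host.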
Validate at boundaries
The “validate input” discipline:
# Pydantic v1-style API; in v2, field_validator replaces validator.
from pydantic import BaseModel, EmailStr, conint, validator
class CreateUserRequest(BaseModel):
    email: EmailStr
    age: conint(ge=13, le=130)
    role: str
    @validator("role")
    def role_in_set(cls, v):
        if v not in {"admin", "user", "guest"}:
            raise ValueError("invalid role")
        return v
# app is assumed to be a FastAPI-style application object defined elsewhere.
@app.post("/users")
def create_user(req: CreateUserRequest):
    # req is validated; types are correct; constraints checked
    ...
Validation at the boundary. Inside the handler, types are trustworthy.
Equivalents:
- Go: validator library or hand-rolled
- JavaScript: Zod, Yup, AJV
- Python: Pydantic
- Rust: serde + custom Validate
Use one. Validate everything coming in.
Allow-list vs deny-list
Always prefer allow-list:
import re
# Deny-list: fragile
def safe_filename(name: str):
    if ".." in name or "/" in name or "\\" in name:
        raise ValueError("bad filename")
    return name
# Allow-list: robust
def safe_filename(name: str):
    # fullmatch, because "$" with re.match would also accept a trailing newline
    if not re.fullmatch(r"[a-zA-Z0-9_-]{1,200}\.[a-z]{1,5}", name):
        raise ValueError("bad filename")
    return name
Deny-lists miss new attacks. Allow-lists fail closed: anything you did not explicitly approve is rejected.
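To make the difference concrete, this sketch runs a few awkward names through both styles of check. The inputs are illustrative picks, not an exhaustive attack list, and the function names are hypothetical boolean variants of the checks above.

```python
import re

FILENAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,200}\.[a-z]{1,5}")


def deny_list_ok(name: str) -> bool:
    # Passes anything that avoids the three patterns we thought of.
    return not (".." in name or "/" in name or "\\" in name)


def allow_list_ok(name: str) -> bool:
    # Passes only names that match the approved shape in full.
    return FILENAME_RE.fullmatch(name) is not None


# Empty name, embedded NUL byte, embedded newline, drive-letter prefix.
suspicious = ["", "report.pdf\x00.exe", "a\nb.txt", "C:evil.txt"]
deny_results = [deny_list_ok(n) for n in suspicious]
allow_results = [allow_list_ok(n) for n in suspicious]
```

The deny-list passes all four names; the allow-list rejects all four while still accepting an ordinary name like thumb_01.jpg.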
Common Pitfalls
String concatenation into SQL/HTML/commands/URLs. The pattern that creates every injection vulnerability.
Trust based on Content-Type. “It’s JSON; can’t be SQL injection.” JSON body fields still go into SQL queries.
Validation only at the controller; service layer trusts. Internal callers can have bugs too. Validate at every boundary.
Schema validation that accepts excess fields. Use strict mode; reject unknown fields. Catches typos and unauthorized fields.
Storing input verbatim then escaping on output. Sometimes right, sometimes leaks. Easier: validate + transform on input.
No length limits. Email field accepts a 10 MB string; DoS via memory exhaustion. Always bound.
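Two of these pitfalls, excess fields and missing length limits, are easy to show in a few lines. This is a library-free sketch with a hypothetical parse_create_user function and assumed limits; in practice a schema library's strict mode does the same job.

```python
ALLOWED_FIELDS = {"email", "role"}
ROLES = {"admin", "user", "guest"}
MAX_EMAIL_LEN = 254  # assumed upper bound for this sketch


def parse_create_user(payload: dict) -> dict:
    # Strict mode: any field we did not define is an error, not something to ignore.
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    email = payload.get("email")
    # Type check first, then bound the length before any further processing.
    if not isinstance(email, str) or not (3 <= len(email) <= MAX_EMAIL_LEN):
        raise ValueError("invalid email")
    if "@" not in email:
        raise ValueError("invalid email")
    role = payload.get("role")
    # isinstance guards against unhashable values such as dicts or lists.
    if not isinstance(role, str) or role not in ROLES:
        raise ValueError("invalid role")
    return {"email": email, "role": role}
```

A payload that sneaks in an extra is_admin field, or an oversized email, is rejected at the boundary.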
Wrapping Up
Parameterized queries + argument arrays + output encoding + URL allow-lists + schema validation at boundaries. Each layer prevents a different injection. Monday: audit logging.