Input Validation Beyond Sanitization

Sanitization is necessary but not sufficient. A layered approach to input validation that accounts for encoding, context, and business logic.

Laura Bell Main

Founder & CEO, SafeStack

7 Jan 2025

Input validation is one of those security topics where teams feel like they've handled it: they have input sanitization at their API boundary, their web framework sanitizes form inputs, and they're using parameterized queries for database access. The fundamentals are covered. And then an attack comes in through a vector that bypasses the edge sanitization: a second-order SQL injection that executes against stored data, an XSS payload that survives HTML encoding because it's being rendered in a JavaScript string context, a path traversal that slips past a path check because the value was double-encoded.

Defense in depth for input validation means treating every processing layer as a potential injection point, not just the application's external API surface.

The difference between sanitization, validation, and encoding

These three terms are often used interchangeably and they shouldn't be. Understanding the distinction changes how you architect input handling.

Validation checks that input meets the expected structure and constraints: is this a valid email address format? Is this integer within the expected range? Is this file extension in the allowed list? Validation rejects input that doesn't meet requirements. It doesn't modify the input.

Sanitization modifies input to remove or neutralize potentially dangerous content: stripping HTML tags, removing script elements, normalizing Unicode. Sanitization transforms input to make it safe for a specific context. The problem with relying primarily on sanitization is that it's context-dependent — what's safe for a text file is dangerous in SQL, what's safe in HTML body is dangerous in an HTML attribute, what's safe in one HTML encoding context may be dangerous in another.

Encoding transforms data for safe use in a specific output context: HTML encoding for HTML body, JavaScript string encoding for JavaScript context, URL encoding for URL parameters. For SQL the equivalent is the parameterized query, which strictly speaking doesn't encode anything: it keeps the data out of the query text altogether. Encoding is the preferred approach for injection prevention because it's applied at the point of use, in the specific context where injection would occur, rather than at the point of receipt.

The correct architecture is: validate at the boundary (reject what shouldn't enter), store and process data in its original form where possible, and encode at the point of output for each specific context. Sanitizing at the boundary and then trusting sanitized data across all downstream contexts is where second-order vulnerabilities appear.
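As a minimal sketch of that architecture in Python, assuming a hypothetical comment feature (the Comment type, the field limits, and the function names are illustrative and not taken from any framework): validation rejects without rewriting, the value is kept in its original form, and HTML encoding happens only when the value is rendered.

```python
import html
import re
from dataclasses import dataclass

# Hypothetical rule for this sketch: 3-32 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,32}")


class ValidationError(ValueError):
    """Raised when input does not meet the expected structure."""


@dataclass(frozen=True)
class Comment:
    author: str
    body: str


def validate_comment(author: str, body: str) -> Comment:
    """Validate at the boundary: reject bad input, never rewrite it."""
    if not USERNAME_RE.fullmatch(author):
        raise ValidationError("author must be 3-32 letters, digits, or underscores")
    if not 1 <= len(body) <= 2000:
        raise ValidationError("body must be 1-2000 characters")
    # The body is stored exactly as received; no tags are stripped here.
    return Comment(author=author, body=body)


def render_comment_html(comment: Comment) -> str:
    """Encode at the point of output, for the HTML body context only."""
    author = html.escape(comment.author)
    body = html.escape(comment.body)
    return f"<p><b>{author}</b>: {body}</p>"
```

The same stored Comment could later be rendered into JSON, CSV, or a log line, each with its own encoder, which is the reason for not baking HTML-specific changes into the stored value.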

Second-order injection: why stored data isn't safe data

Second-order SQL injection is a classic example. Data enters the application, is sanitized and stored, and is later read back and used in an operation that constructs a SQL query. If that later query uses string concatenation rather than parameterized queries, on the assumption that data from "our own database" is trusted, the stored payload executes in the second query context.

The fix isn't to re-sanitize data when reading from the database. It's to use parameterized queries everywhere data is used in SQL construction — not just at the external API boundary, but in every query including those that operate on stored data.

The same pattern applies to XSS. A username is collected from a user, validated at registration against a rule that allows letters, digits, and some punctuation (so that names like O'Brien are accepted), and stored in the database. Later, an admin page renders usernames in a JavaScript block using string interpolation rather than proper encoding. If a user registers with a username that passes the registration validator but contains characters that are dangerous in a JavaScript string context, such as quotes or backslashes, the admin page has an XSS vulnerability despite the input validation at registration.
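One way to handle the admin-page case, sketched in Python with only the standard library. The js_literal and render_admin_script names are hypothetical helpers for this example. The approach is to serialize the value as a JSON string literal and additionally escape the characters that matter to the HTML parser, since json.dumps on its own leaves a closing script tag intact.

```python
import json


def js_literal(value: str) -> str:
    """Serialize a value as a JavaScript string literal for an inline <script>."""
    # json.dumps escapes quotes, backslashes, and control characters, and with
    # the default ensure_ascii=True it emits non-ASCII characters as \uXXXX.
    encoded = json.dumps(value)
    # It does not escape <, >, or &, so "</script>" would still end the block.
    return (
        encoded.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


def render_admin_script(username: str) -> str:
    # The literal carries its own double quotes; none are added by hand.
    return f"<script>var username = {js_literal(username)};</script>"
```

This covers values placed inside a script block. A value placed inside an inline event handler attribute would also need HTML attribute encoding applied to the result, because the browser decodes the attribute before running the JavaScript.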

Context-aware output encoding

The most important concept in preventing injection is encoding at the point of use, for the specific context. OWASP's XSS Prevention Cheat Sheet distinguishes output contexts that each require a different encoding approach. Five of them matter most:

HTML body context: encode &, <, >, ", '. PHP's htmlspecialchars() handles this when ENT_QUOTES is set (the default since PHP 8.1), as do the equivalents in other languages.

HTML attribute context: encode the same characters, and always quote the attribute value. An unquoted attribute value can be broken out of with a space or other characters that HTML encoding leaves untouched.

JavaScript string context: within a <script> block or event handler, HTML encoding is insufficient. A value that has been HTML-encoded may still execute JavaScript if it's placed inside a JavaScript string literal. JavaScript string encoding — escaping special JavaScript string characters — is required.

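URL context: percent-encode values placed in URL parameters. Double-encoding attacks target code that validates a value and then decodes it again afterwards, so a payload such as %252e%252e%252f passes the check and only becomes ../ later. Decode fully first, then validate the decoded value.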

CSS context: user data in CSS values can be used to inject CSS expressions in older browsers and to create CSS-based data exfiltration in modern ones. Avoid placing user-controlled data in CSS contexts where possible.
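The HTML and URL contexts above map onto standard-library calls in Python. The sketch below is illustrative: the helper names are invented here, and the decode-until-stable loop in canonical_relative_path is one assumed policy for double-encoding (a stricter alternative is to decode once and reject anything that still contains a percent sequence).

```python
import html
import posixpath
from urllib.parse import quote, unquote


def html_body(value: str) -> str:
    """HTML body context: encode &, <, >, double and single quotes."""
    return html.escape(value, quote=True)


def html_attr(name: str, value: str) -> str:
    """HTML attribute context: encode the value and always quote it."""
    return f'{name}="{html.escape(value, quote=True)}"'


def url_param(value: str) -> str:
    """URL context: percent-encode everything, including / and &."""
    return quote(value, safe="")


def canonical_relative_path(raw: str, max_rounds: int = 5) -> str:
    """Decode until the value stops changing, then validate the decoded form."""
    decoded = raw
    for _ in range(max_rounds):
        step = unquote(decoded)
        if step == decoded:
            break
        decoded = step
    else:
        raise ValueError("too many layers of percent-encoding")
    if "\x00" in decoded:
        raise ValueError("NUL byte in path")
    normalized = posixpath.normpath(decoded.replace("\\", "/"))
    if normalized.startswith("/") or normalized == ".." or normalized.startswith("../"):
        raise ValueError("path escapes the allowed root")
    return normalized
```

Here canonical_relative_path("%252e%252e%252fetc%252fpasswd") raises ValueError, because the check runs on the fully decoded ../etc/passwd rather than on the encoded form.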

File upload validation that actually holds

File uploads are an input validation challenge that deserves specific attention. The common approach — checking the file extension and MIME type — is insufficient. MIME type can be spoofed in the HTTP request. File extension can be manipulated (shell.php.jpg may be treated as a PHP file on some web servers). The file content itself is the only reliable indicator of file type.

For image uploads: validate that the file content is a valid image by attempting to parse it as an image (using a library like Pillow, the maintained fork of PIL, in Python, or ImageMagick). Don't render or process files that fail image parsing. Serve uploaded files from a different domain than your application: a stored XSS payload in an SVG file served from your main domain executes in your application's origin. Serving it from a separate domain or through a sanitization proxy keeps that payload out of your application's origin.
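To illustrate what checking content rather than labels means, here is a standard-library sketch that walks the chunk structure of a PNG file and verifies every checksum. This is a swapped-in, narrower technique than the full image parse described above: it assumes only PNG uploads are allowed, it does not decode pixel data, and the function name and the max_pixels limit are assumptions for this example. A real decoder such as Pillow would still be the stronger check.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_valid_png(data: bytes, max_pixels: int = 50_000_000) -> bool:
    """Return True only if data is a structurally complete PNG with valid CRCs."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    chunk_types = []
    # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            return False  # chunk claims more bytes than the file contains
        body = data[pos + 8:body_end]
        (crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        if zlib.crc32(chunk_type + body) != crc:
            return False
        if not chunk_types:
            # The first chunk must be a 13-byte IHDR holding the dimensions.
            if chunk_type != b"IHDR" or length != 13:
                return False
            width, height = struct.unpack(">II", body[:8])
            if width == 0 or height == 0 or width * height > max_pixels:
                return False
        chunk_types.append(chunk_type)
        pos = body_end + 4
        if chunk_type == b"IEND":
            break
    # Require image data, a final IEND, and nothing appended after it.
    return (
        bool(chunk_types)
        and chunk_types[-1] == b"IEND"
        and b"IDAT" in chunk_types
        and pos == len(data)
    )
```

A file named avatar.png whose bytes are a PHP script fails at the signature check, and a genuine PNG with a script appended after the final chunk fails the trailing-data check. Neither the extension nor the declared MIME type is consulted.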

We're not saying file upload validation needs to be complex. We're saying that checking the extension and MIME type without validating the actual file content provides an illusion of validation that doesn't hold under targeted attack.

Where validation belongs in a layered architecture

A common question: if validation is defense in depth, how much validation is too much? Is it right to validate at the HTTP layer, at the API handler, at the service layer, and at the database layer?

The practical answer depends on the risk profile of the data. For public-facing input that triggers high-risk operations — authentication, financial transactions, data export — multiple layers of validation provide meaningful defense in depth. For internal service-to-service calls within your own network perimeter, exhaustive re-validation at every hop adds latency without proportional security benefit.

The boundary that matters most is the trust boundary: external input from users or systems you don't control should be validated comprehensively. Data crossing a trust boundary between components with different trust levels, such as a user-facing API calling an internal privileged service, should be validated again at the service boundary. Data flowing within a single trust zone with equivalent privilege levels doesn't require the same level of re-validation.
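A sketch of what validating at the service boundary can look like, assuming a hypothetical internal data-export service (the ExportRequest fields, the allowed formats, and the row limit are all invented for illustration). The privileged service enforces its own constraints when the request object is built, whether or not the calling API has already checked them.

```python
from dataclasses import dataclass

# Hypothetical limits owned by the privileged service, not by its callers.
ALLOWED_EXPORT_FORMATS = frozenset({"csv", "json"})
MAX_EXPORT_ROWS = 10_000


@dataclass(frozen=True)
class ExportRequest:
    account_id: int
    export_format: str
    row_limit: int

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(self.account_id, bool) or not isinstance(self.account_id, int):
            raise ValueError("account_id must be an integer")
        if self.account_id <= 0:
            raise ValueError("account_id must be positive")
        if self.export_format not in ALLOWED_EXPORT_FORMATS:
            raise ValueError("unsupported export format")
        if isinstance(self.row_limit, bool) or not isinstance(self.row_limit, int):
            raise ValueError("row_limit must be an integer")
        if not 1 <= self.row_limit <= MAX_EXPORT_ROWS:
            raise ValueError("row_limit out of range")


def run_export(request: ExportRequest) -> str:
    # By construction, any ExportRequest reaching this point has passed validation.
    return (
        f"exporting up to {request.row_limit} rows "
        f"for account {request.account_id} as {request.export_format}"
    )
```

Helpers that run inside the service after this point, in the same trust zone, can accept the ExportRequest without repeating these checks.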

Validation is not a problem you solve once at setup and never revisit. As your application's data flows evolve — new integrations, new data export features, new rendering contexts — the set of places where unencoded or unvalidated data could cause harm changes. Regular security review of new code that introduces new output contexts is how defense-in-depth validation stays effective over time.
