Why File Extension Checks Are Not Enough

Every upload pipeline starts the same way. A file arrives, your code checks the extension, maybe enforces a size limit, and ships it to S3. Done. Three lines of validation and you move on to the next feature.

```javascript
// Naive check: trusts the extension the client sent (case-insensitive).
if (!/\.(pdf|jpg|png|docx)$/i.test(file.name)) {
  throw new Error("Invalid file type");
}
await uploadToS3(file);
```

This pattern exists in almost every web application. It's also dangerously incomplete.

The extension is a label, not a fact

A file extension is metadata chosen by the sender. It has no binding relationship with the actual content inside the file. Renaming malware.exe to report.pdf takes a few seconds. Your extension check will pass it without hesitation.

This isn't a theoretical concern. It's a pattern we see daily across the files processed by Uplint's validation pipeline:

- Executables disguised as PDFs: binary payloads renamed to bypass upload filters
- Polyglot files: files that are valid in more than one format at once, exploiting parsers downstream
- MIME type spoofing: a client-supplied Content-Type header or crafted leading bytes that make a file appear to be something it isn't

Extension checks catch typos. They don't catch intent.
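To make the gap concrete, here is a minimal sketch that compares the extension against the file's leading bytes (its "magic number"). The names `SIGNATURES`, `EXPECTED`, `sniffType`, and `extensionMatchesContent` are illustrative assumptions, not part of any library, and the signature list is deliberately tiny. It only inspects offset 0, so it catches the renamed executable but not a well-built polyglot.

```javascript
// Leading byte signatures for a handful of formats (illustrative, not exhaustive).
const SIGNATURES = [
  { type: "pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "jpg", bytes: [0xff, 0xd8, 0xff] },
  { type: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] }, // docx and xlsx are ZIP containers
  { type: "exe", bytes: [0x4d, 0x5a] }, // "MZ", Windows executables
  { type: "elf", bytes: [0x7f, 0x45, 0x4c, 0x46] }, // Linux executables
];

// The sniffed type each accepted extension should map to.
const EXPECTED = { pdf: "pdf", png: "png", jpg: "jpg", jpeg: "jpg", docx: "zip" };

// Returns the first signature type whose bytes match the start of the file.
function sniffType(bytes) {
  const match = SIGNATURES.find((sig) =>
    sig.bytes.every((b, i) => bytes[i] === b)
  );
  return match ? match.type : "unknown";
}

// True only when the extension is on the allow list AND the content agrees with it.
function extensionMatchesContent(fileName, bytes) {
  const ext = fileName.split(".").pop().toLowerCase();
  return EXPECTED[ext] !== undefined && EXPECTED[ext] === sniffType(bytes);
}

// A Windows executable renamed to report.pdf: the extension passes, the bytes do not.
const disguised = Uint8Array.from([0x4d, 0x5a, 0x90, 0x00]);
console.log(sniffType(disguised)); // "exe"
console.log(extensionMatchesContent("report.pdf", disguised)); // false
```

Signature sniffing raises the bar, but it is still a check on a label, just one that sits inside the file. A file with a genuine `%PDF-` header can still be blank or carry a payload.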

The blank file problem

Security threats get the headlines, but the most common real-world issue is far more mundane: blank files.

These are files that pass every check — correct extension, valid MIME type, reasonable size — but contain no meaningful content:

| File | What it looks like | What's actually inside |
| --- | --- | --- |
| `invoice_q4.pdf` | 2.4 MB PDF | Zero readable words, only blank pages |
| `user_data.xlsx` | 340 KB spreadsheet | Column headers, zero data rows |
| `avatar.png` | 48 KB image | Single solid color (#ffffff) |
| `notes.txt` | 12 KB text file | Whitespace characters only |

Every one of these files will pass an extension check such as `file.name.endsWith('.pdf')` for its own type. Every one will pass a MIME type check. Every one will pass a size limit. And every one is worthless.
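Once a file has been decoded, most of these cases are cheap to detect. The sketch below is one possible shape for such checks, assuming a parser upstream has already produced plain text, spreadsheet rows, or raw RGBA pixels. The function names and input shapes are assumptions for illustration, and real scans with noise or near-white pixels would need a tolerance rather than exact equality.

```javascript
// Text (or text extracted from a PDF): blank when nothing is left after trimming whitespace.
function isBlankText(text) {
  return text.trim().length === 0;
}

// Spreadsheet: blank when no row after the header contains a non-empty cell.
// `rows` is an array of arrays of cell values, with the header row first.
function isHeaderOnlySheet(rows) {
  return !rows.slice(1).some((row) =>
    row.some(
      (cell) => cell !== null && cell !== undefined && String(cell).trim() !== ""
    )
  );
}

// Image: blank when every pixel is identical to the first one.
// `rgba` is decoded pixel data, 4 bytes per pixel (red, green, blue, alpha).
function isUniformImage(rgba) {
  for (let i = 4; i < rgba.length; i++) {
    // i % 4 is the same channel in the first pixel.
    if (rgba[i] !== rgba[i % 4]) return false;
  }
  return true;
}
```

The hard part is not these comparisons. It is the decoding step in front of them, which differs for every format.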

In industries like healthcare, legal, and finance, blank files aren't just a UX problem. They're a compliance risk. When a patient uploads a "medical record" that's actually a blank PDF, and your system accepts it without question, you've created a gap in your audit trail.

What real validation looks like

True file validation requires understanding what's inside the file, not just what it claims to be on the outside. This means:

1. Content analysis: Reading the actual bytes. Does this PDF contain text? Does this spreadsheet have data rows? Does this image have meaningful visual content?

2. Structural validation: Verifying that the internal structure matches the claimed format. A valid PDF begins with a `%PDF-` header and ends with an `%%EOF` marker. A JPEG begins with the bytes FF D8 and ends with FF D9. Files that don't match are corrupt or disguised.

3. Threat detection: Scanning for known malware signatures, embedded scripts, and exploit patterns. This needs to happen on every upload, not as an afterthought.

4. Contextual rules: Profile photos have different requirements than legal documents. A one-size-fits-all check misses the nuance. Patient records might require PDFs with at least 50 readable words. Profile images might need to be non-uniform and under 5 MB.
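The four layers above can be combined into a small rule evaluator. What follows is a rough sketch under stated assumptions, not Uplint's implementation: it expects a Node.js `Buffer`, it assumes `wordCount` and `isUniform` were computed by a content-analysis step upstream, and the `RULES` map, context names, and function names are hypothetical. The "threat scan" is only a naive search for PDF names associated with active content, which compressed object streams or hex-escaped names would evade. It is no substitute for a real malware scanner.

```javascript
// Structural check: "%PDF-" at the start and "%%EOF" within the last 1024 bytes.
function looksLikePdf(buf) {
  const head = buf.subarray(0, 5).toString("latin1");
  const tail = buf.subarray(Math.max(0, buf.length - 1024)).toString("latin1");
  return head === "%PDF-" && tail.includes("%%EOF");
}

// Naive threat heuristic: PDF names that indicate scripts or auto-run actions.
const ACTIVE_PDF_NAMES = ["/JavaScript", "/JS", "/Launch", "/OpenAction"];
function findActiveContent(buf) {
  const text = buf.toString("latin1");
  return ACTIVE_PDF_NAMES.filter((name) => text.includes(name));
}

// Hypothetical per-context rules, using the thresholds mentioned in the text.
const RULES = new Map([
  ["patient-records", { type: "pdf", minWords: 50 }],
  ["profile-image", { type: "image", requireNonUniform: true, maxBytes: 5 * 1024 * 1024 }],
]);

// `file` is { bytes: Buffer, wordCount?: number, isUniform?: boolean }.
function evaluate(context, file) {
  const rule = RULES.get(context);
  if (!rule) {
    return { accepted: false, reasons: [`no rules for context "${context}"`] };
  }

  const reasons = [];
  if (rule.maxBytes !== undefined && file.bytes.length >= rule.maxBytes) {
    reasons.push("file too large");
  }
  if (rule.type === "pdf") {
    if (!looksLikePdf(file.bytes)) reasons.push("not a structurally valid PDF");
    const active = findActiveContent(file.bytes);
    if (active.length > 0) reasons.push(`active content: ${active.join(", ")}`);
    if (!(file.wordCount >= rule.minWords)) {
      reasons.push(`fewer than ${rule.minWords} readable words`);
    }
  }
  if (rule.requireNonUniform && file.isUniform) {
    reasons.push("image is a single solid color");
  }
  return { accepted: reasons.length === 0, reasons };
}
```

Keeping the rules in data rather than in branching code means a new upload context is a new map entry, and every rejection carries a list of reasons that can be logged for an audit trail.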

The cost of false trust

When your validation says "pass" but the file is blank, corrupt, or malicious, you've created false trust. Your system now treats that file as legitimate data. Downstream processes depend on it. Users reference it. Auditors expect it to contain what it claims.

The cost compounds:

- Support tickets when users discover blank documents they uploaded months ago
- Compliance gaps when audit logs show files were accepted but contain nothing
- Security incidents when a disguised executable reaches a system that trusts the file type
- Data quality erosion as blank files accumulate in your storage, inflating metrics and confusing analysis

Building a trust layer

The solution isn't to write more validation code. It's to recognize that file validation is an infrastructure concern — like authentication or encryption — that deserves a dedicated layer.

At Uplint, we built this layer as a single API call:

```javascript
const result = await uplint.validate(file, {
  context: "patient-records",
  scan: true,
  detectBlanks: true,
});

if (!result.trusted) {
  // File was rejected: see result.reasons for details
}
```

One call triggers the full pipeline: structural validation, content analysis, blank detection, threat scanning, and audit logging. Your code doesn't need to know how to detect a blank PDF or identify a disguised executable. It just needs to ask: "Can I trust this file?"

That's the shift. From checking labels to understanding content. From trusting extensions to verifying substance.

Uplint is the trust layer for incoming data. Start with the CLI to scan your files locally, or integrate the API to validate uploads in production.