Your upload validation is probably broken
uplint finds blank PDFs, corrupt images, header-only spreadsheets, and disguised executables that pass your extension checks. One command. Works locally and against S3 buckets. No signup required.
Files your current validation accepts right now
These are real file patterns found in production applications every day. They pass every extension and MIME type check. uplint catches every one.
invoice.pdf
45 KB
A 45KB PDF that looks fine but has zero readable text. Template PDF with only underscores and form lines.
Standard check says
Valid PDF, 45 KB, under size limit
uplint catches
BLANK -- PDF has no readable content
report.xlsx
12 KB
A spreadsheet with headers but no actual data in any cells. Looks like a real file at a glance.
Standard check says
Valid XLSX, 12 KB, correct MIME type
uplint catches
BLANK -- Spreadsheet has no data rows
avatar.png
834 B
A PNG that's a single white pixel scaled up to 1920x1080. Passes every MIME and extension check.
Standard check says
Valid PNG, under size limit
uplint catches
BLANK -- Single-color image detected
backup.zip
22 B
A ZIP that extracts to nothing. Valid archive structure, but completely empty inside.
Standard check says
Valid ZIP archive
uplint catches
BLANK -- ZIP contains zero files
photo.pdf
2.1 MB
A renamed JPEG masquerading as a PDF. The extension says .pdf but the file content is a JPEG image.
Standard check says
Valid file, .pdf extension, 2.1 MB
uplint catches
CORRUPT -- Magic bytes don't match PDF
podcast.mp3
128 KB
An MP3 file with valid headers but zero playable duration. Technically a valid audio file that plays nothing.
Standard check says
Valid MP3, correct MIME type
uplint catches
BLANK -- Audio has zero duration
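Extension and MIME checks never look at the bytes. A minimal sketch of the kind of magic-byte sniffing that catches the renamed JPEG above — illustrative only, not uplint's actual implementation:

```python
# Illustrative magic-byte sniffing -- not uplint's actual code.
# A file's first bytes identify its real format regardless of extension.
def sniff_format(path):
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # also the container format for xlsx/docx
    return "unknown"
```

A renamed JPEG saved as photo.pdf sniffs as jpeg — exactly the extension/content mismatch flagged above.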
uplint vs. doing it yourself
Validating just PDFs requires pypdf, blank detection, structure parsing, and virus scanning. Then do it again for images, spreadsheets, audio, video, archives... or use one command.
import pypdf
from PIL import Image
import openpyxl
import pyclamd
import os, csv, json

def validate_file(path):
    ext = os.path.splitext(path)[1].lower()
    size = os.path.getsize(path)
    if size == 0:
        return "CORRUPT: empty file"
    if ext == ".pdf":
        try:
            reader = pypdf.PdfReader(path)
            text = ""
            for page in reader.pages:
                text += page.extract_text() or ""
            if not any(len(w) >= 3 for w in text.split()):
                return "BLANK: no readable content"
        except Exception:
            return "CORRUPT: invalid PDF"
    elif ext in (".png", ".jpg", ".jpeg"):
        try:
            img = Image.open(path)
            img.verify()  # verify() exhausts the file, so reopen
            img = Image.open(path)
            colors = img.getcolors(maxcolors=1)  # None if >1 color
            if colors and len(colors) == 1:
                return "BLANK: single-color image"
        except Exception:
            return "CORRUPT: invalid image"
    elif ext == ".xlsx":
        try:
            wb = openpyxl.load_workbook(path)
            has_data = False
            for ws in wb.worksheets:
                for row in ws.iter_rows(max_row=100):
                    if any(c.value for c in row):
                        has_data = True
                        break
                if has_data:
                    break
            if not has_data:
                return "BLANK: no data rows"
        except Exception:
            return "CORRUPT: invalid spreadsheet"
    # ... repeat for CSV, DOCX, JSON, XML,
    # ZIP, MP3, MP4, TXT ...
    # Virus scanning
    try:
        cd = pyclamd.ClamdNetworkSocket()
        result = cd.scan_file(path)
        if result:
            return f"VIRUS: {result}"
    except Exception:
        pass  # ClamAV not available
    return "PASS"
pypdf + Pillow + openpyxl + pyclamd
uplint scan
uplint scan ./uploads/
12+ file formats, colored terminal output, JSON export, CI/CD integration, S3 scanning, and watch mode -- maintained and updated so you don't have to be.
12+ file formats with deep validation
Not a generic byte check. Each format gets its own integrity parser and blank detection logic — because "is this PDF actually empty?" is a different question than "is this PNG a single pixel?"
PDF
Integrity: Magic bytes + structure parsing
Blank: No readable words, no embedded images
Images
Integrity: Magic bytes + Pillow verify
Blank: Single-color image, tiny dimensions
Excel
Integrity: Magic bytes + openpyxl parsing
Blank: No data in any cells (first 100 rows)
CSV
Integrity: UTF-8 decoding + CSV parsing
Blank: No rows with words, or headers-only
Word
Integrity: Magic bytes + python-docx parsing
Blank: No paragraphs with actual words
Text
Integrity: Always valid if non-empty
Blank: No words (3+ consecutive letters)
JSON
Integrity: UTF-8 decoding + JSON parsing
Blank: Empty object, empty array, or null
XML
Integrity: XML parsing
Blank: Only empty root element
ZIP
Integrity: Archive integrity test
Blank: Zero files inside
Audio
Integrity: Magic bytes + mutagen parsing
Blank: Zero duration
Video
Integrity: Magic bytes + mutagen parsing
Blank: Zero duration
Plus ClamAV virus scanning across all file types when available.
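The ZIP checks in the table fit in a few lines of standard library. A simplified sketch, not uplint's actual implementation:

```python
import zipfile

# Simplified sketch of the ZIP checks above (not uplint's actual code):
# testzip() verifies member CRCs; an archive with no file entries is blank.
def check_zip(path):
    try:
        with zipfile.ZipFile(path) as z:
            if z.testzip() is not None:
                return "CORRUPT: bad CRC inside archive"
            files = [n for n in z.namelist() if not n.endswith("/")]
            if not files:
                return "BLANK: ZIP contains zero files"
            return "PASS"
    except zipfile.BadZipFile:
        return "CORRUPT: not a valid ZIP"
```

The 22-byte backup.zip from the examples above is exactly this blank case: a valid end-of-central-directory record wrapped around nothing.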
Your S3 buckets already have bad files
Files accumulate. Bad ones slip in. uplint scans your buckets directly -- each object is fetched into memory, validated, and released. Nothing is written to disk, nothing left behind. The same validation pipeline, pointed at your cloud storage.
Pre-download size filtering
Objects exceeding --max-size are skipped without downloading. No wasted bandwidth.
Pre-download extension filtering
--types pdf,docx filters by S3 key extension before any download occurs.
Prefix scoping
--prefix uploads/2026/01/ scans only a specific "directory" instead of the entire bucket.
Parallel workers
--workers 10 runs 10 concurrent download+scan threads. Increase for many small files.
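The pre-download filters above decide from listing metadata alone: an S3 listing already carries each object's Key and Size, so extension and size filters cost zero GET requests. A hypothetical sketch of the idea (not uplint's code), operating on listing entries:

```python
# Hypothetical sketch of pre-download filtering (not uplint's code).
# S3 listings include each object's Key and Size up front, so these
# filters run before any download happens.
def prefilter(objects, types=None, max_size=None, prefix=None):
    for obj in objects:
        key, size = obj["Key"], obj["Size"]
        if prefix and not key.startswith(prefix):
            continue  # outside the scoped "directory"
        if max_size and size > max_size:
            continue  # skip without downloading
        if types:
            ext = key.rsplit(".", 1)[-1].lower() if "." in key else ""
            if ext not in types:
                continue  # wrong extension on the S3 key
        yield obj
```

Only objects that survive all three filters are ever fetched.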
Built for buckets of any size
Streaming pipeline
Pages of 1,000 objects are fetched and immediately fed to the thread pool. Scanning starts within seconds, even on million-object buckets. Memory stays proportional to --workers, not total object count.
Limit scan
--limit N
Cap total files scanned. Listing stops server-side once the limit is reached -- no extra API calls. Great for spot-checking recent uploads.
Random sampling
--sample N
Reservoir sampling picks N random files from the entire bucket. Only N objects held in memory at any time, regardless of bucket size. Ideal for daily health checks.
Resumable scans
--resume
Auto-saves progress every 100 files. If interrupted (Ctrl+C, network failure), resume exactly where you left off. Checkpoint files are atomic and auto-cleaned on completion.
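Reservoir sampling is what makes random sampling memory-bounded: the classic algorithm keeps exactly N items while any number of keys stream past. A minimal sketch of the technique (not uplint's code):

```python
import random

# Classic reservoir sampling (Algorithm R): a uniform random sample
# of n items from a stream of unknown length, holding only n items
# in memory at any time. Sketch of the technique, not uplint's code.
def reservoir_sample(stream, n, seed=None):
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < n:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive of i
            if j < n:
                sample[j] = item  # replace with probability n/(i+1)
    return sample
```

Each item in the stream ends up in the sample with equal probability, which is why a daily spot-check over a million-object bucket stays both fair and cheap.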
Authentication
Set AWS credentials as env vars. Best for CI/CD pipelines where secrets are injected automatically.
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=wJalr...
export AWS_DEFAULT_REGION=us-east-1
uplint s3-scan --bucket my-uploads
Credentials are never exposed
Never logged, printed, or included in output. Hidden input when prompted interactively. Nothing written to disk.
Works with any S3-compatible storage
uplint s3-scan --bucket my-bucket \
--endpoint-url http://localhost:9000 \
--access-key minioadmin \
  --secret-key minioadmin
Use --endpoint-url to point at any S3-compatible API -- MinIO, DigitalOcean Spaces, Backblaze B2, Cloudflare R2, or a private VPC endpoint.
Fast scan with parallel workers
Scan only PDFs under a specific prefix with 20 concurrent threads. Pre-download filtering means zero wasted bandwidth.
uplint s3-scan --bucket my-uploads \
--prefix invoices/2026/ \
--types pdf \
--workers 20 \
  --skip-virus
Runs anywhere your pipeline does
GitHub Actions, pre-commit hooks, shell scripts — uplint slots into your existing pipeline. Exit codes make pass/fail decisions automatic. Catch bad files before they ever reach production.
name: File Validation
on:
  pull_request:
    paths:
      - 'assets/**'
      - 'uploads/**'
      - 'public/**'
jobs:
  uplint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install uplint
      - run: uplint scan ./assets/ --format json --skip-virus
Exit codes
All files passed
One or more issues found (blank, corrupt, virus, oversize)
Invalid input (path not found, S3 connection error)
Scan interrupted -- progress saved to checkpoint (s3-scan)
Strict mode
Add --strict to exit with code 1 on any issue, including warnings and errors.
JSON output
Machine-readable output for log ingestion and custom tooling.
uplint runs locally. Uplint runs in production.
You've seen what uplint catches. Now imagine that running on every upload your users make — automatically. Uplint is the same engine, exposed as an API, with everything production demands.
Rate Limiting
Built-in protection against abuse with configurable limits per API key, per tenant, per context.
Audit Trails
Full visibility into every file operation. Track uploads, downloads, and deletions for compliance.
Multi-Tenant Isolation
Logical separation between tenants with independent storage contexts, rules, and access controls.
Security Dashboard
Real-time overview of threats caught, scan volumes, and validation trends across your application.
Start with the CLI. Scale to the API.
The same validation engine you trust locally, now as a production API with multi-tenant isolation. Free tier included — no credit card required.