Uplint
Free & open source · pip install uplint

Your upload validation is probably broken

uplint finds blank PDFs, corrupt images, header-only spreadsheets, and disguised executables that pass your extension checks. One command. Works locally and against S3 buckets. No signup required.

Terminal
$ pip install uplint
The validation gap

Files your current validation accepts right now

These are real file patterns found in production applications every day. They pass every extension and MIME type check. uplint catches every one.

invoice.pdf

45 KB

BLANK

A 45KB PDF that looks fine but has zero readable text. Template PDF with only underscores and form lines.

Standard check says

Valid PDF, 45 KB, under size limit

uplint catches

BLANK -- PDF has no readable content

report.xlsx

12 KB

BLANK

A spreadsheet with headers but no actual data in any cells. Looks like a real file at a glance.

Standard check says

Valid XLSX, 12 KB, correct MIME type

uplint catches

BLANK -- Spreadsheet has no data rows

avatar.png

834 B

BLANK

A PNG that's a single white pixel scaled up to 1920x1080. Passes every MIME and extension check.

Standard check says

Valid PNG, under size limit

uplint catches

BLANK -- Single-color image detected

backup.zip

22 B

BLANK

A ZIP that extracts to nothing. Valid archive structure, but completely empty inside.

Standard check says

Valid ZIP archive

uplint catches

BLANK -- ZIP contains zero files

photo.pdf

2.1 MB

CORRUPT

A renamed JPEG masquerading as a PDF. The extension says .pdf but the file content is a JPEG image.

Standard check says

Valid file, .pdf extension, 2.1 MB

uplint catches

CORRUPT -- Magic bytes don't match PDF

podcast.mp3

128 KB

BLANK

An MP3 file with valid headers but zero playable duration. Technically a valid audio file that plays nothing.

Standard check says

Valid MP3, correct MIME type

uplint catches

BLANK -- Audio has zero duration
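The photo.pdf case above comes down to magic bytes: the first few bytes of a file identify its real format regardless of what the extension claims. A minimal sketch of that check (the table and helper names here are illustrative, not uplint's internals):

```python
MAGIC = {
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",          # JPEG start-of-image marker
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",             # also the container for xlsx/docx
}
# normalize extension aliases and ZIP-based container formats
ALIASES = {"jpg": "jpeg", "xlsx": "zip", "docx": "zip"}

def sniff(header: bytes):
    """Map a file's leading bytes to its real format, or None if unknown."""
    for magic, fmt in MAGIC.items():
        if header.startswith(magic):
            return fmt
    return None

def extension_lies(filename: str, header: bytes) -> bool:
    """True when the magic bytes identify a known format that
    contradicts the filename's extension."""
    ext = filename.rsplit(".", 1)[-1].lower()
    actual = sniff(header)
    return actual is not None and actual != ALIASES.get(ext, ext)
```

A renamed JPEG trips the check immediately: `extension_lies("photo.pdf", b"\xff\xd8\xff\xe0...")` is true, while a genuine PDF header passes.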

Why not just build it yourself?

uplint vs. doing it yourself

Validating just PDFs requires pypdf, blank detection, structure parsing, and virus scanning. Then do it again for images, spreadsheets, audio, video, archives... or use one command.

validate.py -- do it yourself
62 lines
import pypdf
from PIL import Image
import openpyxl
import pyclamd
import os, re, csv, json

def validate_file(path):
    ext = os.path.splitext(path)[1].lower()
    size = os.path.getsize(path)

    if size == 0:
        return "CORRUPT: empty file"

    if ext == ".pdf":
        try:
            reader = pypdf.PdfReader(path)
            text = ""
            for page in reader.pages:
                text += page.extract_text() or ""
            # a "word" = 3+ consecutive letters, so underscores
            # and form lines don't count as readable content
            if not re.search(r"[A-Za-z]{3,}", text):
                return "BLANK: no readable content"
        except Exception:
            return "CORRUPT: invalid PDF"

    elif ext in (".png", ".jpg", ".jpeg"):
        try:
            img = Image.open(path)
            img.verify()
            img = Image.open(path)
            # getcolors() returns None when the image has more than
            # maxcolors distinct colors, so non-None here means one color
            colors = img.getcolors(maxcolors=1)
            if colors is not None:
                return "BLANK: single-color image"
        except Exception:
            return "CORRUPT: invalid image"

    elif ext == ".xlsx":
        try:
            wb = openpyxl.load_workbook(path)
            has_data = False
            for ws in wb.worksheets:
                for row in ws.iter_rows(max_row=100):
                    # "is not None" so cells holding 0 or False count as data
                    if any(c.value is not None for c in row):
                        has_data = True
                        break
                if has_data:
                    break
            if not has_data:
                return "BLANK: no data rows"
        except Exception:
            return "CORRUPT: invalid spreadsheet"

    # ... repeat for CSV, DOCX, JSON, XML,
    #     ZIP, MP3, MP4, TXT ...

    # Virus scanning
    try:
        cd = pyclamd.ClamdNetworkSocket()
        result = cd.scan_file(path)
        if result:
            return f"VIRUS: {result}"
    except Exception:
        pass  # ClamAV not available

    return "PASS"
Format-specific intelligence

12+ file formats with deep validation

Not a generic byte check. Each format gets its own integrity parser and blank detection logic — because "is this PDF actually empty?" is a different question than "is this PNG a single pixel?"

PDF

.pdf

Integrity: Magic bytes + structure parsing

Blank: No readable words, no embedded images

Images

.png .jpg .jpeg .gif .webp

Integrity: Magic bytes + Pillow verify

Blank: Single-color image, tiny dimensions

Excel

.xlsx

Integrity: Magic bytes + openpyxl parsing

Blank: No data in any cells (first 100 rows)

CSV

.csv

Integrity: UTF-8 decoding + CSV parsing

Blank: No rows with words, or headers-only

Word

.docx

Integrity: Magic bytes + python-docx parsing

Blank: No paragraphs with actual words

Text

.txt

Integrity: Always valid if non-empty

Blank: No words (3+ consecutive letters)
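The text rule above ("a word is 3+ consecutive letters") is simple enough to sketch. An illustrative version, not uplint's actual code:

```python
import re

WORD = re.compile(r"[A-Za-z]{3,}")  # a "word" = 3+ consecutive letters

def text_is_blank(content: str) -> bool:
    """True if the text contains no real words -- e.g. a file made of
    underscores, form lines, or stray punctuation."""
    return WORD.search(content) is None
```

Under this rule a template full of underscores (`"____ ___"`) is blank, while any line with an actual word is not.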

JSON

.json

Integrity: UTF-8 decoding + JSON parsing

Blank: Empty object, empty array, or null
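The JSON rule above covers both failure modes in a few lines. A sketch under the same definitions (function name is illustrative):

```python
import json

def json_is_blank(raw: bytes) -> bool:
    """Raise ValueError for corrupt JSON; return True when the
    document parses but carries nothing: {}, [], or null."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        raise ValueError("CORRUPT: invalid JSON")
    return data is None or data == {} or data == []
```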

XML

.xml

Integrity: XML parsing

Blank: Only empty root element

ZIP

.zip

Integrity: Archive integrity test

Blank: Zero files inside
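Both ZIP checks map directly onto the standard library. A sketch of the two rules (not uplint's actual implementation):

```python
import io
import zipfile

def check_zip(raw: bytes) -> str:
    """Integrity: the archive opens and every member's CRC checks out.
    Blank: the archive contains zero files (directories don't count)."""
    try:
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            if zf.testzip() is not None:   # first member with a bad CRC
                return "CORRUPT"
            files = [n for n in zf.namelist() if not n.endswith("/")]
            return "BLANK" if not files else "PASS"
    except zipfile.BadZipFile:
        return "CORRUPT"
```

This is exactly how a 22-byte "backup.zip" fails: it is a valid end-of-archive record with zero members.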

Audio

.mp3

Integrity: Magic bytes + mutagen parsing

Blank: Zero duration

Video

.mp4

Integrity: Magic bytes + mutagen parsing

Blank: Zero duration

Plus ClamAV virus scanning across all file types when available.

Cloud-native scanning

Your S3 buckets already have bad files

Files accumulate. Bad ones slip in. uplint scans your buckets directly — each object is streamed into memory, validated, and released. Nothing is written to disk. The same validation pipeline, pointed at your cloud storage.

Terminal
$ pip install uplint[s3]
Successfully installed uplint-0.1.0 boto3-1.35.0
$ uplint s3-scan --bucket my-uploads --prefix uploads/ --profile myaws
Scanning s3://my-uploads/uploads/ (142 objects)...
STATUS    KEY                      SIZE      DETAIL
PASS      uploads/report.pdf       824 KB
BLANK     uploads/data.xlsx        12 KB     No data rows
CORRUPT   uploads/photo.pdf        2.1 MB    Magic bytes mismatch
PASS      uploads/contract.docx    340 KB
2 issues found in 142 objects (scan time: 8.4s, 10 workers)

Pre-download size filtering

Objects exceeding --max-size are skipped without downloading. No wasted bandwidth.

Pre-download extension filtering

--types pdf,docx filters by S3 key extension before any download occurs.

Prefix scoping

--prefix uploads/2026/01/ scans only a specific "directory" instead of the entire bucket.

Parallel workers

--workers 10 runs 10 concurrent download+scan threads. Increase for many small files.

300K+ files? No problem.

Built for buckets of any size

Streaming pipeline

Pages of 1,000 objects are fetched and immediately fed to the thread pool. Scanning starts within seconds, even on million-object buckets. Memory stays proportional to --workers, not total object count.

uplint s3-scan --bucket huge-bucket --workers 20
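The page-at-a-time pattern described above can be sketched with the standard library. This is an illustrative shape, not uplint's code: `pages` stands in for a listing paginator (e.g. boto3 `list_objects_v2` pages of keys), and `scan` for whatever validates one object.

```python
from concurrent.futures import ThreadPoolExecutor

def stream_scan(pages, scan, workers=10):
    """Feed keys to a bounded thread pool one listing page at a time,
    so in-flight work tracks the worker count, not the bucket size.
    Simplified: a real pipeline overlaps listing with scanning."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for page in pages:                  # one page of keys at a time
            for r in pool.map(scan, page):  # scans run concurrently
                results.append(r)
    return results
```

Because only the current page's keys are held, memory stays flat whether the bucket has a thousand objects or a million.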

Limit scan

--limit N

Cap total files scanned. Listing stops server-side once the limit is reached -- no extra API calls. Great for spot-checking recent uploads.

uplint s3-scan --bucket my-uploads --prefix uploads/2026/02/ --limit 1000

Random sampling

--sample N

Reservoir sampling picks N random files from the entire bucket. Only N objects held in memory at any time, regardless of bucket size. Ideal for daily health checks.

uplint s3-scan --bucket production-uploads --sample 500

Resumable scans

--resume

Auto-saves progress every 100 files. If interrupted (Ctrl+C, network failure), resume exactly where you left off. Checkpoint files are atomic and auto-cleaned on completion.

uplint s3-scan --bucket million-files --workers 20 --resume
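The "atomic checkpoint" guarantee above typically rests on the write-then-rename pattern: the new state is written to a temporary file in full, then renamed over the old one, so a crash mid-write can never leave a truncated checkpoint behind. A sketch of that pattern (helper name and state shape are illustrative, not uplint's format):

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Atomically persist scan progress: full temp-file write,
    then rename over the previous checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)   # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)          # never leave a half-written temp file
        raise
```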

Authentication

Best for: CI/CD

Set AWS credentials as env vars. Best for CI/CD pipelines where secrets are injected automatically.

export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=wJalr...
export AWS_DEFAULT_REGION=us-east-1

uplint s3-scan --bucket my-uploads

Credentials are never exposed

Never logged, printed, or included in output. Hidden input when prompted interactively. Nothing written to disk.

Works with any S3-compatible storage

minio-example
uplint s3-scan --bucket my-bucket \
  --endpoint-url http://localhost:9000 \
  --access-key minioadmin \
  --secret-key minioadmin

Use --endpoint-url to point at any S3-compatible API -- MinIO, DigitalOcean Spaces, Backblaze B2, Cloudflare R2, or a private VPC endpoint.

Fast scan with parallel workers

Scan only PDFs under a specific prefix with 20 concurrent threads. Pre-download filtering means zero wasted bandwidth.

uplint s3-scan --bucket my-uploads \
  --prefix invoices/2026/ \
  --types pdf \
  --workers 20 \
  --skip-virus
Stop bad files before they ship

Runs anywhere your pipeline does

GitHub Actions, pre-commit hooks, shell scripts — uplint slots into your existing pipeline. Exit codes make pass/fail decisions automatic. Catch bad files before they ever reach production.

.github/workflows/validate.yml
name: File Validation
on:
  pull_request:
    paths:
      - 'assets/**'
      - 'uploads/**'
      - 'public/**'

jobs:
  uplint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install uplint
      - run: uplint scan ./assets/ --format json --skip-virus

Exit codes

0

All files passed

1

One or more issues found (blank, corrupt, virus, oversize)

2

Invalid input (path not found, S3 connection error)

3

Scan interrupted -- progress saved to checkpoint (s3-scan)

Strict mode

Add --strict to exit with code 1 on any finding, warnings included, not just hard errors.

uplint scan ./assets/ --strict

JSON output

Machine-readable output for log ingestion and custom tooling.

uplint scan ./ --format json

uplint runs locally. Uplint runs in production.

You've seen what uplint catches. Now imagine that running on every upload your users make — automatically. Uplint is the same engine, exposed as an API, with everything production demands.

Rate Limiting

Built-in protection against abuse with configurable limits per API key, per tenant, per context.

Audit Trails

Full visibility into every file operation. Track uploads, downloads, and deletions for compliance.

Multi-Tenant Isolation

Logical separation between tenants with independent storage contexts, rules, and access controls.

Security Dashboard

Real-time overview of threats caught, scan volumes, and validation trends across your application.

Start with the CLI. Scale to the API.

The same validation engine you trust locally, now as a production API with multi-tenant isolation. Free tier included — no credit card required.