Your upload validation is probably broken
uplint finds blank PDFs, corrupt images, header-only spreadsheets, and disguised executables that pass your extension checks. One command. Works locally and against S3 buckets. No signup required.
Files your current validation accepts right now
These are real file patterns found in production applications every day. They pass every extension and MIME type check. uplint catches every one.
invoice.pdf
45 KB
A 45KB PDF that looks fine but has zero readable text. Template PDF with only underscores and form lines.
Standard check says
Valid PDF, 45 KB, under size limit
uplint catches
BLANK -- PDF has no readable content
report.xlsx
12 KB
A spreadsheet with headers but no actual data in any cells. Looks like a real file at a glance.
Standard check says
Valid XLSX, 12 KB, correct MIME type
uplint catches
BLANK -- Spreadsheet has no data rows
avatar.png
834 B
A PNG that's a single white pixel scaled up to 1920x1080. Passes every MIME and extension check.
Standard check says
Valid PNG, under size limit
uplint catches
BLANK -- Single-color image detected
backup.zip
22 B
A ZIP that extracts to nothing. Valid archive structure, but completely empty inside.
Standard check says
Valid ZIP archive
uplint catches
BLANK -- ZIP contains zero files
photo.pdf
2.1 MB
A renamed JPEG masquerading as a PDF. The extension says .pdf but the file content is a JPEG image.
Standard check says
Valid file, .pdf extension, 2.1 MB
uplint catches
CORRUPT -- Magic bytes don't match PDF
podcast.mp3
128 KB
An MP3 file with valid headers but zero playable duration. Technically a valid audio file that plays nothing.
Standard check says
Valid MP3, correct MIME type
uplint catches
BLANK -- Audio has zero duration
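Extension and MIME checks never look at the bytes. A minimal sketch of the kind of magic-byte sniffing that catches the renamed JPEG above — illustrative only, not uplint's actual implementation:

```python
# Illustrative magic-byte sniffing -- not uplint's actual code.
# A file's first bytes identify its real format regardless of extension.
def sniff_format(path):
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # also the container format for xlsx/docx
    return "unknown"
```

A renamed JPEG saved as photo.pdf sniffs as jpeg — exactly the extension/content mismatch flagged above.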
uplint vs. doing it yourself
Validating just PDFs requires pypdf, blank detection, structure parsing, and virus scanning. Then do it again for images, spreadsheets, audio, video, archives... or use one command.
import pypdf
from PIL import Image
import openpyxl
import pyclamd
import os, csv, json

def validate_file(path):
    ext = os.path.splitext(path)[1].lower()
    size = os.path.getsize(path)
    if size == 0:
        return "CORRUPT: empty file"
    if ext == ".pdf":
        try:
            reader = pypdf.PdfReader(path)
            text = ""
            for page in reader.pages:
                text += page.extract_text() or ""
            if not any(len(w) >= 3 for w in text.split()):
                return "BLANK: no readable content"
        except Exception:
            return "CORRUPT: invalid PDF"
    elif ext in (".png", ".jpg", ".jpeg"):
        try:
            img = Image.open(path)
            img.verify()  # verify() exhausts the file, so reopen
            img = Image.open(path)
            colors = img.getcolors(maxcolors=1)  # None if >1 color
            if colors and len(colors) == 1:
                return "BLANK: single-color image"
        except Exception:
            return "CORRUPT: invalid image"
    elif ext == ".xlsx":
        try:
            wb = openpyxl.load_workbook(path)
            has_data = False
            for ws in wb.worksheets:
                for row in ws.iter_rows(max_row=100):
                    if any(c.value for c in row):
                        has_data = True
                        break
                if has_data:
                    break
            if not has_data:
                return "BLANK: no data rows"
        except Exception:
            return "CORRUPT: invalid spreadsheet"
    # ... repeat for CSV, DOCX, JSON, XML,
    # ZIP, MP3, MP4, TXT ...
    # Virus scanning
    try:
        cd = pyclamd.ClamdNetworkSocket()
        result = cd.scan_file(path)
        if result:
            return f"VIRUS: {result}"
    except Exception:
        pass  # ClamAV not available
    return "PASS"
pypdf + Pillow + openpyxl + pyclamd
uplint scan
uplint scan ./uploads/
12+ file formats, colored terminal output, JSON export, CI/CD integration, S3 scanning, and watch mode -- maintained and updated so you don't have to be.
12+ file formats with deep validation
Not a generic byte check. Each format gets its own integrity parser and blank detection logic — because "is this PDF actually empty?" is a different question than "is this PNG a single pixel?"
PDF
Integrity: Magic bytes + structure parsing
Blank: No readable words, no embedded images
Images
Integrity: Magic bytes + Pillow verify
Blank: Single-color image, tiny dimensions
Excel
Integrity: Magic bytes + openpyxl parsing
Blank: No data in any cells (first 100 rows)
CSV
Integrity: UTF-8 decoding + CSV parsing
Blank: No rows with words, or headers-only
Word
Integrity: Magic bytes + python-docx parsing
Blank: No paragraphs with actual words
Text
Integrity: Always valid if non-empty
Blank: No words (3+ consecutive letters)
JSON
Integrity: UTF-8 decoding + JSON parsing
Blank: Empty object, empty array, or null
XML
Integrity: XML parsing
Blank: Only empty root element
ZIP
Integrity: Archive integrity test
Blank: Zero files inside
Audio
Integrity: Magic bytes + mutagen parsing
Blank: Zero duration
Video
Integrity: Magic bytes + mutagen parsing
Blank: Zero duration
Plus ClamAV virus scanning across all file types when available.
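The ZIP checks in the table fit in a few lines of standard library. A simplified sketch, not uplint's actual implementation:

```python
import zipfile

# Simplified sketch of the ZIP checks above (not uplint's actual code):
# testzip() verifies member CRCs; an archive with no file entries is blank.
def check_zip(path):
    try:
        with zipfile.ZipFile(path) as z:
            if z.testzip() is not None:
                return "CORRUPT: bad CRC inside archive"
            files = [n for n in z.namelist() if not n.endswith("/")]
            if not files:
                return "BLANK: ZIP contains zero files"
            return "PASS"
    except zipfile.BadZipFile:
        return "CORRUPT: not a valid ZIP"
```

The 22-byte backup.zip from the examples above is exactly this blank case: a valid end-of-central-directory record wrapped around nothing.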
Your S3 buckets already have bad files
Files accumulate. Bad ones slip in. uplint scans your buckets directly -- each object is fetched into memory, validated, and released. Nothing is written to disk, nothing left behind. The same validation pipeline, pointed at your cloud storage.
Pre-download size filtering
Objects exceeding --max-size are skipped without downloading. No wasted bandwidth.
Pre-download extension filtering
--types pdf,docx filters by S3 key extension before any download occurs.
Prefix scoping
--prefix uploads/2026/01/ scans only a specific "directory" instead of the entire bucket.
Parallel workers
--workers 10 runs 10 concurrent download+scan threads. Increase for many small files.
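The pre-download filters above decide from listing metadata alone: an S3 listing already carries each object's Key and Size, so extension and size filters cost zero GET requests. A hypothetical sketch of the idea (not uplint's code), operating on listing entries:

```python
# Hypothetical sketch of pre-download filtering (not uplint's code).
# S3 listings include each object's Key and Size up front, so these
# filters run before any download happens.
def prefilter(objects, types=None, max_size=None, prefix=None):
    for obj in objects:
        key, size = obj["Key"], obj["Size"]
        if prefix and not key.startswith(prefix):
            continue  # outside the scoped "directory"
        if max_size and size > max_size:
            continue  # skip without downloading
        if types:
            ext = key.rsplit(".", 1)[-1].lower() if "." in key else ""
            if ext not in types:
                continue  # wrong extension on the S3 key
        yield obj
```

Only objects that survive all three filters are ever fetched.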
Built for buckets of any size
Streaming pipeline
Pages of 1,000 objects are fetched and immediately fed to the thread pool. Scanning starts within seconds, even on million-object buckets. Memory stays proportional to --workers, not total object count.
Limit scan
--limit N
Cap total files scanned. Listing stops server-side once the limit is reached -- no extra API calls. Great for spot-checking recent uploads.
Random sampling
--sample N
Reservoir sampling picks N random files from the entire bucket. Only N objects held in memory at any time, regardless of bucket size. Ideal for daily health checks.
Resumable scans
--resume
Auto-saves progress every 100 files. If interrupted (Ctrl+C, network failure), resume exactly where you left off. Checkpoint files are atomic and auto-cleaned on completion.
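Reservoir sampling is what makes random sampling memory-bounded: the classic algorithm keeps exactly N items while any number of keys stream past. A minimal sketch of the technique (not uplint's code):

```python
import random

# Classic reservoir sampling (Algorithm R): a uniform random sample
# of n items from a stream of unknown length, holding only n items
# in memory at any time. Sketch of the technique, not uplint's code.
def reservoir_sample(stream, n, seed=None):
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < n:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive of i
            if j < n:
                sample[j] = item  # replace with probability n/(i+1)
    return sample
```

Each item in the stream ends up in the sample with equal probability, which is why a daily spot-check over a million-object bucket stays both fair and cheap.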
Authentication
Set AWS credentials as env vars. Best for CI/CD pipelines where secrets are injected automatically.
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=wJalr...
export AWS_DEFAULT_REGION=us-east-1
uplint s3-scan --bucket my-uploads
Credentials are never exposed
Never logged, printed, or included in output. Hidden input when prompted interactively. Nothing written to disk.
Works with any S3-compatible storage
uplint s3-scan --bucket my-bucket \
--endpoint-url http://localhost:9000 \
--access-key minioadmin \
  --secret-key minioadmin
Use --endpoint-url to point at any S3-compatible API -- MinIO, DigitalOcean Spaces, Backblaze B2, Cloudflare R2, or a private VPC endpoint.
Fast scan with parallel workers
Scan only PDFs under a specific prefix with 20 concurrent threads. Pre-download filtering means zero wasted bandwidth.
uplint s3-scan --bucket my-uploads \
--prefix invoices/2026/ \
--types pdf \
--workers 20 \
  --skip-virus
Runs anywhere your pipeline does
GitHub Actions, pre-commit hooks, shell scripts — uplint slots into your existing pipeline. Exit codes make pass/fail decisions automatic. Catch bad files before they ever reach production.
name: File Validation
on:
  pull_request:
    paths:
      - 'assets/**'
      - 'uploads/**'
      - 'public/**'
jobs:
  uplint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install uplint
      - run: uplint scan ./assets/ --format json --skip-virus
Exit codes
All files passed
One or more issues found (blank, corrupt, virus, oversize)
Invalid input (path not found, S3 connection error)
Scan interrupted -- progress saved to checkpoint (s3-scan)
Strict mode
Add --strict to exit with code 1 on any issue, including warnings and errors.
JSON output
Machine-readable output for log ingestion and custom tooling.
uplint runs locally. Uplint runs in production.
You've seen what uplint catches. Now imagine that running on every upload your users make — automatically. Uplint is the same engine, exposed as an API, with everything production demands.
Rate Limiting
Built-in protection against abuse with configurable limits per API key, per tenant, per context.
Audit Trails
Full visibility into every file operation. Track uploads, downloads, and deletions for compliance.
Multi-Tenant Isolation
Logical separation between tenants with independent storage contexts, rules, and access controls.
Security Dashboard
Real-time overview of threats caught, scan volumes, and validation trends across your application.
Start with the CLI. Scale to the API.
The same validation engine you trust locally, now as a production API with multi-tenant isolation. Free tier included — no credit card required.