OWASP (the Open Worldwide Application Security Project, formerly the Open Web Application Security Project) maintains the industry-standard testing guide for web application security. Its file upload security section covers the critical controls that separate secure systems from vulnerable ones.
Most developers reference OWASP but don't fully implement its recommendations. This guide walks through each control with practical code, explains the reasoning, and shows common failures.
The OWASP Testing Guide for File Uploads
OWASP's testing methodology for file uploads focuses on these areas:
- Test Upload of Executable Files
- Test Overwriting Existing Files
- Test Upload of Malicious Files
- Test Handling of Dangerous File Types
From these concerns, eight essential controls emerge.
Control 1: Whitelist File Types (Not Blacklist)
What it means: Define exactly which file extensions your application accepts. Never create a "banned" list and allow everything else.
Why it matters: Blacklists are breakable. New file types emerge constantly. Unknown extensions can be dangerous. Whitelists force explicit decisions about what your system should accept.
Vulnerable pattern:
# BAD: Blacklist approach
DANGEROUS_EXTENSIONS = {'exe', 'bat', 'sh', 'cmd', 'php', 'jsp'}

def is_allowed(filename):
    ext = filename.rsplit('.', 1)[1].lower() if '.' in filename else ''
    return ext not in DANGEROUS_EXTENSIONS
This fails because:
- New dangerous extensions aren't in the list
- Archives (ZIP, TAR, RAR) can contain executables
- Double extensions (file.php.jpg) can bypass the check
- Null bytes can truncate the extension
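To make these failure modes concrete, here is a small sketch that runs the same blacklist check against a few filenames an attacker might choose. The sample names are hypothetical and only illustrate the logic; they are not taken from a real incident.

```python
# The blacklist check from the vulnerable pattern, reproduced for illustration.
DANGEROUS_EXTENSIONS = {'exe', 'bat', 'sh', 'cmd', 'php', 'jsp'}


def blacklist_allows(filename: str) -> bool:
    """Return True when the blacklist check would accept the filename."""
    ext = filename.rsplit('.', 1)[1].lower() if '.' in filename else ''
    return ext not in DANGEROUS_EXTENSIONS


# Hypothetical attacker-chosen filenames.
samples = ['shell.php', 'shell.phtml', 'shell.php.jpg', 'shell.php5', 'noextension']
results = {name: blacklist_allows(name) for name in samples}

for name, allowed in results.items():
    print(f"{name!r}: {'accepted' if allowed else 'rejected'}")
```

Only shell.php is rejected. Every other name passes, including one with no extension at all, because the check only knows what it was told to refuse.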
Secure pattern:
# GOOD: Whitelist approach
import re

ALLOWED_EXTENSIONS = {'pdf', 'jpg', 'jpeg', 'png', 'docx', 'xlsx'}

def is_allowed(filename):
    ext = filename.rsplit('.', 1)[1].lower() if '.' in filename else ''
    return ext in ALLOWED_EXTENSIONS

def validate_filename(filename):
    # Never allow multiple extensions
    if filename.count('.') > 1:
        return False
    # Never allow special characters (this also rejects null bytes and path separators)
    if not re.match(r'^[a-zA-Z0-9_-]+\.[a-zA-Z0-9]+$', filename):
        return False
    return is_allowed(filename)
Control 2: Verify MIME Type (Don't Trust the Client)
What it means: Check the file's actual content type, not what the browser claims it is.
Why it matters: The Content-Type header is supplied by the client, so an attacker can claim any type. Verifying the actual content catches disguised files.
Vulnerable pattern:
# BAD: Trusting the client
@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    if file.content_type == 'image/jpeg':
        file.save('uploads/' + file.filename)
    return 'OK'
This fails because file.content_type comes from the browser, which an attacker controls.
Secure pattern:
import magic  # provided by the python-magic package

ALLOWED_MIME_TYPES = {
    'application/pdf',
    'image/jpeg',
    'image/png',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
}

def validate_mime_type(file_content):
    # Detect MIME type from actual file bytes
    detected_mime = magic.from_buffer(file_content, mime=True)
    return detected_mime in ALLOWED_MIME_TYPES

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    file_content = file.read()
    if not validate_mime_type(file_content):
        return 'Invalid file type', 400
    # Proceed with upload (file.read() consumed the stream, so call
    # file.seek(0) before file.save, or write file_content directly)
    return 'OK'
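DOCX files deserve a second look, because they are ZIP containers and, depending on the libmagic version, may be reported with a generic ZIP or binary type rather than the Office MIME type. The sketch below is one possible structural check using only the standard library. The function name looks_like_docx is hypothetical, and the two required part names are an assumption about what a minimal Word document contains; treat it as a complement to content sniffing, not a replacement.

```python
import io
import zipfile

# Parts assumed to be present in any Word (.docx) package.
REQUIRED_DOCX_PARTS = {'[Content_Types].xml', 'word/document.xml'}


def looks_like_docx(file_content: bytes) -> bool:
    """Return True if the bytes are a ZIP archive containing the core DOCX parts."""
    buffer = io.BytesIO(file_content)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    try:
        with zipfile.ZipFile(buffer) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return False
    return REQUIRED_DOCX_PARTS <= names
```

A check like this only confirms the container layout. It says nothing about macros or embedded objects, which is why malware scanning remains a separate control.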
Control 3: Enforce Strict File Size Limits
What it means: Set explicit maximum file sizes that match your use case.
Why it matters: Without limits, attackers conduct denial-of-service attacks by uploading enormous files. Legitimate files have known size ranges (profile photos are typically <5MB, documents <20MB).
Vulnerable pattern:
# BAD: No size limit
def upload():
    file = request.files['file']
    file.save('uploads/' + file.filename)
This allows attackers to upload 1GB+ files, exhausting disk space and network bandwidth.
Secure pattern:
import os

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB
MAX_FILE_SIZE_PROFILE = 5 * 1024 * 1024  # 5 MB for images

# Flask rejects request bodies larger than this with a 413 response
app.config['MAX_CONTENT_LENGTH'] = MAX_FILE_SIZE

def validate_size(file_size, context):
    if context == 'profile_photo':
        return file_size <= MAX_FILE_SIZE_PROFILE
    return file_size <= MAX_FILE_SIZE

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    context = request.form.get('context', 'default')
    # Measure the size by seeking to the end instead of reading the file into memory
    file.seek(0, os.SEEK_END)
    file_size = file.tell()
    file.seek(0)
    if not validate_size(file_size, context):
        return 'File too large', 413
    # Filename handling is simplified here; see Control 8 for safe naming
    file.save('uploads/' + file.filename)
    return 'OK'
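When the size cannot be known up front (for example, a stream that is not seekable, or a Content-Length header you do not want to trust), a bounded read enforces the limit while the data arrives. The following is a minimal sketch under that assumption; read_limited and FileTooLarge are hypothetical names, and the chunk size is arbitrary.

```python
import io


class FileTooLarge(Exception):
    """Raised when an upload exceeds the configured byte limit."""


def read_limited(stream, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream in chunks, aborting once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            # Stop immediately instead of buffering the rest of the upload.
            raise FileTooLarge(f'upload exceeds {max_bytes} bytes')
        chunks.append(chunk)
    return b''.join(chunks)


# Usage with an in-memory stream standing in for an uploaded file.
data = read_limited(io.BytesIO(b'a' * 10), max_bytes=10)
```

At most max_bytes plus one chunk is ever held in memory, regardless of how large the attacker's file is.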
Control 4: Store Files Outside the Web Root
What it means: Never store uploads in directories that are directly accessible via HTTP.
Why it matters: If an attacker manages to upload an executable file because other controls failed, storing it outside the web root prevents the web server from executing it directly.
Vulnerable pattern:
# BAD: Storing in web-accessible directory
UPLOAD_FOLDER = '/var/www/html/uploads'

def upload():
    file = request.files['file']
    file.save(os.path.join(UPLOAD_FOLDER, file.filename))
    # Later, attacker accesses: http://example.com/uploads/malicious.php
Secure pattern:
# GOOD: Store outside web root
import os
import secrets
from flask import request, send_file

UPLOAD_FOLDER = '/var/data/uploads'  # Not under /var/www/html
TEMP_FOLDER = '/tmp/processing'

def upload():
    file = request.files['file']
    # Generate secure filename (not the original)
    file_id = secrets.token_hex(16)
    # get_safe_extension is your own helper that returns an allowlisted extension (see Control 1)
    safe_filename = file_id + '.' + get_safe_extension(file.filename)
    upload_path = os.path.join(UPLOAD_FOLDER, safe_filename)
    file.save(upload_path)
    # Record file_id and upload_path in your database here so download() can find the file
    # Return file ID, not the filename
    return {'file_id': file_id}

def download(file_id):
    # Validate file_id exists in database (db and File are your own ORM objects)
    file_record = db.query(File).filter(File.id == file_id).first()
    if not file_record:
        return 'Not found', 404
    # Serve from outside the web root using a download handler
    return send_file(file_record.path, as_attachment=True)
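A download handler is only as safe as the path it opens. As an extra guard, one option is to resolve the stored path and confirm it still lies inside the upload directory before serving it. The sketch below assumes a helper named resolve_inside, which is hypothetical and not part of Flask.

```python
import os


def resolve_inside(base_dir: str, relative_name: str) -> str:
    """Resolve relative_name under base_dir, refusing paths that escape it."""
    base = os.path.realpath(base_dir)
    # realpath collapses '..' segments and follows symlinks before the comparison.
    candidate = os.path.realpath(os.path.join(base, relative_name))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError('path escapes the upload directory')
    return candidate
```

Called with a server-generated name, it returns the absolute path. Called with something like '../../etc/passwd' or an absolute path, it raises ValueError instead of handing the path to the file system.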
Control 5: Disable Script Execution in Upload Directory
What it means: Configure the web server to never execute scripts in upload directories.
Why it matters: Defense in depth. Even if files end up in the web root by mistake, the server will not execute them.
For Apache (.htaccess):
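# Apache 2.4 syntax; on Apache 2.2 use "Order Allow,Deny" and "Deny from all" instead
<FilesMatch "\.(php|php3|php4|php5|php7|phps|phtml|phar|shtml|exe|bat|sh|cmd)$">
    Require all denied
</FilesMatch>
<Files *>
    SetHandler default-handler
</Files>
# Only valid when PHP runs as an Apache module (mod_php)
php_flag engine off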
For Nginx (server block):
location /uploads {
    # Disable script execution
    location ~ \.php$ {
        return 403;
    }
    location ~ \.sh$ {
        return 403;
    }
    # Serve as-is, never execute
    default_type application/octet-stream;
}
Control 6: Scan for Malware and Threats
What it means: Integrate with threat intelligence services to detect known malicious patterns.
Why it matters: Structural validation and extension checks don't catch legitimate-looking files with embedded malware. Threat scanning requires external databases of known signatures.
Pattern: Integration with threat service:
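import os
import time
import requests
from flask import request

VIRUSTOTAL_API = "https://www.virustotal.com/api/v3"
VIRUSTOTAL_KEY = os.environ['VIRUSTOTAL_API_KEY']

def scan_with_virustotal(file_content, max_polls=10, poll_interval=15):
    headers = {'x-apikey': VIRUSTOTAL_KEY}
    # Submitting a file returns an analysis ID, not the verdict itself
    response = requests.post(f"{VIRUSTOTAL_API}/files", files={'file': file_content},
                             headers=headers, timeout=30)
    if response.status_code != 200:
        # API error: the conservative approach is to reject
        return False
    analysis_id = response.json()['data']['id']
    # Poll the analysis until it completes (retry count and interval are illustrative)
    for _ in range(max_polls):
        result = requests.get(f"{VIRUSTOTAL_API}/analyses/{analysis_id}",
                              headers=headers, timeout=30)
        if result.status_code != 200:
            return False
        attributes = result.json()['data']['attributes']
        if attributes['status'] == 'completed':
            # Check if any vendors detected threats
            stats = attributes['stats']
            return stats['malicious'] == 0 and stats['suspicious'] == 0
        time.sleep(poll_interval)
    # No verdict in time: reject rather than accept an unscanned file
    return False

def upload():
    file = request.files['file']
    file_content = file.read()
    if not scan_with_virustotal(file_content):
        return 'File detected as malicious', 403
    # Proceed with upload
    return 'OK'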
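Before sending file contents to a third party, it can be worth checking whether the service already knows the file by its hash. VirusTotal's v3 API offers a file report endpoint addressed by hash, so a lookup needs only the SHA-256 digest. The sketch below covers the local half of that flow: computing the digest, building the report URL, and interpreting a stats dictionary. The helper names are hypothetical, and the HTTP request itself is left out so the example runs offline.

```python
import hashlib

# File report endpoint, addressed by hash (SHA-256 here).
VIRUSTOTAL_FILE_REPORT = 'https://www.virustotal.com/api/v3/files/{file_hash}'


def sha256_hex(file_content: bytes) -> str:
    """Return the hex SHA-256 digest of the file bytes."""
    return hashlib.sha256(file_content).hexdigest()


def report_url(file_content: bytes) -> str:
    """Build the report URL for a file without uploading its contents."""
    return VIRUSTOTAL_FILE_REPORT.format(file_hash=sha256_hex(file_content))


def is_clean(stats: dict) -> bool:
    """Fail closed: counters missing from the stats are treated as not clean."""
    return stats.get('malicious', 1) == 0 and stats.get('suspicious', 1) == 0
```

A 404 from the report endpoint means the file has not been seen before, at which point you decide whether to submit it for analysis or reject it outright.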
Control 7: Implement Comprehensive Logging
What it means: Log every file upload decision with full context.
Why it matters: When incidents occur, you need to answer: Who uploaded what, when, what checks passed/failed, and what was the decision.
Pattern:
import logging
import json
from datetime import datetime, timezone

import magic
from flask import request

logger = logging.getLogger('file_uploads')
logger.setLevel(logging.INFO)  # without this, INFO entries are dropped by default
handler = logging.FileHandler('uploads.log')
formatter = logging.Formatter('%(asctime)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

def upload():
    file = request.files['file']
    file_content = file.read()
    user_id = request.user.id  # assumes your auth layer attaches a user to the request
    # Run every check first so the decision is known before anything is logged
    checks = {
        "extension_valid": is_allowed(file.filename),  # from Control 1
        "mime_valid": validate_mime_type(file_content),
        "size_valid": validate_size(len(file_content), 'default'),
        "malware_scan": scan_with_virustotal(file_content)
    }
    accepted = all(checks.values())
    log_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "filename": file.filename,
        "file_size": len(file_content),
        "mime_type": magic.from_buffer(file_content, mime=True),
        "checks": checks,
        "decision": "accept" if accepted else "reject"
    }
    # Log exactly one entry per upload, at a level that matches the decision
    if not accepted:
        logger.warning(json.dumps(log_entry))
        return 'Upload rejected', 400
    logger.info(json.dumps(log_entry))
    file.seek(0)  # rewind, because file.read() consumed the stream
    file.save(...)  # destination path elided
    return 'OK'
Control 8: Rename Files to Remove Original Filename
What it means: Don't store files with the names users provided.
Why it matters: Original filenames can contain path traversal attempts (../../etc/passwd), special characters that break parsing, or encoding tricks that exploit decoders.
Secure pattern:
import os
import secrets

def get_safe_filename(original_filename):
    # Extract original extension only
    _, ext = original_filename.rsplit('.', 1) if '.' in original_filename else (None, 'bin')
    # Validate extension
    if ext.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("Invalid extension")
    # Generate random filename
    safe_name = secrets.token_hex(16) + '.' + ext.lower()
    return safe_name

def upload():
    file = request.files['file']
    safe_filename = get_safe_filename(file.filename)
    # Store mapping from safe name to original
    db.insert_file_record({
        'safe_name': safe_filename,
        'original_name': file.filename,
        'user_id': request.user.id
    })
    file.save(os.path.join(UPLOAD_FOLDER, safe_filename))
    return {'file_id': safe_filename}
Putting It Together: A Complete Example
from flask import Flask, request
import magic
import secrets
import os
import logging

app = Flask(__name__)
ALLOWED_EXTENSIONS = {'pdf', 'jpg', 'jpeg', 'png', 'docx'}
MAX_FILE_SIZE = 10 * 1024 * 1024
UPLOAD_FOLDER = '/var/data/uploads'
logging.basicConfig(level=logging.INFO)  # otherwise INFO messages are dropped
logger = logging.getLogger('uploads')

def validate_upload(file_content, filename):
    # Check extension
    _, ext = filename.rsplit('.', 1) if '.' in filename else (None, '')
    if ext.lower() not in ALLOWED_EXTENSIONS:
        return False, 'Invalid extension'
    # Check size
    if len(file_content) > MAX_FILE_SIZE:
        return False, 'File too large'
    # Check MIME type; every allowed extension needs an entry here,
    # or files with that extension can never pass
    mime = magic.from_buffer(file_content, mime=True)
    expected_mimes = {
        'pdf': 'application/pdf',
        'jpg': 'image/jpeg',
        'jpeg': 'image/jpeg',
        'png': 'image/png',
        'docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    }
    if mime != expected_mimes.get(ext.lower()):
        return False, 'MIME type mismatch'
    # Scan for malware (simplified); is_malicious is a placeholder for your scanner (see Control 6)
    if is_malicious(file_content):
        return False, 'Malware detected'
    return True, None

@app.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']
    file_content = file.read()
    is_valid, error = validate_upload(file_content, file.filename)
    if not is_valid:
        logger.warning(f'Upload rejected: {error}')
        return {'error': error}, 400
    # Generate safe filename (the extension was validated above)
    safe_filename = secrets.token_hex(16) + '.' + file.filename.rsplit('.', 1)[1].lower()
    file_path = os.path.join(UPLOAD_FOLDER, safe_filename)
    with open(file_path, 'wb') as f:
        f.write(file_content)
    logger.info(f'Upload accepted: {safe_filename}')
    return {'file_id': safe_filename}
Common Implementation Gaps
Incomplete extension validation:
- Allowing multiple extensions (file.php.jpg)
- Not lowercasing before checking
- Allowing null bytes or special characters
MIME type checking on surface level:
- Only checking the Content-Type header
- Not validating actual file content
File size limits that are too generous:
- Allowing 1GB files for simple documents
- No rate limiting on upload volume
Storing in predictable locations:
- Sequential filenames that attackers can guess
- Original user-provided names preserved
No threat scanning:
- Assuming legitimate users won't upload malicious files
- Missing the reality that files can be compromised in transit
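One of the gaps above, missing rate limiting on upload volume, is straightforward to prototype. The sketch below is a minimal in-memory sliding-window limiter keyed by user. The class name UploadRateLimiter and its parameters are hypothetical, and a multi-process deployment would need shared storage instead of a per-process dictionary.

```python
import time
from collections import defaultdict, deque


class UploadRateLimiter:
    """Allow at most max_uploads per user within a sliding time window."""

    def __init__(self, max_uploads: int, window_seconds: float, clock=time.monotonic):
        self.max_uploads = max_uploads
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._events = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        events = self._events[user_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_uploads:
            return False
        events.append(now)
        return True


# Usage: at most 20 uploads per user per minute (numbers are illustrative).
limiter = UploadRateLimiter(max_uploads=20, window_seconds=60)
first_attempt_allowed = limiter.allow('user-123')
```

In an upload handler, a False result would typically map to a 429 response before any file bytes are read.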
Using a Service Instead
Given the complexity of implementing all these controls correctly, many teams use Uplint as their upload validation layer:
pip install uplint

from uplint import Uplint

uplint = Uplint(api_key="your_api_key")

async def validate_upload(file):
    result = await uplint.validate(file, {
        "scan": True,
        "detectBlanks": True
    })
    return result.trusted
This replaces the entire control framework with a single API call.
Key Takeaways
OWASP's file upload controls are:
- Whitelist file types (not blacklist)
- Verify MIME type from content, not headers
- Enforce size limits appropriate to your use case
- Store outside web root to prevent execution
- Disable script execution in upload directories
- Scan for threats using external services
- Log comprehensively with full context
- Rename files to remove path traversal risks
These aren't optional guidelines. They're the minimum baseline for production systems handling untrusted uploads.
Uplint automates all eight OWASP controls in a single API call: extension validation, MIME verification, malware scanning, blank detection, and audit logging, with no configuration required.