File Management Automation Suite

Project Overview
This collection of specialized Python scripts addresses common file management challenges, including duplicate image detection, batch file renaming, and image processing tasks. Each tool was developed to solve specific workflow problems and increase productivity.
Key Tools
- Image Duplicate Detector: Identifies and removes duplicate images using content-based hashing
- Batch Image Resizer: Resizes multiple images using Lanczos resampling for high-quality scaling
- File Renaming Utility: Implements systematic file renaming with collision detection
- Text Correction Tool: Performs fuzzy matching for text correction in filenames
Technical Implementation
Image Duplicate Detection
The duplicate detection tool uses content-based hashing rather than filename comparison. Each image is converted to RGB and an MD5 hash of its raw pixel data is computed, so pixel-identical duplicates are found even when filenames differ completely. Copies that have been resized or re-encoded with lossy compression typically decode to different pixel data and are not matched by this approach.
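import hashlib
import os

from PIL import Image


def hash_image(image_path):
    """Return the MD5 hex digest of an image's RGB pixel data."""
    with Image.open(image_path) as img:
        # Normalize the mode so identical pixels hash identically
        img = img.convert("RGB")
        return hashlib.md5(img.tobytes()).hexdigest()


# Usage example (simplified)
def find_duplicates(directory):
    hashes = {}
    for root, _dirs, files in os.walk(directory):
        for file in files:
            if file.lower().endswith((".png", ".jpg", ".jpeg")):
                file_path = os.path.join(root, file)
                file_hash = hash_image(file_path)
                if file_hash in hashes:
                    # Duplicate found
                    print(f"Duplicate found: {file_path}")
                else:
                    hashes[file_hash] = file_path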
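
The loop above only prints duplicates as it encounters them. A minimal sketch of one way to collect them into groups instead, so that a caller can decide which copy to keep, might look like the following. The names `file_digest` and `group_by_digest` are hypothetical and not part of the original tool. `file_digest` hashes raw file bytes using only the standard library so that the sketch runs without Pillow; the pixel-based `hash_image` could be passed as `digest` in its place.

```python
import hashlib
from collections import defaultdict


def file_digest(path):
    """Hypothetical stand-in hash: MD5 of the raw file bytes (not pixel data)."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def group_by_digest(paths, digest=file_digest):
    """Map each digest to the paths that produced it, keeping only
    digests shared by two or more paths."""
    groups = defaultdict(list)
    for path in paths:
        groups[digest(path)].append(path)
    return {d: ps for d, ps in groups.items() if len(ps) > 1}
```

Returning groups rather than deleting files keeps the destructive step separate from detection, which makes it easier to review the result before removing anything.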
Batch Image Resizing
The batch image processor efficiently handles multiple images at once, creating new versions at specified dimensions while preserving quality through the Lanczos resampling algorithm. The tool automatically organizes outputs into dimension-specific folders for easy management.
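from PIL import Image


def resize_image(image_path, output_size):
    """Return a copy of the image resized to output_size, a (width, height) tuple."""
    with Image.open(image_path) as img:
        # Lanczos algorithm provides high-quality resampling.
        # output_size is applied as given, so the aspect ratio is not preserved automatically.
        return img.resize(output_size, Image.LANCZOS)


# walk_files, is_image, create_output_dir and save_with_metadata are
# helper functions that are not shown in this excerpt.
def batch_process(directory, sizes):
    for file in walk_files(directory):
        if is_image(file):
            for size in sizes:
                # Create size-specific output directory
                output_dir = create_output_dir(file, size)
                # Process and save the resized image
                resized = resize_image(file, size)
                save_with_metadata(resized, output_dir, file)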
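
The helpers called by `batch_process` are not shown in the original. As a rough illustration of how the file walking and the dimension-specific folders could work, here is a standard-library sketch of three of them. The bodies, the extension list, and the `WIDTHxHEIGHT` folder naming are all assumptions made for this example, not the tool's actual behavior.

```python
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")  # assumed set of supported types


def walk_files(directory):
    """Yield the full path of every file under directory, recursively."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            yield os.path.join(root, name)


def is_image(path):
    """Decide by file extension only; the file contents are not inspected."""
    return path.lower().endswith(IMAGE_EXTENSIONS)


def create_output_dir(path, size):
    """Create (if needed) and return a 'WIDTHxHEIGHT' folder next to the file."""
    width, height = size
    output_dir = os.path.join(os.path.dirname(path), f"{width}x{height}")
    os.makedirs(output_dir, exist_ok=True)
    return output_dir
```

One caveat with this layout: because the output folders live inside the scanned directory, a second run would pick up the previously resized images as inputs. A real tool would need to exclude those folders or write to a separate output root.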
Intelligent File Renaming
The file renaming utility systematically processes files in a directory, applying consistent naming patterns while ensuring no existing files are overwritten. The system includes collision detection and automatic variant generation when duplicates are encountered.
import os


def rename_files(directory, prefix):
    # Sort for a stable numbering order and skip subdirectories
    filenames = sorted(
        f for f in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, f))
    )
    for count, filename in enumerate(filenames, start=1):
        src = os.path.join(directory, filename)
        # Generate new filename with prefix and original extension
        ext = os.path.splitext(filename)[1]
        dst = os.path.join(directory, f"{prefix}{count}{ext}")
        # Handle potential name conflicts (a file already at its target name is not a conflict)
        variant = 1
        while dst != src and os.path.exists(dst):
            dst = os.path.join(directory, f"{prefix}{count}_{variant}{ext}")
            variant += 1
        # Perform the rename operation
        if dst != src:
            os.rename(src, dst)
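
Because renaming is hard to undo, it can help to preview the result first. The sketch below computes the same prefix-plus-counter names and `_variant` suffixes as a list of (old name, new name) pairs without touching the filesystem. The function name `plan_renames` and the dry-run approach are assumptions added for illustration; they are not part of the original utility.

```python
import os


def plan_renames(directory, prefix):
    """Return (old_name, new_name) pairs without touching the filesystem."""
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    taken = set(names)  # names that exist now, updated as renames are planned
    plan = []
    for count, name in enumerate(names, start=1):
        ext = os.path.splitext(name)[1]
        candidate = f"{prefix}{count}{ext}"
        variant = 1
        # A file may keep its own name; any other existing or planned name collides
        while candidate != name and candidate in taken:
            candidate = f"{prefix}{count}_{variant}{ext}"
            variant += 1
        # Simulate the rename: the old name is freed, the new one is occupied
        taken.discard(name)
        taken.add(candidate)
        plan.append((name, candidate))
    return plan
```

The returned plan can be printed for review, and the actual renames applied only once it looks right.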
Text Correction System
The text correction tool uses fuzzy string matching to detect and fix likely typos or inconsistencies in filenames. Each word is compared against a dictionary of correct terms, and the closest match is suggested or applied automatically when its similarity score meets a configurable threshold (80 out of 100 by default in the code that follows).
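# Assumed import: the source does not name the library. thefuzz (formerly
# fuzzywuzzy) and rapidfuzz both provide a compatible process.extractOne.
from thefuzz import process


def correct_word(word, dictionary, threshold=80):
    """Return the closest term in dictionary, or word itself if none scores >= threshold."""
    # Check if word exists in dictionary
    if word in dictionary:
        return word
    # Find closest match using fuzzy matching; match[0] is the term, match[1] its 0-100 score
    match = process.extractOne(word, dictionary)
    if match and match[1] >= threshold:
        # Return correction if confidence is at or above threshold
        return match[0]
    # Return original if no good match found
    return word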
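
To show how word-level correction could be applied to a whole filename, here is a minimal sketch that uses the standard library's `difflib` in place of the fuzzy-matching library used above. This is a swapped-in technique: `difflib.get_close_matches` scores on a 0 to 1 scale rather than 0 to 100, and its scores will not match those of `process.extractOne`. The function name `correct_filename`, the separator set, and the default cutoff are assumptions for this example.

```python
import difflib
import os
import re


def correct_filename(filename, dictionary, cutoff=0.8):
    """Correct each word in the filename's stem against a list of known terms."""
    stem, ext = os.path.splitext(filename)
    # Split on runs of underscores, hyphens, or spaces, keeping the separators
    parts = re.split(r"([_\-\s]+)", stem)
    for i in range(0, len(parts), 2):  # even indices hold the words
        word = parts[i]
        if not word or word in dictionary:
            continue
        # get_close_matches returns at most n candidates scoring >= cutoff
        matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
        if matches:
            parts[i] = matches[0]
    return "".join(parts) + ext
```

For example, `correct_filename("sumer_vacaton.jpg", ["summer", "vacation"])` returns `"summer_vacation.jpg"`, while a word with no close match is left unchanged.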
Efficiency Improvements
These automation tools have delivered measurable productivity benefits:
- Reduced image management time by approximately 70% for large collections
- Eliminated the quality degradation caused by manual steps in batch processing workflows
- Provided consistent naming conventions across multiple projects
- Saved storage space by automatically identifying and removing duplicate files
Common Use Cases
E-Commerce Image Management
Efficiently prepare product images in multiple dimensions required by different platforms while maintaining naming consistency.
Media Collection Organization
Clean up and organize large media libraries by identifying duplicates and standardizing filenames.
Batch Upload Preparation
Prepare large batches of files for upload to web services with specific naming and dimension requirements.
Dataset Preprocessing
Clean and standardize image datasets for machine learning or analytical purposes.
Technical Insights
Developing these tools highlighted several important principles in automation design:
- Idempotence: Ensuring operations can be run multiple times without unexpected side effects
- Error Resilience: Gracefully handling exceptions like corrupted files or unexpected formats
- Platform Compatibility: Writing code that works across Windows and Unix-based systems
- Memory Efficiency: Processing large files in chunks to prevent memory overflow
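
As a small illustration of the error-resilience and memory-efficiency points, the sketch below hashes files in fixed-size chunks and records failures instead of aborting the whole run. It is an assumption-laden example, not code from the suite: the function names and the 1 MiB default chunk size are hypothetical, and it hashes raw file bytes rather than decoded pixel data, so its digests differ from those of `hash_image`.

```python
import hashlib


def hash_file_in_chunks(path, chunk_size=1024 * 1024):
    """MD5 of a file's bytes, reading at most chunk_size bytes at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_all(paths):
    """Hash every readable file; collect failures instead of aborting the run."""
    digests, errors = {}, {}
    for path in paths:
        try:
            digests[path] = hash_file_in_chunks(path)
        except OSError as exc:  # missing file, permission denied, etc.
            errors[path] = str(exc)
    return digests, errors
```

Memory use stays bounded by `chunk_size` regardless of file size, and one unreadable file does not stop the remaining files from being processed.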