File Management Automation Suite

File Automation Tools Screenshot

Project Overview

This collection of specialized Python scripts addresses common file management challenges, including duplicate image detection, batch file renaming, and image processing tasks. Each tool was developed to solve specific workflow problems and increase productivity.

Technologies Used

  • Python
  • PIL (Pillow)
  • OS Module
  • Hashlib
  • FuzzyWuzzy

Key Tools

  • Image Duplicate Detector: Identifies and removes duplicate images using content-based hashing
  • Batch Image Resizer: Processes multiple images with Lanczos algorithm for high-quality scaling
  • File Renaming Utility: Implements systematic file renaming with collision detection
  • Text Correction Tool: Performs fuzzy matching for text correction in filenames

Technical Implementation

Image Duplicate Detection

The duplicate detection tool uses content-based hashing rather than simple filename comparison. By converting images to a consistent format and generating MD5 hashes of the actual pixel data, the system can identify duplicates even when filenames differ completely.

Image Hashing Logic
def hash_image(image_path):
    with Image.open(image_path) as img:
        img = img.convert("RGB")
        img_hash = hashlib.md5(img.tobytes()).hexdigest()
    return img_hash

# Usage example (simplified)
def find_duplicates(directory):
    hashes = {}
    for file in files:
        if file.lower().endswith((".png", ".jpg", ".jpeg")):
            file_hash = hash_image(file_path)
            if file_hash in hashes:
                # Duplicate found
                print(f"Duplicate found: {file_path}")
            else:
                hashes[file_hash] = file_path

Batch Image Resizing

The batch image processor efficiently handles multiple images at once, creating new versions at specified dimensions while preserving quality through the Lanczos resampling algorithm. The tool automatically organizes outputs into dimension-specific folders for easy management.

Image Resizing with Quality Preservation
def resize_image(image_path, output_size):
    with Image.open(image_path) as img:
        # Lanczos algorithm provides high-quality resampling
        resized_img = img.resize(output_size, Image.LANCZOS)
    return resized_img

def batch_process(directory, sizes):
    for file in walk_files(directory):
        if is_image(file):
            for size in sizes:
                # Create size-specific output directory
                output_dir = create_output_dir(file, size)
                
                # Process and save the resized image
                resized = resize_image(file, size)
                save_with_metadata(resized, output_dir, file)

Intelligent File Renaming

The file renaming utility systematically processes files in a directory, applying consistent naming patterns while ensuring no existing files are overwritten. The system includes collision detection and automatic variant generation when duplicates are encountered.

Sequential Renaming with Collision Prevention
def rename_files(directory, prefix):
    for count, filename in enumerate(os.listdir(directory)):
        src = os.path.join(directory, filename)
        
        # Generate new filename with prefix and original extension
        new_filename = f"{prefix}{count + 1}{os.path.splitext(filename)[1]}"
        dst = os.path.join(directory, new_filename)

        # Handle potential name conflicts
        variant = 1
        while os.path.exists(dst):
            variant_name = f"{prefix}{count + 1}_{variant}{os.path.splitext(filename)[1]}"
            dst = os.path.join(directory, variant_name)
            variant += 1

        # Perform the rename operation
        os.rename(src, dst)

Text Correction System

The text correction tool employs fuzzy string matching algorithms to detect and fix potential typos or inconsistencies in filenames. By comparing against a dictionary of correct terms, the system can suggest or automatically apply corrections based on configurable accuracy thresholds.

Fuzzy Matching for Text Correction
def correct_word(word, dictionary, threshold=80):
    # Check if word exists in dictionary
    if word in dictionary:
        return word
        
    # Find closest match using fuzzy matching
    match = process.extractOne(word, dictionary)
    
    if match and match[1] >= threshold:
        # Return correction if confidence is above threshold
        return match[0]
    
    # Return original if no good match found
    return word

Efficiency Improvements

These automation tools have delivered measurable productivity benefits:

  • Reduced image management time by approximately 70% for large collections
  • Eliminated manual quality degradation issues in batch processing workflows
  • Provided consistent naming conventions across multiple projects
  • Saved storage space by automatically identifying and removing duplicate files

Common Use Cases

E-Commerce Image Management

Efficiently prepare product images in multiple dimensions required by different platforms while maintaining naming consistency.

Media Collection Organization

Clean up and organize large media libraries by identifying duplicates and standardizing filenames.

Batch Upload Preparation

Prepare large batches of files for upload to web services with specific naming and dimension requirements.

Dataset Preprocessing

Clean and standardize image datasets for machine learning or analytical purposes.

Technical Insights

Developing these tools highlighted several important principles in automation design:

  • Idempotence: Ensuring operations can be run multiple times without unexpected side effects
  • Error Resilience: Gracefully handling exceptions like corrupted files or unexpected formats
  • Platform Compatibility: Writing code that works across Windows and Unix-based systems
  • Memory Efficiency: Processing large files in chunks to prevent memory overflow