Directory Structure Checking Script

This is a script to check if my photo archives conform to my naming rules.

My archives look like this:

  • Pictures/
    • 2024
      • 2024-01-01 New Years
        • <many files that end in>.jpg
      • 2024-05-01 May Day
    • 2025
    • Some Other Directory Name
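You can try the script without touching the real archive by building a throwaway copy of this layout in a temporary directory. This is a sketch. `make_sample_archive` and the `IMG_*.jpg` file names are made up for illustration.

```bash
#!/bin/bash
# Build a miniature copy of the archive layout in a temporary directory.
# make_sample_archive and the image file names are hypothetical.
make_sample_archive() {
    local root="$1"
    # Year folders, dated event folders, and one top-level odd one out
    mkdir -p "$root/Pictures/2024/2024-01-01 New Years" \
             "$root/Pictures/2024/2024-05-01 May Day" \
             "$root/Pictures/2025" \
             "$root/Pictures/Some Other Directory Name"
    # A couple of empty placeholder images inside one event folder
    touch "$root/Pictures/2024/2024-01-01 New Years/IMG_0001.jpg" \
          "$root/Pictures/2024/2024-01-01 New Years/IMG_0002.jpg"
}

SAMPLE_ROOT="$(mktemp -d)"
make_sample_archive "$SAMPLE_ROOT"
find "$SAMPLE_ROOT/Pictures" | sort   # list what was created
```

Next, run the script on `"$SAMPLE_ROOT/Pictures"`. `Some Other Directory Name` should come back as a non-conforming directory. When you're done, remove the tree with `rm -rf "$SAMPLE_ROOT"`.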

Right now, it’s approximately 65,000 files totalling 96 GB, and the script copes with that fine. It might not scale as well for you.

This script has a bunch of exceptions and slack coded in because my archives are still a little messy. Without those exceptions, there’d be too many errors to be useful.

The current error list is:

Scanning directory: /media/johnk/External Backup/John Kawakami Archive/Pictures
Non-conforming items:
File not in directory: /media/johnk/External Backup/John Kawakami Archive/Pictures/2001/*
File name invalid: /media/johnk/External Backup/John Kawakami Archive/Pictures/2005/2005 camper/campersmall.bmp
File name invalid: /media/johnk/External Backup/John Kawakami Archive/Pictures/2016/2016-10 Bernie/2016-06-berniejpg
File not in directory: /media/johnk/External Backup/John Kawakami Archive/Pictures/2018/20180702_174053.jpg
File not in directory: /media/johnk/External Backup/John Kawakami Archive/Pictures/2021/2021-10-31_El_Sereno_Steps.zip
File name invalid: /media/johnk/External Backup/John Kawakami Archive/Pictures/2021/2021-June-17-Action/newsletter.pages
Non-conforming directory: Blog
Non-conforming directory: Boyle Heights Burn to Rebuild
Non-conforming directory: Documents
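The first entry, ending in `2001/*`, is most likely not a real file. When a directory has nothing for `*` to match (it is empty, or holds only dotfiles), bash leaves the pattern unexpanded. `"$dir"/*` then yields the literal path `.../2001/*`, and the year check reports it as a stray file. The `nullglob` shell option makes an unmatched pattern expand to nothing instead. Here is a minimal demonstration:

```bash
#!/bin/bash
# Show why an empty directory is reported as ".../*".
empty_dir="$(mktemp -d)"

shopt -u nullglob                 # bash's default behaviour
default_items=("$empty_dir"/*)    # no match, so the pattern is kept literally
echo "default:  ${#default_items[@]} item(s): ${default_items[*]}"

shopt -s nullglob                 # unmatched patterns now expand to nothing
nullglob_items=("$empty_dir"/*)
echo "nullglob: ${#nullglob_items[@]} item(s)"

rmdir "$empty_dir"
```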

I’ll tighten it up as I clean up the archives. Eventually, I’ll use these checks to pre-flight directories before they are copied into the archives.
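A pre-flight check might look something like the following sketch. It applies the script's naming rules to a single event directory and returns a non-zero status if anything fails, so it can gate a copy. `preflight_event_dir` is a hypothetical name. The patterns are copies of the script's, with `\d` written as `[0-9]`, because bash's `=~` uses POSIX extended regular expressions.

```bash
#!/bin/bash
# Hypothetical pre-flight check for one event directory before it is copied
# into a year folder. Patterns are copied from the script (POSIX ERE form).
DIR_PATTERN="^[0-9]{0,4}(-[0-9]{2})?(-[0-9]{2})?[ a-zA-Z0-9_.-]+$"
FILE_PATTERN="^[ ()~#.a-zA-Z0-9_-]+\.(JPEG|JPG|jpeg|jpg|gif|heic|PNG|png|webp|webm|mp4|MP4|MOV|mov|xcf|XCF|AVI|avi|odg|ppm|odt|txt|rtf|amr|pdf|THM)$"

# Prints each problem found and returns 1 if there were any, else 0.
preflight_event_dir() {
    local dir="${1%/}" item      # drop a trailing slash so ${dir##*/} works
    local problems=0

    # The event directory's own name must follow the directory rule
    if [[ ! ${dir##*/} =~ $DIR_PATTERN ]]; then
        echo "Non-conforming directory: ${dir##*/}"
        problems=1
    fi

    # Every regular file inside it must follow the file rule
    for item in "$dir"/*; do
        if [[ -f "$item" && ! ${item##*/} =~ $FILE_PATTERN ]]; then
            echo "File name invalid: $item"
            problems=1
        fi
    done

    return "$problems"
}
```

As a gate, for example `preflight_event_dir "$src" && cp -r "$src" "$archive/2024/"`, the copy only runs when every name conforms. `$src` and `$archive` are placeholders.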

#!/bin/bash

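# Directory to scan, given as the first argument. Without this check, an
# empty argument would make "$DIR_TO_SCAN"/* expand to /* and scan from root.
DIR_TO_SCAN="$1"
if [[ -z "$DIR_TO_SCAN" || ! -d "$DIR_TO_SCAN" ]]; then
    echo "Usage: $0 <directory-to-scan>" >&2
    exit 1
fi

# Let a glob that matches nothing expand to nothing. By default bash keeps
# the pattern as-is, so an empty directory is reported as a stray file
# whose name ends in "/*".
shopt -s nullglob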

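# Naming convention regex patterns. Bash's =~ operator uses POSIX extended
# regular expressions: there is no \d or lazy +?, so digits are written as
# [0-9], and inside [...] a backslash is literal, so ( and ) need no escaping.
# All dirs start with a year or date. The year is optional ({0,4}), so in
# practice any name made only of the listed characters is accepted.
DIR_PATTERN="^[0-9]{0,4}(-[0-9]{2})?(-[0-9]{2})?[ a-zA-Z0-9_.-]+$"
FILE_PATTERN="^[ ()~#.a-zA-Z0-9_-]+\.(JPEG|JPG|jpeg|jpg|gif|heic|PNG|png|webp|webm|mp4|MP4|MOV|mov|xcf|XCF|AVI|avi|odg|ppm|odt|txt|rtf|amr|pdf|THM)$"
# Top-level directories to treat specially. "^()$" only matches an empty
# name, so for now this matches nothing.
SPECIAL_DIR_PATTERN="^()$"
# Top-level directories to skip entirely.
IGNORE_DIR_PATTERN="^(Food|Food Pictures|Free Stuff|Memes|Photos.+|Ebay)$"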

# Check that every file in an event directory has an allowed name and
# extension. Subdirectories are not descended into.
check_photo_files() {
    local dir="$1"
    local indent="$2"
    
    for item in "$dir"/*; do
        if [[ -f "$item" ]]; then
            if [[ ! ${item##*/} =~ $FILE_PATTERN ]]; then
                echo "${indent}File name invalid: $item"
            fi
        fi
    done
}

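# Inside a year directory, every entry should be a directory named with the
# date. Loose files are reported; conforming directories have their files checked.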
check_year_archive() {
    local dir="$1"
    local indent="$2"

    for item in "$dir"/*; do
        if [[ -d "$item" ]]; then
            if [[ ! ${item##*/} =~ $DIR_PATTERN ]]; then
                echo "${indent}Non-conforming directory: ${item##*/}"
            else
                check_photo_files "$item" " $indent"
            fi
        else
            echo "${indent}File not in directory: $item"
        fi
    done
}

# Every top-level dir should be a year, unless it matches IGNORE_DIR_PATTERN
# (skipped) or SPECIAL_DIR_PATTERN (currently also skipped).
check_photo_archive() {
    local dir="$1"
    local indent="$2"

    for item in "$dir"/*; do
        if [[ -d "$item" ]]; then
            if [[ ${item##*/} =~ $IGNORE_DIR_PATTERN ]]; then
                : # do nothing
            elif [[ ${item##*/} =~ ^[0-9]{4}$ ]]; then
                check_year_archive "$item" " $indent"
            elif [[ ${item##*/} =~ $SPECIAL_DIR_PATTERN ]]; then
                : # check_year_archive "$item" " $indent"
            else
                echo "${indent}Non-conforming directory: ${item##*/}"
            fi
        elif [[ -f "$item" ]]; then
            echo "${indent}File not in directory: $item"
        fi
    done
}

# Start the scan
echo "Scanning directory: $DIR_TO_SCAN"
echo "Non-conforming items:"
check_photo_archive "$DIR_TO_SCAN" ""
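As the rules get tightened, a small table of names that should and shouldn't match makes it easy to see what a pattern change breaks. This is a sketch. It assumes local copies of the script's patterns, with `[0-9]` in place of `\d`. The sample names come from the layout and error list above, except `IMG_0001.jpg`, `Bob's Party`, `Photos from Phone` and `Photos`, which are made up. `expect_match` is a hypothetical helper.

```bash
#!/bin/bash
# Regression table for the naming patterns (local copies, POSIX ERE form).
DIR_PATTERN="^[0-9]{0,4}(-[0-9]{2})?(-[0-9]{2})?[ a-zA-Z0-9_.-]+$"
FILE_PATTERN="^[ ()~#.a-zA-Z0-9_-]+\.(JPEG|JPG|jpeg|jpg|gif|heic|PNG|png|webp|webm|mp4|MP4|MOV|mov|xcf|XCF|AVI|avi|odg|ppm|odt|txt|rtf|amr|pdf|THM)$"
IGNORE_DIR_PATTERN="^(Food|Food Pictures|Free Stuff|Memes|Photos.+|Ebay)$"

failures=0

# expect_match PATTERN NAME WANT, where WANT is "match" or "nomatch".
# Prints a line and counts a failure when the result differs from WANT.
expect_match() {
    local pattern="$1" name="$2" want="$3" got="nomatch"
    [[ $name =~ $pattern ]] && got="match"
    if [[ $got != "$want" ]]; then
        echo "FAIL: '$name' gave $got, expected $want"
        failures=$((failures + 1))
    fi
}

expect_match "$DIR_PATTERN"        "2024-01-01 New Years" match
expect_match "$DIR_PATTERN"        "2021-June-17-Action"  match
expect_match "$DIR_PATTERN"        "Bob's Party"          nomatch
expect_match "$FILE_PATTERN"       "IMG_0001.jpg"         match
expect_match "$FILE_PATTERN"       "campersmall.bmp"      nomatch
expect_match "$FILE_PATTERN"       "2016-06-berniejpg"    nomatch
expect_match "$FILE_PATTERN"       "newsletter.pages"     nomatch
expect_match "$IGNORE_DIR_PATTERN" "Photos from Phone"    match
expect_match "$IGNORE_DIR_PATTERN" "Photos"               nomatch
expect_match "$IGNORE_DIR_PATTERN" "Blog"                 nomatch

echo "$failures failure(s)"
```

`2021-June-17-Action` passing shows how loose the directory rule still is. If you add a stricter date requirement, that line's expectation would need to flip.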