A Philosophy for Personal Media File Storage
If you've been taking photos and videos for more than a few years, you've probably accumulated a mess. Files from phones named IMG_20230415_143022.jpg, Pixel cameras producing PXL_20230415_143022.mp4, screenshots labeled Screenshot_2023-04-15, and whatever your old point-and-shoot decided to call things. They're scattered across devices, cloud services, and backup drives — some with correct dates in their metadata, some without, some with dates that are just wrong.
At some point you try to organize them. You create folders: "Vacation 2023", "Family", "Work Events". It feels productive for a week. Then a photo belongs in two folders, or you can't remember whether that dinner was "Family" or "Friends", or you find a folder called "Misc" with 400 unsorted files from three years ago.
I went through this cycle a few times before stepping back and rethinking the problem.
Separate Storage from Organization
The root issue is conflating two different concerns: where files live on disk and how you browse and find them. These have different requirements, and trying to solve both with a folder hierarchy satisfies neither.
Storage should be boring. It needs to be consistent, predictable, and optimized for durability. You want to back it up with rsync and know that nothing got lost. You want to look at any file and immediately know when it was taken. You don't want to think about it.
Organization — albums, tags, facial recognition, search, sharing — is subjective and fluid. The same photo might belong to "Trip to Berlin", "Photos with Anna", and "2023 Highlights". It changes over time as you create new albums or rethink old ones. This is a problem that tools like Google Photos, Apple Photos, or self-hosted alternatives like Immich are purpose-built to solve. They index metadata, run AI tagging, and let you organize however you want — without moving a single file.
The insight is that these tools work best when your underlying files have reliable, consistent metadata. If every file has a correct EXIF date and a predictable name, the organization layer can do its job without fighting your storage layer.
So the philosophy is simple: make storage deterministic and delegate organization to tools designed for it.
Why Date Is the Right Primary Key
Every media file has exactly one creation timestamp. It's objective — there's no judgment call about what "category" a photo belongs to. It's immutable — a photo taken on April 15th, 2023 was taken on April 15th, 2023, forever. And it's universal — phones, cameras, and screenshots all produce timestamps, even if they store them differently.
Date also makes practical operations easy. Backups are incremental by month. Canonical naming makes spotting duplicates straightforward — two files with the same timestamp and size are almost certainly the same photo. And when you sort by name, you get chronological order for free.
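As a rough illustration of the duplicate check described above, here is a minimal Python sketch. It is not part of the tools described in this post: the function name `duplicate_candidates` is hypothetical, and it assumes files already carry canonical `YYYYMMDD_HHMMSS` names, optionally with a `-N` suffix.

```python
import re
from collections import defaultdict
from pathlib import Path

# Canonical stem: YYYYMMDD_HHMMSS with an optional -N collision suffix.
STEM_RE = re.compile(r"^(\d{8}_\d{6})(?:-\d+)?$")


def duplicate_candidates(root):
    """Group files by (timestamp, size) and return groups with more than one file.

    Hypothetical helper: matching timestamp and size only makes files
    *likely* duplicates, so compare contents before deleting anything.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        match = STEM_RE.match(path.stem)
        if match:
            key = (match.group(1), path.stat().st_size)
            groups[key].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A checksum comparison on each returned group would be a sensible next step before removing anything.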
The Target File Structure
Here's what every media file ends up looking like in my system:
$MEDIA_HOME/All/
  2023/
    01/
      20230115_083022.jpg
      20230115_143022.mp4
      20230115_143022-1.jpg
    04/
      20230415_091500.heic
  2024/
    03/
      20240301_120000.png
The rules are intentionally rigid:
- Canonical filename: YYYYMMDD_HHMMSS.ext, derived from the file's EXIF CreateDate
- Lowercase extensions only: .jpg, not .JPG
- Duplicate handling: if two files share the same second, they get a -1, -2 suffix
- Minimum 30 KB file size: filters out thumbnails and metadata artifacts
- Supported types: jpg, jpeg, png, heic, mp4, mov
No albums, no categories, no nested project folders. Just year, month, and a timestamp. The structure is so predictable that you could write a one-liner to verify its integrity.
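The integrity check mentioned above does not have to be a shell one-liner. Here is a short Python sketch of the same idea. The names `is_valid_entry` and `find_violations` are hypothetical, and it assumes the year and month folders must agree with the date in the filename, which the rules imply but do not state.

```python
import re
from pathlib import Path

SUPPORTED = {"jpg", "jpeg", "png", "heic", "mp4", "mov"}

# YYYY/MM/YYYYMMDD_HHMMSS[-N].ext, relative to $MEDIA_HOME/All
ENTRY_RE = re.compile(
    r"^(?P<dir_year>\d{4})/(?P<dir_month>\d{2})/"
    r"(?P<year>\d{4})(?P<month>\d{2})\d{2}_\d{6}(?:-\d+)?\.(?P<ext>[a-z0-9]+)$"
)


def is_valid_entry(relative_path):
    """Return True if a path relative to the library root follows the rules."""
    m = ENTRY_RE.match(relative_path)
    if m is None:
        return False
    return (
        m["dir_year"] == m["year"]
        and m["dir_month"] == m["month"]
        and m["ext"] in SUPPORTED
    )


def find_violations(library_root):
    """List every file under library_root whose path breaks the rules."""
    root = Path(library_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and not is_valid_entry(p.relative_to(root).as_posix())
    )
```

An empty list from `find_violations` means every file sits where its name says it should. The sketch does not check the 30 KB minimum size.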
Here's a typical before-and-after. A messy directory like this:
Downloads/phone-backup/
  IMG_20230115_083022.JPG
  PXL_20230415_091500.jpg
  Screenshot_2023-04-15-14-30-22.png
  VID_20240301_120000.mp4
  random_photo.jpg    (no date in name, but has EXIF data)
  old_scan.jpg        (no date anywhere, taken from mtime)
Becomes:
$MEDIA_HOME/All/
  2023/
    01/
      20230115_083022.jpg
    04/
      20230415_091500.jpg
      20230415_143022.png
  2024/
    03/
      20240301_120000.mp4
  ...
Every file normalized, every date preserved, every name predictable.
The Workflow: Normalize, Then Import
To get from mess to order, I built two CLI tools: media-normalize and media-import. The workflow is always the same — normalize first, then import into the library.
Step 1: Normalize
media-normalize takes a directory of media files and standardizes them in place. It does three things in sequence:
Lowercase extensions. Files like IMG_1234.JPG or video.MOV get their extensions lowercased. Simple, but necessary — mixed-case extensions cause subtle issues with tools that do case-sensitive matching.
Fill missing EXIF dates. This is where most of the complexity lives. Not every file has valid EXIF metadata, so the tool uses a fallback chain:
- If the file already has a valid CreateDate in its EXIF data, leave it alone
- If not, try to parse the date from the filename. It recognizes common prefixes like IMG_, PXL_, VID_, Screenshot_ followed by a timestamp, as well as dashed date formats like 2023-04-15_14-30-22
- If that fails too, fall back to the file's modification time
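The fallback chain above can be sketched in Python roughly as follows. This is an illustration, not the actual implementation: the regular expressions, `date_from_filename`, and `resolve_date` are assumptions about how such parsing might look, and reading the EXIF date itself is left to the caller.

```python
import re
from datetime import datetime

# Prefixed names such as IMG_20230415_143022.jpg or PXL_20230415_143022.mp4
PREFIXED_RE = re.compile(r"^(?:IMG|PXL|VID|Screenshot)_(\d{8})_(\d{6})")
# Dashed names such as 2023-04-15_14-30-22 or Screenshot_2023-04-15-14-30-22
DASHED_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})[-_](\d{2})-(\d{2})-(\d{2})")


def date_from_filename(name):
    """Return a datetime parsed from the filename, or None if nothing matches."""
    m = PREFIXED_RE.match(name)
    if m:
        digits = m.group(1) + m.group(2)
    else:
        m = DASHED_RE.search(name)
        if m is None:
            return None
        digits = "".join(m.groups())
    try:
        return datetime.strptime(digits, "%Y%m%d%H%M%S")
    except ValueError:
        # Digits matched the pattern but are not a real date (e.g. month 13).
        return None


def resolve_date(exif_date, name, mtime):
    """Apply the chain: EXIF date, then filename, then modification time.

    Returns (datetime, source) so callers can log where the date came from.
    """
    if exif_date is not None:
        return exif_date, "exif"
    parsed = date_from_filename(name)
    if parsed is not None:
        return parsed, "filename"
    return datetime.fromtimestamp(mtime), "mtime"
```

Returning the source alongside the date makes a dry run more informative, since dates taken from modification time are the ones most worth double-checking.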
Whichever source provides the date, the tool writes it into all the standard EXIF timestamp fields (DateTimeOriginal, CreateDate, ModifyDate) along with timezone offsets. This means downstream tools always have complete temporal metadata to work with.
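To make the write step concrete, here is a hedged Python sketch that builds an exiftool command line for one file. It is an assumption about how the write could be done, not the tool's actual code. The helper name `exiftool_date_args` is hypothetical, and mapping "timezone offsets" to the EXIF OffsetTime, OffsetTimeOriginal, and OffsetTimeDigitized tags is my reading of the text. Video containers store dates in their own tag groups, so treat this as a still-image example.

```python
from datetime import datetime, timedelta, timezone


def exiftool_date_args(path, when):
    """Build an exiftool argument list that sets the three timestamp tags.

    `when` must be timezone-aware so the offset tags can be filled in.
    The list is only constructed here; pass it to subprocess.run to apply it.
    """
    offset = when.utcoffset()
    if offset is None:
        raise ValueError("expected a timezone-aware datetime")
    stamp = when.strftime("%Y:%m:%d %H:%M:%S")  # EXIF date format
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    tz = f"{sign}{hours:02d}:{minutes:02d}"
    return [
        "exiftool",
        "-overwrite_original",  # do not leave *_original backup copies behind
        f"-DateTimeOriginal={stamp}",
        f"-CreateDate={stamp}",
        f"-ModifyDate={stamp}",
        f"-OffsetTimeOriginal={tz}",
        f"-OffsetTimeDigitized={tz}",
        f"-OffsetTime={tz}",
        str(path),
    ]
```

Building the argument list separately from running it keeps a dry-run mode simple: print the list instead of executing it.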
Rename to canonical format. Once every file has a valid date, it gets renamed to YYYYMMDD_HHMMSS.ext. Files that already match this pattern are skipped.
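The naming rule, including the -1, -2 suffix for files that share a second, might look like this in Python. The function name `canonical_name` is hypothetical. The sketch assumes collisions are detected on the timestamp alone, regardless of extension, because the example tree earlier shows 20230115_143022.mp4 next to 20230115_143022-1.jpg.

```python
from datetime import datetime


def canonical_name(when, ext, taken_stems):
    """Return YYYYMMDD_HHMMSS.ext, adding -1, -2, ... if the stem is taken.

    `taken_stems` is a set of stems already used in the target directory;
    the chosen stem is added to it so repeated calls stay collision-free.
    """
    base = when.strftime("%Y%m%d_%H%M%S")
    stem = base
    n = 0
    while stem in taken_stems:
        n += 1
        stem = f"{base}-{n}"
    taken_stems.add(stem)
    return f"{stem}.{ext.lower().lstrip('.')}"
```

For example, calling it twice with the same timestamp and an initially empty set yields 20230115_143022.mp4 and then 20230115_143022-1.jpg.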
A typical normalization looks like this:
# Always preview first
media-normalize --dry-run --recursive ~/Downloads/phone-backup
# If the preview looks right, run it for real
media-normalize --recursive ~/Downloads/phone-backup
There's also a --date rescue flag for trickier situations — say you have a batch of scanned photos where the dates are completely wrong. You can force-set a date on the entire batch:
media-normalize --date "2019-06-15 10:00:00" ~/Downloads/old-scans
This sets the oldest file (by modification time) to the specified date and offsets all other files relative to it, preserving the order in which they were originally created. It's a lifesaver for photo rescue operations.
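The date arithmetic behind this rescue mode can be sketched in a few lines of Python. The function name `rescue_dates` is hypothetical. The sketch assumes each file's offset equals the real gap between its modification time and that of the oldest file; the actual tool may space files differently.

```python
from datetime import datetime, timedelta


def rescue_dates(mtimes, anchor):
    """Map each filename to a new datetime anchored at `anchor`.

    `mtimes` maps filename -> modification time in seconds. The oldest file
    gets `anchor` exactly; every other file keeps its distance from it.
    """
    if not mtimes:
        return {}
    oldest = min(mtimes.values())
    return {
        name: anchor + timedelta(seconds=mtime - oldest)
        for name, mtime in mtimes.items()
    }
```

With an anchor of 2019-06-15 10:00:00, a file modified 60 seconds after the oldest one would be dated 2019-06-15 10:01:00.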
Step 2: Import
Once files are normalized, media-import moves or copies them into the date-organized directory tree:
# Preview the import
media-import --dry-run ~/Downloads/phone-backup
# Move files into the library (deletes originals after successful copy)
media-import --move ~/Downloads/phone-backup
# Or copy if you want to keep the originals
media-import ~/Downloads/phone-backup
The tool reads each file's CreateDate and places it into $MEDIA_HOME/All/YYYY/MM/. If any files still lack dates at this point (unlikely after normalization, but possible), it fills them from modification time before importing.
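Computing the target location is the simplest part of the import. A minimal Python sketch follows, with the hypothetical helper `destination_for` standing in for whatever the real tool does internally.

```python
from datetime import datetime
from pathlib import Path


def destination_for(media_home, when, filename):
    """Return the path $MEDIA_HOME/All/YYYY/MM/filename for a file dated `when`."""
    return Path(media_home) / "All" / f"{when:%Y}" / f"{when:%m}" / filename
```

A caller would create the parent directory with `mkdir(parents=True, exist_ok=True)` and then copy or move the file, depending on the chosen mode.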
The --move flag is what I use most — once files are in the library, I don't need the originals in my Downloads folder. The default copy mode is there for when you want to be extra cautious.
Fitting It Into Nix
I manage my system configuration with Nix, and these tools fit naturally into that setup. The scripts are packaged as a Nix derivation that bundles them with their runtime dependencies — exiftool, coreutils, and findutils — so there's nothing to install separately.
On top of the package, there's a small home-manager module:
{
  media.tools = {
    enable = true;
    mediaHome = "${config.home.homeDirectory}/Pictures/Photo";
  };
}
When enabled, it adds media-normalize and media-import to your PATH and sets the MEDIA_HOME environment variable. No manual setup, no forgetting to export a variable in your shell config.
This module is part of a broader media role that also pulls in VLC, Shotwell, and other media-related software. Enabling the role on any machine gives you the complete media toolkit — declarative and reproducible.
The practical benefit is consistency. Whether I'm on my laptop or a desktop, the tools are identical, the paths are the same, and the workflow doesn't change. When I set up a new machine, media management is one enable = true away.
The Payoff
This setup has been running for a while now, and the daily experience is remarkably low-friction:
- Backups are trivial. rsync the $MEDIA_HOME/All/ tree to an external drive or remote server. The date-based structure means incremental backups are fast, since only new months have new files.
- Google Photos handles the fun part. Albums, sharing, search, AI-powered "remember this day" features: all working on top of files with clean, reliable metadata.
- No more lost dates. Every file has complete EXIF timestamps, including timezone offsets. Nothing gets filed under January 1, 1970.
- Finding files is predictable. If you know roughly when a photo was taken, you know exactly where to look in the filesystem.
The specific tools don't matter as much as the philosophy. You could implement the same approach with a shell script and exiftool, or with any number of existing media management tools. The key insight is just this: let the filesystem be a dumb, reliable, date-sorted archive, and let purpose-built software handle everything else.