Organising Photos and Videos

After the Christmas break I had a few more photos to file away. As I dumped them into the “Family Photos” folder on my NAS, I noticed that this was just the latest of many collections I’ve unceremoniously dumped in there, intending to file them “properly” one day.

I finally acknowledged that that day was never going to arrive, so I should do something different now. I had a session with Claude, asking for a Python script that could arrange the photos using only the information available.

It has taken sixteen revisions (so far) to get organise.py to its current state, and I’m pretty happy with the result.

My process is to dump my photos and videos into a folder called chaos and run organise.py. That script moves those files from chaos into one of three folders: order, error or duplicates. When I decide that the script could do better, I move files from order (or maybe error) back to chaos, run the improved organise.py again and observe the result.
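In case it helps anyone picture the loop, here’s a minimal sketch of that chaos-in, sorted-out flow. The folder names match my layout, but organise() and classify() are simplified stand-ins of my own, not the real organise.py:

```python
import shutil
from pathlib import Path

CHAOS = Path("chaos")
ORDER = Path("order")
ERROR = Path("error")
DUPLICATES = Path("duplicates")

def organise(classify) -> None:
    """Move every file out of chaos into the folder chosen by classify().

    classify(path) returns a destination directory (somewhere under
    order/, or duplicates/); if it raises, the file lands in error/.
    """
    for path in CHAOS.iterdir():
        if not path.is_file():
            continue
        try:
            dest = classify(path)
        except Exception:
            dest = ERROR
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest / path.name))
```

The nice property of this shape is that every run leaves chaos empty, so re-running an improved script is just “move files back, run again”.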

I decided that the filenames were meaningless, so I’m using an MD5 hash of each file’s contents as its new name. That makes duplicates easy to recognise, and they go into the duplicates folder. That’s my first win: out of 122G of treasured memories, I’ve finally removed 14.4G of duplicate needles from that haystack.
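The hashing itself is only a few lines. A sketch of the content-hash rename (the helper names are mine, not from organise.py; the real script does more):

```python
import hashlib
from pathlib import Path

def content_hash(path: Path) -> str:
    """MD5 hex digest of a file's contents, read in 1 MiB chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest()

def hashed_name(path: Path) -> str:
    """New filename: the content hash, keeping the original extension."""
    return content_hash(path) + path.suffix.lower()
```

Because the name depends only on the bytes, two copies of the same photo collide on the same name no matter what they were called before, which is exactly how the duplicates fall out.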

Under the order folder, I’ve got a three-level structure: first the name of the camera the photo or video was taken on, then the year, then the month. I’m only trusting dates in the EXIF header data, not the file’s mtime or ctime. The name of the camera alone tells me a lot about the time in my life that a photo or video comes from. It’s been fun to look up some old models and reminisce about old gadgets and the years they were from.
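Reading the EXIF header needs a library like Pillow, but once you’ve extracted the DateTimeOriginal string (EXIF stores dates as “YYYY:MM:DD HH:MM:SS”), building the path is trivial. A sketch, with a camera model I’ve made up for the example:

```python
from datetime import datetime
from pathlib import Path

EXIF_DATE = "%Y:%m:%d %H:%M:%S"  # DateTimeOriginal format per the EXIF spec

def order_path(camera: str, date_taken: str) -> Path:
    """order/<camera>/<year>/<month> from an EXIF DateTimeOriginal string."""
    taken = datetime.strptime(date_taken, EXIF_DATE)
    return Path("order") / camera / str(taken.year) / f"{taken.month:02d}"

# order_path("Canon EOS 550D", "2013:07:21 14:02:11")
# -> order/Canon EOS 550D/2013/07
```

If strptime raises because the date field is missing or mangled, that’s a natural trigger for the file to end up in error instead.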

Unfortunately, a lot of the files don’t have information about the camera. In that case I’m using the resolution of the photo or video to calculate the number of pixels per frame (because I don’t care about orientation). I’m zero-padding those numbers to 10 digits so that I can sort the folders alphabetically by name to get an idea of sizes. This gives me a folder name like “Unknown 0005703943” (for example). That at least lumps files together by resolution, which is better than nothing.
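That naming scheme is essentially a one-liner. A sketch (the helper name is mine):

```python
def unknown_folder(width: int, height: int) -> str:
    """Folder name for files with no camera metadata: total pixel count,
    zero-padded to 10 digits so the names sort alphabetically by size."""
    return f"Unknown {width * height:010d}"

# unknown_folder(2592, 1944) -> "Unknown 0005038848"
```

Multiplying width by height is also what makes orientation irrelevant: a 2592×1944 landscape shot and its 1944×2592 portrait sibling land in the same folder.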

There’s still a lot of junk in my collection, but now that I have a way to broadly group files, I should be able to sweep through this more swiftly. My next phase of work will be to delete the big collections of junk that I haven’t been able to easily isolate until now.

I’m optimistic that I might finally be that person with a carefully curated library of family memories. The digital equivalent of this…

It’s very exciting that I can now take care of jobs like this, which just haven’t been worth the effort until now.

I’ve tagged this post with llm, hoping that others might tag their similar stories the same way. Does anybody else have a story of using an LLM to tackle a problem that wasn’t worth the effort previously?

Interestingly, this article popped into my inbox today. The author presumably used Claude Opus to sort his files not just by date, size and metadata but also by content.

It might be more difficult (read: currently impossible) to categorise images by content, particularly if you are iterating over a large collection, but I expect it will happen with more RAM, more tokens and more money.

Speaking of image recognition, I was impressed that Perplexity could OCR the text in a screenshot of a boot error message. The text itself was not “swipeable”, but she didn’t break a sweat analysing it and suggesting the next step. Admittedly, OCR has been around for decades.

Interesting discussion. On the LLM coding front, I’ve had experience of what I’d call “approximate success”: I seem to get to a point where something is 80–90% there and then needs a significant amount of rework to get over the line and do exactly what I want it to do, exactly the way I want it to happen.

I absolutely love what you’ve done with the photos, @jdownie - very clever! I’m at a fork in the road at the moment, and I’m not sure which way I want to proceed from here. In December, I did a Google Takeout of my Google Photos, which was the last of the Google services I was using, and killed the account (and the $$$). Worth it so far to achieve the ongoing goal of de-big-tech-ing, but now I’ve got a massive offline folder of photos and metadata that I’m not really sure what to do with. These photos only represent a period of around 5 years, too… so I’ve got a much bigger problem once I factor in the other services and offline drives where things are scattered about.

On the other hand, @zeeclor’s linked article gets to the core of my fear about unleashing an LLM/AI on my files. Even aside from training and privacy concerns, the idea that I’m handing over control to something capable of executing rm is a bit worrying to me.

I do miss a few things from Google Photos which were obviously being done via some sort of image recognition, such as searching for OCRable text, or even putting in things like “car” and getting pictures of cars. Coming back to the LLM/AI talk - is anyone using Immich or similar to use those functions locally? How are you handling things such as backup, etc.? @jdownie, is that the next piece of the puzzle once the MD5 sort is done?

I understand the pain of organising photos. It took years to organise all my legacy photos into albums, and I gave up doing it in 2024 because of the effort. I had all my photos backed up to Google Photos, and its image recognition covered everything I wanted. In the spirit of self-hosting, I knew this had to change, so I moved to something called Immich about six months ago, and it’s been great. It’s pretty similar to Google Photos, which covers the features I’m after. You can point it at a graphics card for AI processing, though I’ve yet to set that up.
I’ve been loving Immich and what it stands for; it’s now one of the very few open-source projects I’ve supported financially, by becoming an “Immich supporter”.
