Wednesday, July 09, 2008

Eating My Own Dog Food: File Cleanup

I'm in the process of cleaning up my files, as I mentioned last week. It's going a little better than I expected.

To filter duplicate files I'm using a free tool, Easy Duplicate Finder, which is available at

It does a good job of matching possible duplicates using file size, file type, and file name checking. It doesn't apply any logic to removing them, which is a good thing - I'm forced to check each possible dupe as I go and delete (or not!).
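To make the idea of "possible duplicates" concrete, here is a minimal Python sketch of that kind of matching. It is not how Easy Duplicate Finder works internally (I don't know that); it simply assumes that files sharing a size and an extension are worth a second look. The function name `find_candidate_duplicates` is made up for this example. Like the tool, it deletes nothing and only reports groups for manual review.

```python
import os
from collections import defaultdict


def find_candidate_duplicates(root):
    """Group files under `root` by (size in bytes, lowercased extension).

    Assumption for this sketch: two files with the same size and type are
    only *possible* duplicates. Nothing is deleted; the caller reviews
    each group by hand.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that cannot be read
            ext = os.path.splitext(name)[1].lower()
            groups[(size, ext)].append(path)
    # Keep only the groups that contain more than one file.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Each key in the result is a (size, extension) pair and each value is the list of paths to check before deleting anything.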

The tricky bit is figuring out which folder structure I should be using in the Hub folder. Going forward I will be hosting most of this information in the cloud, so I can apply labels and metadata at that point and surface these files in all sorts of funky ways, but right now I still need a place for my files to live.

I'm committed to the whole Single Source of Truth practice and so I'm forcing myself to use Windows Desktop Search, Vista folder bookmarks, and a central desktop folder root.

It's been a real question of discipline - I often find myself reverting to bad practices and trying to copy the same file into multiple places if I can't quite decide where it should go. Over the last couple of days I've started getting a little better at placing any particular file in a single place and adding shortcuts and bookmarks to it.

Cleaning Up Music Files

  • I did a desktop search for *.mp in the root of the old c:\music folder
  • I copied all the files which were the results of the search into a new, single folder called "cleanup"
  • Then I ran a duplicate file finder scan on that folder so I could clean it all up
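The search-and-copy steps above could also be scripted. The sketch below is one hedged way to do it in Python, not what I actually ran: `gather_files` is a hypothetical helper, and the pattern is whatever you would have typed into the search box. Files that share a name get a numeric suffix so that nothing is overwritten before the duplicate scan.

```python
import shutil
from pathlib import Path


def gather_files(source_root, dest_dir, pattern):
    """Copy every file under `source_root` matching `pattern` into `dest_dir`.

    Same-named files get a numeric suffix (song_1.mp3, song_2.mp3, ...)
    so that no copy overwrites another. Returns the destination paths.
    """
    source_root = Path(source_root).resolve()
    dest_dir = Path(dest_dir).resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materialises the matches first, so files copied into a
    # destination inside source_root are not picked up again mid-walk.
    for src in sorted(source_root.rglob(pattern)):
        if not src.is_file() or dest_dir in src.parents:
            continue  # skip folders and anything already in dest_dir
        target = dest_dir / src.name
        counter = 1
        while target.exists():
            target = dest_dir / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        shutil.copy2(src, target)  # copy2 also keeps timestamps
        copied.append(target)
    return copied
```

A call might look like `gather_files(r"c:\music", r"c:\music\cleanup", "*.mp3")`, where the location of the "cleanup" folder and the `"*.mp3"` pattern are assumptions made for the example.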

Duplicate File Search Results 

  • I deleted the duplicate files...and went from about 3,920 files and 12 gigs in the music folder to about 2,152 files and 8.9 gigs of music.

Then I foolishly decided to let iTunes manage the folder.

For some reason, no matter what I do, iTunes duplicates the files 2 to 4 times. As near as I can tell, it is doing this because it thinks the song could belong to multiple albums for that artist. Whatever the case, I am left with iTunes havoc and I am still puzzling out how to remove duplication in this folder without deleting original songs.

Thanks iTunes! Maybe it's not worth the bother.

Cleaning Up Photos and Videos

  • I did a duplicate file clean on the photos and videos folder
  • This is somewhat harder than managing music, as most cameras give every file the same broad tag, so you end up copying 100 pictures labeled [Tag] 001, [Tag] 002, etc.
  • It required lots of manual passes, and I made sure not to delete things I wasn't sure about
  • I did some targeted searches using the Windows Desktop Search tool - looking for keywords and trying to merge things into broader folders
  • In the end, I've gone from 2100 files (including 1102 duplicates) and 36 gigs of content to 998 files and 28.7 gigs of space.
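Because camera file names repeat so often, a name match on its own says little about whether two photos are really the same. One way to take some of the guesswork out of the manual passes is to compare file contents instead. The Python sketch below is an illustration rather than what I used: it hashes every file with SHA-256 and reports only byte-for-byte identical groups. The names `file_digest` and `find_identical_files` are made up for the example, and an edited or resized copy of a photo will not be caught this way.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(root):
    """Return lists of paths under `root` whose contents are identical.

    File names are ignored on purpose: two different photos that share a
    name stay apart, and the same photo saved under two names is grouped.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Anything this reports is safe to treat as a true duplicate; anything it misses still needs a human eye.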

Metadata is much more important for photos than for music files, because I can use them in many more ways (image editing, personal scrapbooks, social networking sites, websites).

Because an image can be used in many different places, it isn't easy to choose a single folder for a photo to live in. I've started experimenting with bulk insert into Picasa or similar so I can tag 'em. Also pondering adding to the Information Management plan since I am in the whole Cloud frame of mind.

How do you manage photos on your desktop, especially with metadata? Are there photo management tools you couldn't live without?


  1. I did some considerable research to find the best approach for tagging photos and videos. My first concern was storing tags in a proprietary format that might leave me unable to see my tags on a photo in the future; my second was ensuring that whatever program I used followed current industry standards for tagging. With that in mind, I found out that the XMP format from Adobe is becoming the de facto standard for storing metadata about photos, and Microsoft has also adopted this format. The problem with most programs out there is that they maintain keywords and tags in a database separate from the actual files (Picasa does this). That is bad because if I copy all my photos to another computer, or have them backed up somewhere else, without the database I have no tags. The XMP format embeds the keywords and tags in the files themselves, so they are fully portable, never lost, and program independent. I looked at some of the Adobe photo cataloguing applications but finally chose Microsoft Expression Media 2 because of its great support for tagging photos and embedding the tags in the files, and for its simple, extensible scripting mechanism that lets you rename and tag your photos in bulk based on the date the photo was taken, the file name, or any other piece of metadata you can think of. Very powerful, and I know my photos are storing all my keywords and tags in XMP format within the photos themselves, and it is a standard that will guarantee I can read the tags on computers in 10 years or so. I think it is the best option at the moment.

  2. Wow, that's awesome - thoroughly researched as always Marshy! It's definitely a massive maintenance nightmare if the metadata and data are kept in two different places - but I never thought about this. Thanks for the great tip!
    Is Expression Media working well for you so far? I notice with their archive feature I could back up my stuff to my mounted JungleDisk drive.

  3. I have not used the archive feature in Expression Media. I have noticed the response of the application can lag for a few seconds every minute or two but I have not determined if it is my machine or if it is the 11GB/6000 photos. The actual media catalogue file is 80MB. Let me know how it performs when you point it at your collection if you run with it for your media collection. Thanks.

  4. It seems to be running fine - my catalogue file says it is 210 MB and the media is 68 MB. So it is slightly smaller, but doesn't seem to lag at all. I believe we are both running with 3G of RAM although I have yet to upgrade from Vista to XP!

  5. For finding duplicate files try Directory Report

    It's not free but has a free trial period
    It has lots of features to find out where all your disk space is going

