Photography And Silent Data Corruption

Have you noticed silent data corruption (so-called bit rot) on your hard drives? I have, on all makes and all sizes. NTFS, ext4 and HFS+ can neither stop your data from corrupting nor notify you when it happens. The last time I saw this was this week, when quite recent photographs were suddenly corrupted on an HFS+ volume while all the disk reports said everything was fine.

The only thing so far that notices this and refuses to work with disks that corrupt themselves is ZFS on FreeBSD (or Solaris). Btrfs is said to include the same functionality, but it’s not stable yet (though not very far from it). Let’s hope it gets there soon. Having a couple of kernel versions where it hasn’t corrupted anything so far is not what I’d call stable. ZFS won’t be coming to Linux because of the GPL incompatibility (yes, I know there is a userland wrapper for it, but no). At the moment, unless I use ZFS and/or CrashPlan, I’m powerless against the silent corruption that eats my files.

Somebody calculated the probability for a 2TB drive (back when 2TB was considered a lot), and even then corruption wasn’t merely possible, it was expected to happen. Now I have tens of terabytes of data, almost 20TB of which is local backups, and this happens all the time. Mirroring your data as a backup strategy does not protect you against this, because you will be overwriting your intact files with the broken ones every time you update the backup (or the other way around if you’re a lucky bastard, which I’m not; the lucky part, that is).

Google is said to be using a distributed file system (GFS) that supposedly detects this kind of problem, which I kind of expected, because all hell would break loose on a petabyte-scale volume with no error detection of any kind. Hard disk firmware does its best to mitigate the problem, but sometimes it’s the firmware itself that causes the problems. I have heard that (other) coders make mistakes sometimes…
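On file systems without checksums, the detection part (not the repair part) can be roughly approximated at file level before a mirror run overwrites good copies with bad ones. The following is only a minimal sketch, not what ZFS does internally: it records a SHA-256 digest per file and later flags files whose content changed even though their modification time did not. The function names (`file_digest`, `build_manifest`, `find_suspects`) are made up for this illustration, and the assumption that "same mtime, different content" means corruption is a heuristic, not a guarantee.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its mtime and digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[str(path.relative_to(root))] = {
                "mtime_ns": path.stat().st_mtime_ns,
                "sha256": file_digest(path),
            }
    return manifest


def find_suspects(root, manifest):
    """Return files whose content changed although their mtime did not."""
    root = Path(root)
    suspects = []
    for rel, old in manifest.items():
        path = root / rel
        if not path.is_file():
            continue  # missing files are a different problem
        if path.stat().st_mtime_ns != old["mtime_ns"]:
            continue  # changed through the OS, assumed to be intentional
        if file_digest(path) != old["sha256"]:
            suspects.append(rel)  # heuristic: likely silent corruption
    return suspects
```

The idea would be to store the manifest somewhere safe (for example as JSON), run `find_suspects` before every backup, and skip or restore anything it reports instead of mirroring it. It costs a full read of every file, and unlike a checksumming file system it cannot tell you which copy is the good one.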

Why is Windows still shipping with NTFS and nothing better? I know they have ReFS, which has this implemented, but why is it limited to server environments when the problem exists on desktops as well? In the Linux world they’re doing their best to get Btrfs stable enough to replace the existing file systems. On the Mac, nothing. You can use ZFS if you’re adventurous enough, but I would not call that a solution.

In the Wikipedia article it’s estimated that the threshold for corruption is one in every 10^16 bits. In a study performed by CERN over six months and involving about 97 petabytes of data, they found that about 128 megabytes of data became permanently corrupted. That’s a lot when it’s spread bit by bit all over the disks. Consider that flipping a single bit in a photograph can result in the kind of garbage shown at the top of this article.
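To put those figures next to the drive sizes mentioned earlier, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), that every bit is read exactly once, and that the CERN ratio (128 MB out of 97 PB) can simply be scaled down to a home setup, which is a big assumption since that study covered six months of a very different workload. The function names are invented for this example.

```python
import math

TB = 10**12  # decimal terabyte, as drive vendors count it
PB = 10**15
MB = 10**6


def expected_bit_errors(capacity_bytes, bits_per_error=10**16):
    """Expected number of bit errors when every bit is read once."""
    return capacity_bytes * 8 / bits_per_error


def chance_of_any_error(expected):
    """Poisson approximation: probability of at least one error."""
    return -math.expm1(-expected)


def corrupted_bytes_at_cern_rate(capacity_bytes):
    """Scale the quoted CERN result (128 MB in 97 PB) to another capacity."""
    return capacity_bytes * (128 * MB) / (97 * PB)


for size_tb in (2, 20):
    size = size_tb * TB
    e = expected_bit_errors(size)
    print(f"{size_tb} TB: {e:.4f} expected errors per full read, "
          f"{chance_of_any_error(e):.2%} chance of at least one, "
          f"{corrupted_bytes_at_cern_rate(size):,.0f} bytes at the CERN ratio")
```

The two estimates differ by orders of magnitude: the one-in-10^16 figure gives well under one expected error per full read of a 2TB drive, while the CERN ratio scaled to the same drive works out to a couple of kilobytes of damaged data. Which one is closer to reality for a given setup is not something this arithmetic can settle.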
