Amazon Glacier – an archival solution for your digital memories

One of the biggest challenges of digital photography is the long term archival of the images. And because slides and negatives are generally scanned, and end up in the same post-processing chain as “native” digital images, they’re subject to more or less the same issues (I guess that you could still go back to the original negative or the slide and re-scan it, but you would have to locate it first).

This early digital picture was taken in 2002 with a Samsung digicam and stored in iPhoto. But it was imported in Lightroom at a later date and is still accessible.

There are three big obstacles to the long term preservation of pictures in a digital world:

  • the long term availability of the digital asset management software,
  • the evolving file format standards
  • the inherent fragility of the medium used for storage

Users of the original version of Apple iPhoto, of Apple Aperture, of Microsoft Expression Media and of plenty of other discontinued products have not lost the original images stored in their photo libraries, but they have lost an easy way to access them – and in some cases, all the changes and adjustments (crops, exposure, contrast and curves) they had performed. Of course, it’s always possible to port the images to the standard of the moment, Adobe Lightroom, but it may require a serious effort.

Adobe Lightroom is not about to disappear (on the contrary, it has become a de facto monopoly), but Adobe may progressively price it out of the reach of amateurs: they have already transitioned to a subscription-only licensing model, which may make sense for professionals, but is costly for amateurs who used to perform an upgrade every 5 years or so…

Surprisingly, evolving standards have not been too much of an issue so far – after early challenges by patent trolls were defeated, JPEG has led a quiet life. Evolutions of JPEG are being discussed in the international standardization bodies, but they promise to maintain backwards compatibility. At this stage, JPEG is still JPEG, TIFF is still TIFF, and we can still read files saved 15 years ago.

The proliferation of RAW file formats (how many for Nikon or Canon already?) is also a potential issue, but operating systems and RAW converters still keep up and support most of the old RAW formats. Even so, it’s probably wise to keep a JPEG or a DNG version of your images, just in case.

A NAS in working order (here, a Netgear ReadyNAS with four 1 TB drives, all up, and RAID 6 configured).

Which brings us to the worst issue by far – the medium (tape, CD, DVD, hard drive, cloud blob) used for storage.

  • the storage needs have exploded (24 Mpixel is the new normal, and I know amateurs who refuse to shoot with anything less than a 40 Mpixel camera, like the Pros) – shooting 10 Gigabytes worth of images per day is nothing exceptional anymore,
  • at the same time, the capacity of WORM devices (CD, DVD, …) has stagnated,
  • solid state media is still expensive,
  • spinning hard drives have capacity but are fragile,
  • in spite of all the promises, consumer grade Network Attached Storage (NAS) is far from 100% reliable,
  • online backup/archival services and cloud hosting services come and go (many vendors have decided to leave the consumer market, while some services are tied to a specific brand of computer or smartphone hardware), and some free photo sharing services may sell your secrets to advertisers (“if you’re not paying for the product, you’re the product”).
Images can also get corrupted – without a good backup, the image would be lost forever. (Lightroom does not store the images in its catalog, just the metadata and the “development” instructions, so the issue is with the NAS or with the file sharing protocol.)
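
One way to notice this kind of silent corruption while a good backup still exists is to record a checksum for every file when it is known to be good, and to compare again later. The sketch below is only an illustration using Python's standard library; the function names are hypothetical, and it is not what Lightroom, the NAS or the backup software do internally.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root: Path, manifest: dict[str, str]) -> list[str]:
    """List the files that are missing or whose content has changed."""
    damaged = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            damaged.append(rel)
    return damaged
```

The manifest would be built once after importing the pictures (and stored somewhere other than the NAS); running `find_damaged` from time to time then tells you which files to restore from a backup. Note that a file edited on purpose also shows up as changed, so this fits originals that are never modified.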

For long term storage at home, hard drives are currently the best option, but at least in my case, they’ve been quite unreliable: over the last 10 years,

  • I lost two hard drives on my personal laptop, before I upgraded to an SSD – which has less capacity but seems to fare much better when it comes to reliability,
  • I lost a hard drive on the Apple Time Capsule I was using for backups (Green Seagate Barracuda),
  • I lost a LaCie network attached hard drive (a Barracuda also, I’m afraid),
  • Files got corrupted (see above),
  • the Netgear ReadyNAS RN104 (with four 1 TB drives arranged in a so-called X-RAID) lost its file allocation tables (even though the Western Digital Red disks were still OK) and had to be reinstalled from scratch – without using X-RAID this time, but under a proper RAID 6 scheme instead.
The dreaded Netgear error message – search “ReadyNAS – Remove inactive volumes to use the disk. Disk #1,2” on Google to see other examples (source: Netgear Communities Forum)

Fortunately, I’ve always had relatively good backups (not 100% success at recovery – there’s always something that falls through the cracks, but close enough).

Here is how my pictures are processed and protected, currently:

  • if I’m using a modern digital camera:
    • while traveling – I upload the files to my iPhone over Wi-Fi at least once a day – then Apple syncs them to my Photo library in iCloud. It’s not a full backup – my Fujifilm X-T1 camera only uploads JPEG files via Wi-Fi, not the RAW files, and at a resolution limited to 1776×1184 (a bit above 2 Mpixels) – but it’s convenient, good enough for social network updates, and better than nothing if the SD card fails or the camera is stolen,
    • the “exposed” SD or CF cards are copied as soon as possible to the SSD of a laptop;
    • and I store the SDs for up to 6 months before reformatting and reusing them.
  • if I’m shooting with film:
    • I don’t have any form of backup until the film has been sent to the lab, processed and scanned (it’s the rule of the game with film – but it always makes me uneasy when I drop an envelope with a few irreplaceable rolls of film in a USPS mailbox, even if they have a 100% reliability record with me so far).
    • when the scans are available, I download them to the SSD of the laptop, and when I receive the negatives from the lab, I keep them in the proverbial shoebox.
  • once the JPEGs, the RAWs and the scans are on the laptop,
    • there is an automatic backup process to an external hard drive (using Apple Time Machine), to the NAS (Time Machine again), and to Amazon Glacier (using the Arq backup application)
    • I upload the pictures to the Netgear NAS for Lightroom processing and archival,
    • and the Netgear NAS is backed up to Amazon Glacier using the Arq client on the Mac.
Arq backup – the restore request is being processed by Glacier. In the background, Lightroom with the folders already restored on the ReadyNAS.

Amazon Glacier

  • Amazon Glacier is the long term archival service of AWS (the Amazon Cloud). Storage is extremely cheap ($0.004 per GB per month) and Amazon keeps multiple encrypted copies of the data in multiple AWS data centers.
  • There are all sorts of interesting features for Enterprise clients. But it’s not the exclusive domain of IT departments and the man in the street can also store files on Amazon Glacier.
  • Now there’s a catch: data retrieval is not instantaneous (Amazon needs 3 to 5 hours to start processing the request in the standard retrieval mode) and it’s not free either ($0.01 per GB in the standard mode) – which is perfectly fine if you remember that Glacier is about long term storage. Consider the typical use cases for an amateur photographer:
    • you lost the pictures of that fantastic trip you made 10 years ago – it’s not going to be an issue for you if Glacier starts retrieving the pictures 5 hours from now,
    • you lost a hard drive and its local backup with 1 TB of pictures (to a flood, a fire, a burglary, a massive power surge) – again, you’re not going to complain if the data retrieval actually starts a few hours after you requested it: you’ll be happy to retrieve your files, even if it takes time (assuming 1 TB, that would be 44 hours on a 50 Mbit/s broadband circuit continuously operating at that speed, which means much more time in reality) and you will have to pay a few dozen dollars for the service.
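
As a rough check of the figures above, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 TB = 1,000 GB), a link that really sustains its rated speed, and the standard-mode prices quoted in this article; the function names are made up for the illustration, and any other charge on the AWS bill (data transfer, for instance) is not modeled.

```python
def transfer_hours(size_gb: float, link_mbit_s: float) -> float:
    """Hours needed to move size_gb over a link running flat out."""
    bits = size_gb * 1e9 * 8               # decimal gigabytes to bits
    seconds = bits / (link_mbit_s * 1e6)   # megabits per second to bits per second
    return seconds / 3600


def monthly_storage_cost(size_gb: float, price_per_gb_month: float = 0.004) -> float:
    """Storage cost in dollars per month, at the price quoted in the article."""
    return size_gb * price_per_gb_month


def retrieval_fee(size_gb: float, price_per_gb: float = 0.01) -> float:
    """Standard-mode retrieval fee in dollars (retrieval fee only)."""
    return size_gb * price_per_gb


if __name__ == "__main__":
    size_gb = 1000  # 1 TB of pictures
    print(f"Download time at 50 Mbit/s: {transfer_hours(size_gb, 50):.1f} hours")
    print(f"Storage: ${monthly_storage_cost(size_gb):.2f} per month")
    print(f"Retrieval fee: ${retrieval_fee(size_gb):.2f}")
```

Under these assumptions, 1 TB works out to a little over 44 hours of download at 50 Mbit/s, about $4 per month of storage and about $10 in retrieval fees.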

Arq

  • Arq is a backup solution for Mac OS and for Windows, leveraging the storage and archival services provided by a large selection of public cloud services. I’ve been using it in conjunction with Glacier for a few years, and it’s proved its worth a few times already.

It may seem like overkill – but massive hardware failures, catastrophic events and user errors happen, sooner or later. If you don’t want to lose your pictures eventually, do something, now.


Definitions, Buzzwords and Acronyms:

Archive: collection of records kept for long term retention. Typically, archives are not actively used.

Backup: “process of making extra copies of data, that will be used to restore the original in case it is lost or corrupted”

AWS: Amazon Web Services – the on-demand cloud computing platform of Amazon.com

Cloud (cloud computing): Cloud computing is shared pools of configurable computer system resources and higher-level services that can be rapidly provisioned with minimal management effort, often over the Internet. Cloud computing relies on sharing of resources to achieve coherence and economies of scale, similar to a public utility. (Wikipedia)

HDD: hard disk drive – they’re called hard disk drives because they are made of a few hard, metallic disks spinning at high speed, with tiny mechanical arms moving a magnetic head a fraction of a micron above the disks. The technology has been here forever; hard drives are cheap and offer a large capacity, but are somewhat unreliable over the long run. (see Backup, above)

NAS (NAS Drive): Network Attached Storage – appliance containing one or more hard drives, connected to a LAN, that provides file level data storage to PC or Mac clients. Practically, a NAS is a small file server, generally running a version of Linux, with an easy to use Web based configuration interface. For the user of a PC or a Mac, the NAS just presents itself as another storage volume in Windows Explorer or in the Finder. Models supporting two or more disk drives generally offer redundancy mechanisms (mirroring, RAID) to minimize the consequences of a hard drive failure.

RAID (Redundant Array of Independent Disks): a technology that provides data redundancy and performance improvements in storage systems using multiple physical disk drives. Having a NAS configured with RAID is not a panacea and does not exempt you from running regular backups: RAID usually protects the data if one disk fails, but it does not protect against a massive failure (two or more disks fail, a disk controller corrupts the data) or against human error (files erased by mistake).

SSD: solid state drive. With an SSD, information is stored on microchips and there are no moving parts. SSDs are both faster and more expensive than hard drives, which is why they are used in laptops, but not in long term storage systems.


An image restored from a backup – Atlanta – Nikon FM – Nikon 24mm AF