This is the final part of my blog series on digital images, which looks at compression and quality when storing image files (JPEGs) on the internet. In the previous parts of this series I focused on the JPEG file format and how to obtain high-quality JPEGs. I also talked a little about image resolution (not to be confused with image quality) in an earlier post.
In this article I'd like to talk about the different end uses for digital images, and point you towards some of the tools that help with long-term archival, as well as ways to retain very high-quality original data alongside JPEGs.
Storage trade-offs
Publishing to the internet is, for now, synonymous with short-term publishing and often relies on small file sizes. Despite the advances in large hard drives, placing content online still involves bottlenecks and costs that we try to minimise. The most significant constraints online are still bandwidth and storage cost.
Digital archival for the long term (let's say 50 years or more) has a totally different set of parameters from internet storage.
Firstly, we’re looking for something durable (and if we need a lot, cheap), but not necessarily fast and not necessarily shared with anyone else in real time. This change in approach buys us many opportunities to use different kinds of storage and file formats without worrying too much about a rapidly changing web audience (yet).
The good news is that bandwidth is no longer a factor: you can store content locally as fast as you can create it. The other good news is that hard drive capacity has grown exponentially over time. One of my favourite graphs, from Wikipedia, shows the tremendous growth in storage capacity, and the cost per gigabyte of consumer-grade storage is at an all-time low.
Secondly, we're trying to retain as much information and quality as we can: we want to store reference materials that we can work from, if necessary, to produce new content in the future. Each original is an asset, with a nominal value and a future life that may involve many different uses and channels.
To archive images, then, is not just a capacity challenge. It’s also an information management problem (a digital asset management problem) because we need to archive the original file and retain sufficient metadata to make sense of the content and retrieve it later.
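As a minimal sketch of "archive the original and keep enough metadata to make sense of it", the example below writes a JSON sidecar file next to a master image. The field names, the `.json` sidecar convention and the `write_sidecar` helper are hypothetical illustrations, not a standard (in practice XMP is the more common carrier). The checksum field is an added assumption: it gives a later retrieval a way to confirm it has found the exact file that was archived.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(master: Path, title: str, keywords: list[str]) -> Path:
    """Write a hypothetical JSON metadata record alongside an archived master file."""
    record = {
        "file": master.name,
        # Checksum of the master's bytes, so the file can be identified later.
        "sha256": hashlib.sha256(master.read_bytes()).hexdigest(),
        "title": title,
        "keywords": keywords,
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = master.with_name(master.name + ".json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


# Usage with a stand-in file; a real archive would point at the RAW or TIFF master.
with tempfile.TemporaryDirectory() as folder:
    master = Path(folder) / "IMG_0001.tif"
    master.write_bytes(b"placeholder image data")
    sidecar = write_sidecar(master, "Harbour at dusk", ["harbour", "evening"])
    loaded = json.loads(sidecar.read_text(encoding="utf-8"))
    print(loaded["file"], loaded["keywords"])
```

Keeping the record in a plain, human-readable format is a deliberate choice here: it stays legible even if the software that created it disappears.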
Is JPEG the right choice for archival?
JPEG, even at a high quality setting, has the following drawbacks for archivists:
- Inherently lossy compression
- Irreversible quality degradation on repeated open-save cycles
- Can’t store layers or vector content (only “flat” raster content)
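To see why the loss is "inherent", consider quantization, the step where lossy codecs discard information. The sketch below is a simplified illustration rather than JPEG itself: it assumes plain integer sample values and a single hypothetical step size, whereas real JPEG quantizes the DCT coefficients of 8×8 blocks using a table of per-frequency step sizes.

```python
def quantize(values, step):
    """Map each value to the index of its quantization bin (the lossy step)."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from the bin indices."""
    return [i * step for i in indices]


original = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical sample values
STEP = 10  # hypothetical step size: larger steps give smaller files and more loss

restored = dequantize(quantize(original, STEP), STEP)
errors = [o - r for o, r in zip(original, restored)]
print(restored)
print(errors)
```

Different inputs (61 and 64, for example) collapse to the same output, so nothing in the compressed data records which one was there originally.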
On the other hand, it is universally supported and is good at carrying colour profiles and metadata such as EXIF.
Are there other contenders suitable for archival?
RAW Files
RAW files are often considered the gold standard of digital image archival, as they contain all of the image data captured by the camera before any adjustments. They also have technical advantages, retaining extra exposure latitude and additional equipment data that make them very attractive mines of information.
However, RAW files are inherently incomplete works, usually having no corrections or finishing work applied, so I would argue that they are better accompanied by a TIFF file or JPEG file that shows the finished work as its creator intended. This is how products such as Adobe Lightroom and Apple Aperture work.
TIFF and JPEG 2000
Other contenders for archival are TIFF and JPEG 2000.
Of these, I would very much like JPEG 2000 to be taken seriously. It offers better quality than JPEG at the same or smaller file sizes, supports lossless as well as lossy compression, and shares JPEG's advantage of being an open standard. As a counterpart to a RAW file, for example, JPEG 2000 is probably the most efficient way to create a high-quality archive containing both a finished work and a RAW master.
Equally, TIFF files are widely adopted and support important requirements such as transporting XMP metadata, colour profiles and lossless compression (using either ZIP or LZW), which reduces file size without discarding any image data.
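LZW is one of the lossless schemes TIFF supports. The minimal sketch below implements textbook LZW over bytes to show the property that matters for archival: decoding gives back exactly the original data. It is an illustration only; TIFF's own LZW variant adds details (variable code widths, Clear and EndOfInformation codes, bit packing) that are omitted here, and the sample pixel row is hypothetical.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit dictionary codes for the longest known byte sequences."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new sequence
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly and expand each code."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry currently being built.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


# A hypothetical row of pixel values with long runs, typical of flat image areas.
row = bytes([200] * 40 + [17] * 24 + [200] * 40)
codes = lzw_encode(row)
print(len(row), "bytes ->", len(codes), "codes")
```

The round trip is exact, unlike the quantization step in lossy JPEG; the saving comes only from repeated patterns, which is why flat or smooth images compress better than noisy ones.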
The Digital Dark Ages
A concern for digital archivists is that content we create now may be unreadable in the future because of the myriad proprietary storage media and file formats in use. This could lead to what is known as a digital dark age.
This supports my view that camera RAW formats are unsuitable in isolation, because so many depend on proprietary decoders. Progress towards support for ISO 12234-2 (TIFF/EP) is slow, so the extra step of converting RAW files into TIFF or JPEG (either JPEG or JPEG 2000) is essential.
Digital Asset Management
Even looking purely at internet image hosting, or purely at long-term archival, it is easy to identify competing needs: affordable storage, practical upload times, high-quality archival, longevity and so on.
What happens when these needs collide? The situation quickly spirals out of control: the most difficult aspects of long-term archival (for example, deep silos of RAW files in proprietary formats) mix with ad hoc short-term internet sharing (e.g. sending files by email or through uncontrolled sharing tools).
Digital asset management, as its name suggests, starts by considering content to be an asset – something with value and worth that can be measured – and seeks to bring order and productivity back to the situation. Digital asset management covers images, but also other kinds of media like video. It tackles problems such as:
- Centralisation of content around a searchable database and storage platform;
- Shared and simultaneous access to assets by creators and users of assets;
- Roles, responsibilities and audiences with differing needs for the same assets;
- Workflow (processes), especially around upload and download stages – and the general mobility of assets;
- Metadata, normally treated as the highest-priority content, with the possibility of controlled vocabularies and policy enforcement;
- Delivery of assets in different technical formats or in prescribed sizes, without losing master files;
- Delivery of assets across both internal and external networks (to PCs, to Macs, to iPads or any other device);
- Enterprise integration of systems (storage, authentication and backups);
- Record-keeping and audit trails.
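Delivering assets in prescribed sizes without losing master files comes down to generating derivatives on demand while the master stays untouched. The sketch below covers only the sizing arithmetic; the rendition names, bounding boxes and the `fit_within` helper are hypothetical, and the actual resampling would be done by an imaging library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    max_width: int
    max_height: int


# Hypothetical delivery presets; a real system would make these configurable.
PRESETS = [
    Rendition("thumbnail", 200, 200),
    Rendition("web", 1200, 1200),
    Rendition("print-proof", 3000, 3000),
]


def fit_within(width: int, height: int, box: Rendition) -> tuple[int, int]:
    """Scale (width, height) to fit the box, preserving aspect ratio and never upscaling."""
    scale = min(box.max_width / width, box.max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# The master's dimensions are only read, never modified.
master_size = (6000, 4000)
for preset in PRESETS:
    print(preset.name, fit_within(*master_size, preset))
```

Capping the scale at 1.0 means a small original is delivered at its native size rather than enlarged, so no rendition ever claims more detail than the master holds.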
These are areas I will explore in my blog, so I hope you will stay tuned. I’ll be going into more specifics on how to manage your digital content in an environment that is all about addressing competing needs conveniently. I’ll show you some innovations in Third Light Intelligent Media Server v6, our plugins for Apple Aperture and Adobe Lightroom, and other products, too.