Six impossible things before breakfast.

A library science student's perspective on life, the universe, and everything.

Thursday, November 3, 2011

Digital Loss

An interesting article came up in one of my classes today. The Economist featured a story about the enormous loss of information caused by the "daily death of countless websites" and an overall failure to archive digital data on the web. The sheer amount of born-digital material means that recording everything is literally impossible. But some large entities, like Google, may not be making much effort to archive and preserve information to begin with.

"Ancient manuscripts are still readable. But much digital media from the past is readable only on a handful of fragile and antique machines, if at all." There is a huge problem in the digital world concerning the lack of everlasting file types; with programs continually updating and reinventing themselves (plus new programs and file types cropping up all the time), it becomes increasingly difficult to access older material. A large portion of the job of archiving digital documents consists of renewing files again and again to ensure that they can still be opened and read properly. From a digital archivist's perspective, each one of these "migration" points represents a risk to the original data--a chance for the data to be corrupted outright or subtly altered in ways we might not even understand at the time.

Adam Farquhar, the head of the British Library's digital projects, makes the extreme-sounding, but probably true, point that "the world has in some ways a better record of the beginning of the 20th century than of the beginning of the 21st." If by "better" Farquhar means more complete, and you consider all the websites, posts, comments, and stray bits of information on the web that have disappeared without a trace, his radical statement suddenly seems incredibly accurate.

On a related note, I recently looked up an old LiveJournal account and was both surprised and thrilled to see my old posts were still online. If I ever had copies of those entries, they were on a computer now long gone, and I would have been a little sad to lose them completely. I really need to download those old high school writings and store them carefully if I want to ensure they will be around to remind and amuse me in years to come. I certainly can't count on LiveJournal to stick around forever. But when I do download them, what file format should I choose? .DOCX is very popular now, but I mostly use OpenOffice, so .ODT? That might make more sense for my computer use at the moment, but would it be the best decision for long-term storage? .PDF is a highly reliable format, but will it always be? I think as we move into the future we will start to realize more and more just how essential good digital data curation is to our daily lives and activities.

See the full article on
