Blog Alert: Digital Curation Blog
Aug. 4th, 2008 01:00 pm
Does your work (or hobby) involve archiving data, or looking at information that other people have archived? If so, you may be interested in the Digital Curation Blog, run by some people I know at Edinburgh University. Their latest post has some interesting snippets from a workshop on digital archiving, including this thought-provoking comment on storage formats:
...a half percent error rate in a BMP file shows a smattering of black pixels, whereas in a GIF file there were serious artefacts and visible damage introduced. Same error rate on a WAV file produces a barely audible rustle effect, while on an MP3 file the sound is seriously distorted. Same error rate on a DOC or PDF file, and you get “File damaged, cannot open”. Be very afraid!
Having done error-correction theory in my MSc (which included some painfully weird binary mathematics, some of which I understood for just long enough to pass the exam), it occurs to me that the overhead of adding error-correction coding to highly-compressed storage formats would be far less than the storage space saved by moving to them. An MP3 file is around ten times smaller than the WAV file it is created from, so even going from 8 bits to 8+3 bits would still give a compression ratio of over seven times, with much improved resistance to bit errors. Anyone know anything about the use of ECC in highly-compressed lossy file formats?
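The overhead arithmetic above is easy to check. This is a minimal Python sketch, assuming the "around ten times" MP3:WAV figure from the post and a single-error-correcting Hamming-type code; the function names are made up for illustration. One wrinkle: the Hamming bound 2^r ≥ m + r + 1 says that correcting any single flipped bit among 8 data bits takes 4 check bits rather than 3, but even 8+4 bits still leaves roughly 6.7 times compression.

```python
def min_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (the Hamming bound):
    enough syndrome values to name any single flipped bit, or 'no error'."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def effective_ratio(compression: float, data_bits: int, check_bits: int) -> float:
    """Compression left over after every data_bits-bit chunk is expanded
    to data_bits + check_bits bits of stored codeword."""
    return compression * data_bits / (data_bits + check_bits)


if __name__ == "__main__":
    ratio = 10.0  # assumed MP3:WAV compression, per the post
    print(f"8+3 bits: {effective_ratio(ratio, 8, 3):.2f}x")
    r = min_check_bits(8)
    print(f"8+{r} bits (single-error-correcting): "
          f"{effective_ratio(ratio, 8, r):.2f}x")
```

Coding larger chunks shrinks the overhead further: by the same bound, 64 data bits need only 7 check bits.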
no subject
Date: 2008-08-04 12:51 pm (UTC)
Usually there's lots of error correction and detection going on anyway in the storage layer. The pattern on disk won't be the same as the bit pattern of the thing you're storing. The trick with adding useful error correction is to put it in a place where it's protecting against different sorts of problems from the ones the layer below deals with. So you have a hierarchy: for example, disks that do their own error correction, arranged into a RAID5 array, with an offsite backup.
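One layer of that hierarchy, the RAID5 array, fits in a few lines. This is a minimal Python sketch, assuming a single stripe of three equal-sized data blocks plus one parity block; real RAID5 rotates the parity block across all the disks and uses much larger stripes, and the function names here are hypothetical. Because the parity is the XOR of the data blocks, any one lost block (say, a whole failed disk, which the drive's own error correction can do nothing about) is the XOR of everything that survives.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of a list of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks, all the same length")
    # zip(*blocks) walks the blocks column by column, one byte from each.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks: list[bytes]) -> bytes:
    """Parity for one stripe: the XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild_lost_block(surviving: list[bytes]) -> bytes:
    """XOR of every surviving block in the stripe (data and parity)
    reproduces the single block that was lost."""
    return xor_blocks(surviving)


if __name__ == "__main__":
    stripe = [b"ABCD", b"EFGH", b"IJKL"]  # one stripe on three data disks
    parity = parity_block(stripe)          # stored on a fourth disk
    # Disk 1 fails outright: rebuild it from the other two plus parity.
    rebuilt = rebuild_lost_block([stripe[0], stripe[2], parity])
    print(rebuilt)  # b'EFGH'
```

It can only rebuild one missing block per stripe, which is one reason the hierarchy still ends with an offsite backup.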
A side effect of the hardware doing its own error correction might well be that it's hard to get a minimally damaged block out of it - you might have an entire 1k of zeroes instead of the data you stored with one bit flipped. Protecting against that would involve distributing the redundancy all through the file. You might add 3 bits for every 8 bits of data, but store all those 11 bits in different blocks. This ups the read and write time but might be more useful for archival purposes.
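The interleaving idea in this comment can be sketched in Python. This is a minimal sketch under stated assumptions: it swaps in the standard Hamming(7,4) code (3 check bits per 4 data bits) instead of the 8+3 split, because a single-error-correcting code on 8 data bits needs 4 check bits; it models each storage block as a plain list of bits rather than packed bytes; and all function names are hypothetical. Block j holds bit j of every codeword, so a block that comes back as all zeroes costs each codeword at most one bit, which the code corrects.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4
    (codeword positions 1..7, stored 0-indexed)."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(bits: list[int]) -> int:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(bits)
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # otherwise it is the 1-based position of the single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos
    if syndrome:
        c[syndrome - 1] ^= 1
    return c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)


def encode_interleaved(data: bytes) -> list[list[int]]:
    """Split bytes into nibbles, encode each, then spread every codeword
    across 7 blocks so no block holds two bits of the same codeword."""
    nibbles = []
    for byte in data:
        nibbles += [byte & 0x0F, byte >> 4]
    codewords = [hamming74_encode(n) for n in nibbles]
    return [[cw[j] for cw in codewords] for j in range(7)]


def decode_interleaved(blocks: list[list[int]]) -> bytes:
    """Gather each codeword back from the 7 blocks, correct it, rejoin nibbles."""
    n_codewords = len(blocks[0])
    nibbles = [hamming74_decode([blocks[j][i] for j in range(7)])
               for i in range(n_codewords)]
    return bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))


if __name__ == "__main__":
    original = b"archive me"
    blocks = encode_interleaved(original)
    blocks[3] = [0] * len(blocks[3])  # one whole block comes back as zeroes
    assert decode_interleaved(blocks) == original
```

The price is the one the comment mentions: reading back even a single byte means touching all seven blocks.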