Would You Believe It?
The following article was printed in Computer Shopper, June 1992 issue (page 152). Commentary follows.
The Big Squeeze
Compression Scheme Shatters Storage Logjam
Todd Daniel believes he has found a way to revolutionize data storage as we know it.
DataFiles/16, a zip-style data-compression product released by Wider Electronic Bandwidth Technologies (WEB), allows users to compress data at a 16-to-1 ratio. That’s a major advance when you compare it with Stacker 2.0’s 2-to-1 ratio.
DataFiles/16 relies purely on mathematical algorithms; it works with almost any binary file. Because of the math involved, the ratio of compression is directly proportional to the size of the uncompressed file. In order to get the full effect of 16-to-1 compression, the original file must be at least 64K.
During a demonstration at our offices, Daniel, the company’s vice president of research and development, compressed a 2.5Mb file down to about 500 bytes using four levels of DataFiles/16. Because successive levels compress at a lower ratio as the volume of the file decreases, DataFiles/16 directly zips and unzips files to the chosen level.
After compressing a file, users can compress the data another eight times with DataFiles/16. This gives DataFiles/16 the potential to compress most files to under 1,024 bytes, whether the original file is 64K or 2.6 gigabytes. By comparison, SuperStor 2.0’s new compression technique can be performed only once.
By June, WEB plans to release its first hardware packages utilizing the same method. The two new device-driver cards will operate impeccably, compressing and decompressing data on the fly at ratios of 8-to-1 and 16-to-1, respectively.
A standard defragmentation program will optimize data arrangement, while an optional disk cache will speed access time. Both cards will come in DOS, Macintosh, and Unix versions. The DOS version is scheduled for a July release, and the company says the others will follow shortly.
The implications of WEB’s data-compression technique in the communications field have yet to be calculated, but Daniel says a 16-to-1 ratio could save certain companies up to 5 percent of their storage costs. If DataFiles/16 lives up to its early promise, data compression will have taken a quantum leap forward. — Jim O’Brien
Oh Really?
So much for Computer Shopper. Why have you (most likely) never heard of DataFiles/16? Because it was a scam, of course. And since it wasn’t published in the April issue, it was presumably not a hoax by Computer Shopper itself but rather by the company behind it.
The article perhaps highlights the terrible fate of journalists: writing about things they don’t understand. A computer scientist, or really anyone with a passing familiarity with information theory, would immediately recognize the claims as impossible and preposterous, if enticing. The question about the article isn’t whether the whole thing was a scam, only how many people were in on it.
DataFiles/16, like other similar scams, was most likely an attempt to defraud investors rather than scam end users. Such compression could never work, so the question is only whether the software failed to achieve anything like the claimed compression ratios, or if it did… and could never decompress to anything resembling the original files.
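For the curious, the impossibility follows from a simple counting argument. The sketch below, in Python, is purely an illustration (the 64K and 16-to-1 figures are taken from the article; it does not pretend to model whatever WEB actually demonstrated): a lossless compressor has to be reversible, so distinct inputs need distinct outputs, and there are overwhelmingly more 64K files than there are files of 4K or less.

# Counting argument: a lossless compressor that shrinks *every* 64K file
# by 16:1 cannot exist, because decompression requires distinct inputs
# to map to distinct outputs (the pigeonhole principle).

INPUT_BYTES = 64 * 1024            # 64K input, as claimed in the article
OUTPUT_BYTES = INPUT_BYTES // 16   # 4K output at the claimed 16-to-1 ratio

# Number of distinct 64K inputs:
inputs = 256 ** INPUT_BYTES

# Number of distinct outputs of *at most* 4K (all shorter lengths combined):
outputs = sum(256 ** n for n in range(OUTPUT_BYTES + 1))

print(inputs > outputs)                           # True, by a gigantic margin
print(inputs.bit_length(), outputs.bit_length())  # roughly 524,000 vs 33,000 bits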
These days it may be more difficult to set up compression scams, but the hucksters certainly didn’t disappear; they just moved elsewhere.
65 Responses to Would You Believe It?
Alright, it could’ve been like that, medoesn’t recall *exactly*, but fact remains that mewas never ‘disappointed’ by the drives’ actual capacities.
To unwind the stack a little, PS pointed out above that formatting a disk is not the same as formatting a file system. Me’d like to add that at least in the UNIX world, ‘formatting a disk’ is now a very uncommon operation (especially w/ the demise of floppies), while the very diff ‘creating a filesystem’ has survived and is not even limited to disks (creating an fs on tape or in a regular file usually works just fine).

Because mess-dos was originally designed to work only w/ floppies, w/ no expected compatibility w/ other systems, the operation called ‘formatting’ there includes creating a file system; the latter has become the primary meaning of the term on windoze, which inherited the naming convention from mess-dos (me’s not sure how VMS did it, which is relevant for NT). Hence the term ‘low-level format’ when referring to hard drives, which aren’t usually formatted as often as floppies were, ’cause hdds became the more complex and varied sort of device (in most cases).
The command to create a filesystem on a disk in VMS is INITIALIZE. It only does a LLF on floppies. LLFing hard disks was done using special diagnostic utilities.
Same as PCs. Low-level format of hard disks often required vendor-specific utilities (anything beyond dumb ST-506 drives, anyway). DOS FORMAT always discovered and marked bad floppy sectors during formatting; I think that was possible for hard disks too: FORMAT read the entire hard disk partition, though it didn’t low-level format the drive. Utilities like Norton Disk Doctor had the ability to check for bad sectors and mark them in the FAT (both floppies and hard disks).
Oh, the pain. On ST-506 drives there were usually two or three ways to format the drive. On 8-bit ISA cards you usually fired up debug and jumped to an entry point a few bytes into the controller’s EPROM, for example g=c800:5 or similar. On 16-bit ISA cards that were 100% compatible with the card IBM supplied with their AT, you used one of the several tools for this class of cards. Otherwise you usually had to use a vendor-specific program, or in rare cases you could press some key during boot or maybe use debug.
I don’t miss this a single bit. Perhaps the only good thing about this is that you had a chance to save data from bad disks using Spinrite from Gibson Research.
Compared to earlier systems, I liked the IBM PC approach to formatting drives: it was widely documented, and lots of third parties devised easy alternate software. Having ROM capacity surge while prices dropped was very freeing for system design.

Several examples follow of how bad things got on non-IBM-PC systems.
DEC didn’t let some systems conduct a LLF on floppy disks; the disks had to be purchased preformatted. At first, this was to handle a bug with early drives but later became a clear money grab. Some people tracked down programs that ran on an IBM AT to conduct the SSQD initialization needed for the RX50.
With disk packs, Diablo shipped them unformatted, while Control Data would format and test them in house; no user methods were provided. Waiting six hours for a technician to install and format a drive was untenable. An exception was the Xerox Alto, which had tools to format packs, install file systems, and transfer files (erasing them from the first pack), but lacked any function to test for bad sectors. Oops.
The TRS-80 Model 2 had one of the more full-featured format programs, which did the low-level format, tested the disk, laid out the file system, and installed system software if needed. Requiring 10 consecutive flawless tracks for the system software rather makes it necessary to do all the steps with one program.
That sounds like Xerox alright, bad things don’t exist, they can’t happen.

Though that attitude has spread quite a bit since.
At least back then it was possible to do a low-level format from software. Nowadays, the tracks are so small and so close together that low-level formatting a hard drive requires the read-write head to be positioned at least an order of magnitude or so more accurately than the head-mover motor thingy is physically capable of without the preexisting format marker thingies on the disk to guide it; consequently, low-level formatting has to be done before the drive is sealed, by a big machine called a servowriter.
As regards the original topic of the article:
>Such compression could never work, so the question is only whether the software failed to achieve anything like the claimed compression ratios, or if it did... and could never decompress to anything resembling the original files.
There’s a third possibility; they could have demonstrated it reaching a 16:1 compression ratio and then successfully decompressing the files… using only files that could easily be compressed that much with any compression program (such as large, very sparse database files or large, single-colour bitmaps). Even legitimate compression programs will occasionally be fed something that contains humongous amounts of redundant information, and consequently be able to, on occasion, achieve compression ratios that seem impossibly high; I’ve run across 7-Zip archives with compression ratios better than 10:1 on occasion myself.
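That scenario is easy to reproduce with any stock compressor. The following Python snippet (file sizes picked arbitrarily to echo the 2.5Mb demo) shows plain zlib/DEFLATE sailing far past 16:1 on redundant data while doing essentially nothing with random bytes:

import os
import zlib

# Highly redundant input: 2.5MB of zero bytes, the sort of sparse file that
# makes any compressor look miraculous in a demo.
sparse = bytes(2_500_000)
packed = zlib.compress(sparse, 9)
print(len(sparse), len(packed))      # far beyond 16:1 (roughly 1000:1 in practice)

# Incompressible input: 2.5MB of random bytes; DEFLATE only adds overhead.
noise = os.urandom(2_500_000)
packed = zlib.compress(noise, 9)
print(len(noise), len(packed))       # ratio ~1.0:1, i.e. no compression at all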
And keep in mind that this is all lossless compression; in a situation where it’s possible to use lossy compression, far higher compression ratios are possible. MP3 files frequently achieve compression ratios of 20:1 compared to the original WAV file, and the encoding used in DVD-Video often has compression ratios approaching 100:1 (a four-hour movie at a fairly low – 1024×768 – resolution, which would take up
3 bytes [24 bits] per pixel
x1024 pixels per row
x768 rows per frame
x24 frames per second
x60 seconds per minute
x60 minutes per hour
x4 hours
=815,372,697,600 bytes [815.373 gigabytes, or 759.375 gibibytes]
uncompressed, fits on an 8.5-gigabyte DVD in compressed form).
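Checking the arithmetic above mechanically, and filling in the one step left implicit (the division by the DVD’s capacity), with a quick Python snippet, nothing more:

# Multiply out the figures given above and compare with a dual-layer DVD.
bytes_per_pixel = 3                 # 24-bit colour
width, height = 1024, 768
fps = 24
seconds = 4 * 60 * 60               # four hours

uncompressed = bytes_per_pixel * width * height * fps * seconds
print(uncompressed)                 # 815372697600 bytes
print(uncompressed / 1000**3)       # ~815.373 (decimal gigabytes)
print(uncompressed / 1024**3)       # 759.375 (gibibytes)

dvd = 8.5 * 1000**3                 # nominal 8.5GB dual-layer capacity
print(uncompressed / dvd)           # ~95.9, i.e. a ratio approaching 100:1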
…oops, didn’t see the page full of older comments until just now! :-S
>Or the ongoing hdd capacity scam.
What scam? “Giga” is, and has always been, a decimal prefix. Giga=1000^3. The hard drive makers are using it correctly here. If you want to talk about 1024^3, the correct prefix is “gibi”.
Just because the computer pioneers wrongly used decimal prefixes to refer to binary quantities doesn’t mean that we need to keep misusing them today. It doesn’t even make it a good idea to keep using decimal prefixes instead of binary.
Linux already uses the binary prefixes; hopefully, sometime soon, Microsoft and the RAM makers will see the writing on the wall and switch to using decimal prefixes solely for decimal quantities (such as drive size) and binary prefixes solely for binary quantities (such as RAM size).
Although, to be fair, even that’s not as bad as how the floppy disk makers abused the “mega” prefix (a floppy-manufacturer’s megabyte is neither a megabyte [1,000,000 bytes] nor a mebibyte [1,048,576 bytes], but rather the geometric mean of the two [1,024,000 bytes]! A prime example of why the golden-mean fallacy is a fallacy…).
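To put concrete numbers on those three competing “megabytes”, here is a trivial Python illustration, using the standard 2 sides × 80 tracks × 18 sectors × 512 bytes layout of a high-density 3.5″ floppy:

# The three different "megabytes", applied to a 1.44MB high-density floppy.
capacity = 2 * 80 * 18 * 512       # 1,474,560 bytes

MB_decimal = 1000 * 1000           # SI megabyte
MiB_binary = 1024 * 1024           # mebibyte
MB_floppy  = 1024 * 1000           # the floppy makers' hybrid "megabyte"

print(capacity / MB_decimal)       # 1.47456
print(capacity / MiB_binary)       # 1.40625
print(capacity / MB_floppy)        # 1.44 -- the number on the label

# The hybrid unit really is the geometric mean of the other two:
print((MB_decimal * MiB_binary) ** 0.5)   # 1024000.0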
“Giga is and always has been a decimal prefix” if one is willing to pretend that decades of common usage in the field of computing never happened. You can go and read, for example, Intel’s current manuals while pretending that “GB” means 1,000,000,000 bytes, but things just won’t add up. You can argue that it was always wrong (which, lacking a time machine, is utterly pointless), but you can’t argue that 1GB RAM is 1,000,000,000 bytes.
Gibi and the other prefixes indicating that the value is a power of 1024 did not exist until 1998 and probably took a few years for anyone to notice. My first programs were in the 70s; I am a bit too set in my ways to bother with the new definitions. Not that it mattered back then since paper tape was unwieldy and prone to rip long before the kilobyte capacity could be reached. Even in the promised land of disks, no one would care about decimal versus binary since the storage was only reported as blocks or records not bytes or words.
It is a good thing that none of the standards people who are concerned with accurate and exact numeric labels were around in the cassette storage era. At least four different systems called their transfer rate 1500 bps; none gave the exact same transfer rate as the others and none would give the exact same transfer rate on all data.
>You can argue that it was always wrong (which, lacking a time machine, is utterly pointless)
Pointless as regards back then, sure, but not at all pointless now; it’s (usually) easier to convince people to switch over if they can be convinced that what they were previously using was wrong or erroneous in some way (such as decimal prefixes being used to refer to binary quantities), rather than it just being someone wanting to change an old standard to a new one.
>but you can’t argue that 1GB RAM is 1,000,000,000 bytes.
Which is exactly why the RAM makers should start labelling their products in GiB, rather than GB.
I can understand the desire to remove ambiguity but to some extent it’s a solution in search of a problem. No one is confused by a 4GB memory stick, but I have a feeling that more than a few potential buyers will wonder what the heck 4GiB is, and if it’s more or less than 4GB.
@zeurkous
For editing etc, using uncompressed audio is beneficial, and on typical modern systems used for editing, storage space isn’t a concern when compared to the size of audio files.
But one of the very significant uses of digital audio files is portable audio, and the ability to carry your music collection with you wherever you go. As storage space on such devices is limited, compression provides very significant benefits there.
Yes, you could solve that by only having a part of your collection on a mobile device, and changing which part on a regular basis, but that’s not the same as having everything on the device, and why would anyone go for such a complicated workaround when it’s simply not needed at all?
As to your ‘text’ comments for storing audio and video data, take a look at what the IFF format for the Amiga did and why. Having a container format in which you can combine different kinds of data, and which has metadata to identify what kind of data is in each tagged block, may seem more complicated than needed, but it provides very significant benefits.

The idea of just using the first few bytes in a file to identify the data in that file is used a lot, but it’s fraught with problems, as it’s fairly easy to end up with some raw dump of binary data which gets misidentified as some specific format. It’s the way things are in the unix world, and it’s certainly better than using ‘filename extensions’, but it’s not actually a very good solution. Any real solution should at the very least contain an identifier denoting the container format, some metadata concerning the contained data, and of course the contained data itself. This metadata should contain enough information to read and check the validity of the contained data, which for audio includes things like bitrate, sample size and number of channels.
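To make the chunk idea concrete, here is a minimal Python sketch of walking an IFF-style structure. The chunk IDs and the metadata layout below are made up for illustration (real IFF forms such as 8SVX or AIFF define their own chunk names); what matters is the generic pattern of a 4-byte identifier, a big-endian 32-bit length, the payload, and even-byte padding, with a FORM container holding a type tag plus nested chunks:

import struct

def iter_chunks(data, offset=0, end=None):
    """Yield (chunk_id, payload) pairs from IFF-style data.

    Each chunk is a 4-byte ID, a big-endian 32-bit payload length, the
    payload itself, and a pad byte if the payload length is odd.
    """
    end = len(data) if end is None else end
    while offset + 8 <= end:
        chunk_id, length = struct.unpack_from(">4sI", data, offset)
        yield chunk_id, data[offset + 8 : offset + 8 + length]
        offset += 8 + length + (length & 1)   # skip the pad byte on odd lengths

# Build a hypothetical FORM of type "AUDI" with a made-up "PARM" metadata
# chunk (sample rate, sample size, channels) and a tiny "DATA" chunk.
parm = struct.pack(">IHH", 44100, 16, 2)
body = (b"AUDI"
        + b"PARM" + struct.pack(">I", len(parm)) + parm
        + b"DATA" + struct.pack(">I", 4) + b"\x00\x01\x00\x02")
form = b"FORM" + struct.pack(">I", len(body)) + body

for cid, payload in iter_chunks(form):
    print(cid, len(payload))            # b'FORM' 32

# Descend into the FORM: the first 4 payload bytes are the form type,
# followed by the nested chunks.
print(form[8:12])                       # b'AUDI'
for cid, payload in iter_chunks(form, offset=12):
    print(cid, payload)                 # PARM metadata, then DATA samples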
[again, only noticed Bart’s response just now]
Me’s essentially proposing the same thing, but w/ character granularity, as opposed to the chunk granularity of IFF. (Mepondered IFF thoroughly before coming up w/ me approach, so no surprises there :). Combined with a terminal pipe that multiplexes on the character level, this solves all the problems of integrating modern media into the traditional stream of text (which me’d very much argue should be kept, for the sake of keeping simple things simple), resulting in a massive simplification of the system. A ton of special interfaces can be effectively thrown out.

Don’t tell me that’s not an improvement.