As part of learning systems programming, I am looking to implement a file shredder. The simplest way (probably seen as naive) would be to replace the data bytes with zeroes (I know the OS splits the file into blocks, and I'll overwrite the bytes in all those blocks). But when I google this topic, I am surprised to find multiple-pass algorithms, some going as high as 35 passes!
Could someone explain the benefit of multiple passes, please? I couldn't find any explanation.
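For concreteness, here is a minimal sketch of what I mean by the naive single-pass approach (Python for illustration; the function name `zero_fill` is my own):

```python
import os

def zero_fill(path, chunk_size=4096):
    """Naive single-pass shred: overwrite every byte of the file with
    zeros, chunk by chunk. Illustrative sketch only -- the filesystem
    may still hold old copies in journals or remapped sectors."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the writes to the device
```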
Thanks
-
You may wish to read this: stackoverflow.com/questions/4448772 – Blrfl, Aug 19, 2011 at 12:17
4 Answers
Imagine a physical disk storing the binary value 0101. Physically, the charges on the disk exist as real (analog) values, which are rounded up or down by the disk controller:
binary -> physical charge
0 1 0 1 -> 0.1 0.9 0.1 0.9
If you were to overwrite the data with zeros, some residual charge would remain from the previous values. In this simple example, the new values could be:
binary -> physical
0 0 0 0 -> 0.01 0.09 0.01 0.09
Equipment sensitive enough to read these charges at high resolution can then be used to extract this "shadow" of the overwritten data. That's why overwriting multiple times (and with random values) helps obscure the data.
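The charge-residue story above can be turned into a toy simulation. The 10%/90% retention split mirrors the numbers in this answer, and the read threshold is made up for illustration; none of this is real drive physics:

```python
def write_pass(charges, bits, retention=0.1):
    # Each cell ends up mostly at the written bit's level, plus a small
    # residue of its old charge. (Illustrative model, not drive physics.)
    return [0.9 * b + retention * c for c, b in zip(charges, bits)]

def controller_read(charges):
    # The ordinary disk controller just rounds each charge to 0 or 1.
    return [1 if c >= 0.5 else 0 for c in charges]

def residue_read(charges):
    # Hypothetical high-resolution read: subtract the rounded value and
    # inspect the leftover fraction to guess the previously written bit.
    return [1 if abs(c - round(c)) >= 0.05 else 0 for c in charges]

charges = [0.1, 0.9, 0.1, 0.9]               # disk currently holds 0 1 0 1
charges = write_pass(charges, [0, 0, 0, 0])  # "shred" with zeros

print(controller_read(charges))  # [0, 0, 0, 0] - looks erased
print(residue_read(charges))     # [0, 1, 0, 1] - old data still readable

charges = write_pass(charges, [0, 0, 0, 0])  # one more pass
print(residue_read(charges))     # [0, 0, 0, 0] - residue now too faint
```

Each extra pass scales the original's residue down by the retention factor, which is exactly why more passes make recovery harder in this model.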
-
-1, no it's not. We've been pushing the limits on disks so long that we've unambiguously entered the domain of quantum physics. This analog assumption just doesn't hold anymore. Each magnetic domain (grain) on a platter points in one direction, and only one. There are just a few hundred grains per bit at most, they're strongly coupled, and they're not at all cooled. Furthermore, the actual bits are transformed by a PRML and ECC function, so you can't even directly say to which bit an individual grain corresponds. Essentially, 1TB+ disks are possible because this residual is now fully used. – MSalters, Aug 19, 2011 at 13:49
-
@MSalters - You are assuming that all disks in use are like this. WD still makes disks that do not utilize this. The question was why use 35 passes. It is to obscure the data for the reasons shown. Until the old-style drives are no longer in use, this type of shredder is needed. What is missing is that new controllers do not give you fine-grained control over the hardware. Laws designed to prevent the destruction of evidence have led to controllers that do not overwrite previously used areas until they have no other choice. – SoylentGray, Aug 19, 2011 at 15:08
-
@MSalters, whether it's necessary is irrelevant. This is the correct answer to the question posed by the OP. – Caleb, Aug 19, 2011 at 15:45
-
@MSalters, yes the entire grain points in one direction, but the quantization axis may differ from grain to grain, inducing some variation. This would be affected by thermal fluctuations, magnetic fluctuations from the read head passing over, or a neighboring grain being flipped. – rcollyer, Aug 19, 2011 at 16:57
-
@Chad: All magnetic materials have grains. Simple math proves that WD's disks use a few hundred grains per bit, given the size and capacity of their platters. You might be confused by patterned media. Those intentionally delineate grains to reduce coupling. Non-patterned media just have grains randomly distributed. – MSalters, Aug 22, 2011 at 8:09
The multipass erase is necessary to destroy data on magnetic storage devices: with the right equipment, data can be recovered from the magnetic layers below or in between even after it has been overwritten by another sequence of 1s and 0s.
However, there are voices on the internet claiming that multipass erasure is no longer necessary, as the areal density of data on modern hard drives has increased roughly 10,000-fold.
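A sketch of the multi-pass idea in Python (the `multi_pass_shred` helper and its parameters are invented for illustration: several passes of fresh random data, then zeros; note this still cannot reach blocks the filesystem or drive has relocated elsewhere):

```python
import os

def _write_pattern(f, size, make_chunk, chunk_size=4096):
    # Write one full pass over the file, sourcing bytes from make_chunk(n).
    f.seek(0)
    remaining = size
    while remaining > 0:
        n = min(chunk_size, remaining)
        f.write(make_chunk(n))
        remaining -= n
    f.flush()
    os.fsync(f.fileno())  # force this pass onto the device before the next

def multi_pass_shred(path, random_passes=3):
    """Overwrite a file with several random passes, then a final zero pass."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(random_passes):
            _write_pattern(f, size, os.urandom)          # fresh random bytes
        _write_pattern(f, size, lambda n: b"\x00" * n)   # finish with zeros
```

Syncing after each pass matters: without it, the OS may coalesce the passes in its cache and only the last one would ever reach the platter.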
-
You're welcome. Too bad you haven't got the vote-up privilege yet =) – Falcon, Aug 19, 2011 at 9:59
It is said that experts with special equipment can reconstruct a formatted drive. The advice is therefore to overwrite the data on the drive multiple times with differing (random) patterns.
Overwriting data with 0s in multiple passes only makes sense for magnetic storage devices, for the reasons @pufferfish gave. For SSDs and other flash storage this fails; see http://www.usenix.org/events/fast11/tech/full_papers/Wei.pdf
Moral of the story: how you deal with a hardware problem in software may have to change when the hardware technology changes, even though the API does not.