Fuzzy Matching and Data Compression
Could data compression be used to detect similar files? Many data-compression algorithms work by detecting repeated patterns, so if two similar files were concatenated, the result should compress far better than the concatenation of two unrelated files.
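One rough way to try the idea is to compare the compressed size of the concatenation against the sum of the two files' individually compressed sizes. The sketch below is only an illustration under several assumptions: it uses Python's zlib (DEFLATE) as the compressor, the helper names `compressed_size` and `concat_savings` are made up for this example, and the "files" are small pseudo-random byte strings rather than real documents. DEFLATE only looks back 32 KB, so with this compressor the effect would fade for files larger than that window.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def concat_savings(a: bytes, b: bytes) -> float:
    """Fraction of bytes saved by compressing a + b together
    instead of compressing a and b separately (hypothetical helper)."""
    separate = compressed_size(a) + compressed_size(b)
    together = compressed_size(a + b)
    return (separate - together) / separate


# Stand-in "files": seeded pseudo-random bytes, small enough to fit
# inside DEFLATE's 32 KB window.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4000))

# A "similar" file: the same bytes with a few positions altered.
edited = bytearray(original)
for pos in (1000, 2000, 3000):
    edited[pos] ^= 0xFF
similar = bytes(edited)

# An "unrelated" file: fresh pseudo-random bytes.
unrelated = bytes(rng.getrandbits(8) for _ in range(4000))

similar_savings = concat_savings(original, similar)
unrelated_savings = concat_savings(original, unrelated)

print(f"savings for similar files:   {similar_savings:.3f}")
print(f"savings for unrelated files: {unrelated_savings:.3f}")
```

A large savings value suggests the two inputs share content; a value near zero suggests they do not. Where to draw the line between "similar" and "unrelated" would have to be tuned for the compressor and the kind of files involved.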
1 Comment:
This sounds right.
I am not quite as sure that the concatenation of two unrelated files should compress to a file whose size is greater than the sum of the sizes of the two files compressed individually. It depends on how expensive it is to encode the compression table.