How Much Does a Zip File Compress?

By Todd Williams

The zip file format was developed by PKWARE and first released in 1989. It allows data to be compressed and archived, saving disk space and making it easier to back up and transfer large or multi-part files. How much compression it achieves depends on several factors, including the compression method used and the type of data being compressed.

Compression Types

There are two types of file compression: lossless and lossy. The zip format is an example of lossless compression, which means that the compressed data can be restored to its exact original state. Lossy compression favors efficiency over accuracy, allowing data to be approximated or discarded entirely. A lossless format rules out any method that cannot recreate the original data perfectly, even if that method would produce a smaller file.
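The lossless property can be checked with a quick round trip. The following is a minimal sketch using Python's standard zlib module, which implements the deflate algorithm; the sample text is made up for illustration.

```python
import zlib

# Hypothetical sample data: a short sentence repeated many times.
original = b"The quick brown fox jumps over the lazy dog. " * 200

compressed = zlib.compress(original)    # deflate-based compression
restored = zlib.decompress(compressed)  # reverse the process

print(len(original), len(compressed))   # original size vs. compressed size
print(restored == original)             # True: every byte is recovered
```

A lossy format could not pass the final comparison, because some of the original bytes would be gone for good.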

Methods

The zip format includes specifications for multiple algorithmic methods of file compression. The method used can have a significant effect on the level of compression achieved. Some of the methods available include shrink, reduce, implode and deflate. Of these, deflate is the most widely used. It is the default compression method used in PKZIP, WinZip, and Info-ZIP.
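To see how much the method matters, the sketch below builds the same archive twice with Python's standard zipfile module: once with no compression ("stored") and once with deflate. The payload and the helper name archive_size are illustrative assumptions. Python's zipfile module does not write the older shrink, reduce or implode methods, so they are not shown.

```python
import io
import zipfile

# Hypothetical payload: one line of text repeated 1,000 times.
payload = b"zip compression methods example\n" * 1000

def archive_size(method):
    """Return the size in bytes of an in-memory archive holding the payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        archive.writestr("sample.txt", payload)
    return len(buffer.getvalue())

stored_size = archive_size(zipfile.ZIP_STORED)      # archived without compression
deflated_size = archive_size(zipfile.ZIP_DEFLATED)  # archived with deflate

print("payload: ", len(payload))
print("stored:  ", stored_size)
print("deflated:", deflated_size)
```

The stored archive comes out slightly larger than the payload because the zip headers are added on top of the uncompressed data.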

Variability

The extent to which a file can be compressed varies by file type. For example, plain text files are not compressed by default: each character is stored as one or more whole bytes, and the same words and letter sequences repeat many times over. Compression rates on this kind of file are generally very good.

Many media formats, such as MP3, include compression as part of the format standard. Compression rates on these files are usually poor, potentially even resulting in a "compressed" zip file that's larger than the original due to the additional data in the zip archive.
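The difference between file types can be reproduced with a small experiment. This sketch, using Python's standard zipfile module, compares repetitive text with random bytes. The random bytes are an assumption standing in for already-compressed media such as an MP3, and the helper name zipped_size is hypothetical.

```python
import io
import os
import zipfile

def zipped_size(data):
    """Return the size in bytes of a deflate-compressed archive holding data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data.bin", data)
    return len(buffer.getvalue())

# Repetitive plain text compresses well.
text = b"Compression works well on repetitive plain text. " * 2000

# Random bytes have no patterns left to exploit, much like compressed media.
noise = os.urandom(len(text))

text_zip = zipped_size(text)
noise_zip = zipped_size(noise)

print(f"text:  {len(text)} -> {text_zip} bytes")
print(f"noise: {len(noise)} -> {noise_zip} bytes")
```

On the random input the archive ends up larger than the original, which is the effect described above.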

Comparison

The zip format is not the only compression format in use. Other common compression formats include RAR and 7z, the native format of 7-Zip. According to Igor Pavlov, developer of 7-Zip, the standard zip format underperforms the other two formats by as much as 30 to 40 percent, depending on the type of data being compressed.

In a test, Pavlov compressed a full installation of Google Earth 3.0.0616. The data totaled 23.5 MB before compression. The standard zip format provided approximately 62 percent compression. By comparison, RAR resulted in a 71 percent compression rate, and 7-Zip achieved 76 percent.
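The percentages in the test can be turned into approximate file sizes. This short sketch assumes that each figure is the share of the original size that was saved; if the figures were measured differently, the results would not apply.

```python
ORIGINAL_MB = 23.5  # size of the test data before compression

# Compression figures quoted above, read as the fraction of space saved (an assumption).
savings = {"zip": 0.62, "RAR": 0.71, "7-Zip": 0.76}

# Estimated compressed size for each format, in MB.
sizes = {name: ORIGINAL_MB * (1 - saved) for name, saved in savings.items()}

for name, size_mb in sizes.items():
    print(f"{name}: about {size_mb:.1f} MB")
```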

Limitations

Early incarnations of the zip format were limited to handling data no larger than 4 gigabytes at a time. The limit applied to the compressed and uncompressed size of each individual file and to the total size of the archive. More recent versions of the format remove this limitation through the zip64 extension. Most current zip tools support zip64, though some older software does not.
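The 4-gigabyte ceiling comes from the 32-bit size fields in the original format. The sketch below shows the check in Python; the constant ZIP32_MAX_BYTES and the helper needs_zip64 are hypothetical names introduced for illustration, while allowZip64 is an actual parameter of Python's standard zipfile module.

```python
import io
import zipfile

# Largest value a 32-bit size field can hold: just under 4 GiB.
ZIP32_MAX_BYTES = 2**32 - 1

def needs_zip64(size_in_bytes):
    """Return True if a size is too large for the original 32-bit fields."""
    return size_in_bytes > ZIP32_MAX_BYTES

print(needs_zip64(700 * 1024**2))  # a 700 MB file fits the original format
print(needs_zip64(5 * 1024**3))    # a 5 GB file requires zip64

# allowZip64=True lets the module switch to zip64 records when a limit is exceeded.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED,
                     allowZip64=True) as archive:
    archive.writestr("small.txt", b"zip64 is only used when it is needed")
```

With allowZip64 set to False, the module raises an error when an archive would need zip64 rather than writing the extended records, which can be useful when the archive must open in older software.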