Parallel BZip2

April 1st, 2010

I ran some benchmarks which included PBZip2, a multi-threaded implementation of BZip2 (slow but effective, and therefore my preferred compressor for basically everything).

Running the Burrows–Wheeler transform over the input blocks is a task well suited to parallelization, and the benchmarks show that Jeff Gilchrist did a great job of it:

Compressor   Time    Archive Size
None (cat)    2.3s   50 MB
GZip          4.0s   34 MB
BZip2        16.3s   29 MB
PBZip2        3.0s   29 MB
LZip         41.8s   24 MB

The timings were produced by running the code below 4 times and taking the average of the last 3 runs (for each compressor).

This was executed on a 2 × 2.8 GHz Quad Core Mac Pro where PBZip2 (correctly) auto-detected 8 cores.

I am running PBZip2 version 1.1.0 from MacPorts (sudo port install pbzip2).

for Z in cat gzip bzip2 pbzip2 lzip; do
   time tar -cf "${Z}.res" --use-compress-prog="${Z}" Avian
done
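The averaging described above can be sketched with standard tools; the four timing values here are made up for illustration, with the first (cold-cache) run discarded:

```shell
# Given 4 recorded timings in seconds, drop the first run and
# average the remaining 3, as described in the methodology above.
printf '%s\n' 3.4 3.1 3.0 2.9 \
  | tail -n 3 \
  | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
# prints 3.0
```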

Update: Added a test with LZip (an LZMA-based compressor). There is a multi-threaded implementation of this (plzip), but a quick ./configure && make did not cut it.
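One practical note (not covered above, but a documented property of pbzip2): it emits concatenated standard bzip2 streams, so archives created with either tool can be unpacked with the other. A round-trip sketch, with made-up file names:

```shell
# Create an archive with one compressor and unpack it with the other;
# the two are stream-compatible. bzip2 is used here, but either tool
# could be substituted on either side.
echo 'hello' > file.txt
tar -cf archive.res --use-compress-prog=bzip2 file.txt
rm file.txt
tar -xf archive.res --use-compress-prog=bzip2
cat file.txt   # prints hello
```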

[by Allan Odgaard]

3 Responses to “Parallel BZip2”

  1. Markus Says:
    April 1st, 2010 at 12:05

Have you checked out the lzma tools as well? Given that they are at the speed of gzip yet produce archives a bit smaller than, or at least equal to, bzip2…

  2. Allan Odgaard Says:
    April 2nd, 2010 at 00:15

Markus: I added lzip to the test set; it seems relatively slow. The main reason for the lack of adoption, though, is that it hasn’t yet become ubiquitous; perhaps the speed is the reason. Their own page repeats your claim about being as fast as gzip, but that is certainly not my experience. Maybe the 7-zip compressor is comparable to gzip.

  3. Allan Odgaard Says:
    April 2nd, 2010 at 08:19

Addendum: I re-read their claim and it says it decompresses as fast as gzip.
