Programming, automation, algorithms, macOS, and more.

Parallel BZip2

I ran some benchmarks that included PBZip2, a multi-threaded implementation of BZip2. BZip2 is slow yet effective, which makes it my preferred compressor for basically everything.

Running the Burrows–Wheeler transform over the input blocks is a task well suited to parallelization, and the benchmarks show that Jeff Gilchrist did a great job of it:

Compressor    Time     Archive Size
None (cat)     2.3s    50 MB
GZip           4.0s    34 MB
BZip2         16.3s    29 MB
PBZip2         3.0s    29 MB
LZip          41.8s    24 MB
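
To illustrate why block-wise compression parallelizes so easily, here is a rough sketch that splits a file into chunks, compresses the chunks concurrently, and concatenates the results. It relies on bzip2 (and gzip) being able to decompress a concatenation of separately compressed streams. This is not PBZip2's actual implementation. The function name parallel_compress, the 900000-byte chunk size, and the default of 4 jobs are assumptions made for the example, and xargs -P is a GNU/BSD extension rather than POSIX.

```shell
# Sketch only: chunked, concurrent compression with stock tools.
# Usage: parallel_compress INPUT OUTPUT [JOBS] [COMPRESSOR]
# Assumes a non-empty INPUT and a compressor that replaces each
# file with a compressed copy (as bzip2 and gzip do).
parallel_compress() (
    input=$1
    output=$2
    jobs=${3:-4}              # hypothetical default, tune to your core count
    compressor=${4:-bzip2}

    tmp=$(mktemp -d) || exit 1
    status=0

    # 1. Cut the input into independent chunks (chunk.aaaa, chunk.aaab, ...).
    # 2. Compress up to $jobs chunks at the same time.
    # 3. Concatenate the compressed chunks in lexical (= original) order.
    split -a 4 -b 900000 "$input" "$tmp/chunk." &&
        find "$tmp" -type f -name 'chunk.*' -print0 |
            xargs -0 -n 1 -P "$jobs" "$compressor" &&
        cat "$tmp"/chunk.* > "$output" || status=1

    rm -rf "$tmp"
    exit "$status"
)
```

Something like parallel_compress big.tar big.tar.bz2 8 should then yield a file that plain bzip2 -d can unpack, at the cost of a slightly larger archive since every chunk carries its own stream header.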

The timings were produced by running the code below 4 times and taking the average of the last 3 runs (for each compressor).

This was executed on a 2 × 2.8 GHz Quad Core Mac Pro where PBZip2 (correctly) auto-detected 8 cores.

I am running PBZip2 version 1.1.0 from MacPorts (sudo port install pbzip2).

for Z in cat gzip bzip2 pbzip2 lzip; do
   time tar -cf "${Z}.res" --use-compress-program="${Z}" Avian
done
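
The "run 4 times, average the last 3" procedure can also be scripted instead of done by hand. The helper below is a minimal sketch and not the script used for the numbers above. The name average_time is made up for illustration, and it assumes a shell where time -p is available (the bash keyword or a POSIX time utility) and prints a "real <seconds>" line on stderr.

```shell
# Sketch only: run a command RUNS times, discard the first (warm-up)
# run, and print the mean wall-clock time of the remaining runs.
# Usage: average_time RUNS COMMAND [ARGS...]
average_time() (
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        # The inner sh -c discards the command's own output so that
        # only the "real/user/sys" report from time reaches awk.
        real=$( { time -p sh -c '"$@" >/dev/null 2>&1' sh "$@"; } 2>&1 |
                    awk '$1 == "real" { print $2 }' )
        # Skip the first measurement, emit the others.
        [ "$i" -gt 1 ] && echo "$real"
        i=$((i + 1))
    done | awk '{ sum += $1 } END { if (NR) printf "%.1f\n", sum / NR }'
)

# Example (not executed here):
#   for Z in cat gzip bzip2 pbzip2 lzip; do
#       printf '%s\t' "$Z"
#       average_time 4 tar -cf "${Z}.res" --use-compress-program="${Z}" Avian
#   done
```

Discarding the first run keeps a cold disk cache from skewing the average, which is presumably why only the last three runs were counted.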

Update: Added a test with LZip (an LZMA-based compressor). There is a multi-threaded implementation of this (plzip), but a quick ./configure && make did not cut it.
