SIGPIPE 13

Build Automation Part 1

January 15th, 2010

A blog post about Ant vs. Maven concludes that “the best build tool is the one you write yourself”, and the Programmer Competency Matrix has “can setup a script to build the system” as a requirement for reaching the higher levels in the “build automation” row.

I have looked at a lot of build systems myself, and while I agree that the best build system is the one you create yourself, I am also a big fan of make and believe that the best approach is to use generated Makefiles.

This post is a “getting started with make”. I plan to follow up with a part 2 about how to handle auto-generated self-updating Makefiles.

Concept

The UNIX philosophy is to have small tools (commands) which each solve a well-defined problem. These can then be combined to build more complex systems.

While each build process is different, the common denominator is that we should be able to represent our target(s) as nodes in a directed acyclic graph where each node represents a file and each edge represents a dependency.

This is what a Makefile captures: it should be a declaration of the dependency graph, with an action per node to (re)create the file the node corresponds to on disk whenever that file is missing or older than its dependencies, i.e. the nodes we can reach via the node’s (directed) edges.

By keeping the dependency information declarative we let make figure out which files are outdated and need to be rebuilt, and we give it the freedom to pick a strategy for rebuilding them, which may include running jobs in parallel.
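For instance, two standard GNU make flags let us inspect the plan and exploit the parallelism the graph allows:

make -n    # “dry run”: print the commands make would execute, without running them
make -j4   # build, running up to 4 independent jobs in parallel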

Example

To give an example let us look at the generate_keys script which is part of Sparkle and can generate a public and private key file.

The public key is extracted from the private key and the private key requires a DSA parameter file (we’ll ignore the -genkey flag to dsaparam).

So our (simple) graph looks like this:

pubkey → privkey → dsa_parameters

A Makefile “rule” is effectively one node in our graph and looks like:

«goal»: «dependencies»
    «action»

Here «goal» is the node itself, that is, the file it represents. The «dependencies» are the nodes it depends on, and «action» is the command(s) to execute to generate/update the node/file (interpreted by the shell). Note that the «action» line(s) must be indented with a literal tab character.

Using the generate_keys script as source our Makefile ends up like this:

pubkey: privkey
    openssl dsa -in '$<' -pubout -out '$@'

privkey: dsa_parameters
    openssl gendsa -out '$@' '$<'

dsa_parameters:
    openssl dsaparam -out '$@' 2048 < /dev/urandom

In the above I have used two of make’s automatic variables: $@ expands to the goal (i.e. the file we are generating) and $< expands to the first dependency.

If you save the above as Makefile and run make it will generate 3 files: pubkey, privkey, and dsa_parameters. By default, calling make without arguments ensures the first goal in the Makefile is up to date. If you re-run make it should say:

make: `pubkey' is up to date.

You can also run make privkey to ensure (only) privkey is up to date (which then won’t extract the public key).
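A full session could look roughly like this:

$ make            # creates dsa_parameters, then privkey, then pubkey
$ make
make: `pubkey' is up to date.
$ make privkey
make: `privkey' is up to date.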

Intermediate Files

The above Makefile reproduces the script, except that we are not removing the temporary dsa_parameters file after having generated the keys. We can fix this by making dsa_parameters a dependency of the fake .INTERMEDIATE goal, by adding this line:

.INTERMEDIATE: dsa_parameters

If we now run make it will automatically remove the dsa_parameters file after it has been used.
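In a clean directory a run now looks roughly like this (progress output from openssl itself omitted), with make printing the removal at the end:

$ make
openssl dsaparam -out 'dsa_parameters' 2048 < /dev/urandom
openssl gendsa -out 'privkey' 'dsa_parameters'
openssl dsa -in 'privkey' -pubout -out 'pubkey'
rm dsa_parameters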

We probably want to use our public key from C, so let us add another goal (node), namely pubkey.h. This goal creates a C header from the pubkey file, so it depends on it, and can be handled by adding the following rule:

pubkey.h: pubkey
    { echo 'static char const* pubkey ='; \
      sed < '$<' -e $$'s/.*/\t"&\\\\n"/'; \
      echo ';'; } > '$@'

Perhaps not the nicest way to generate the pubkey.h file, but what is nice about this is that whatever application needs to use this header can declare it as a dependency, and it will be generated when needed, including extracting the public key if that has not already been done.
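For example, an object file whose source includes the header can simply list it as a dependency (the file names here are hypothetical):

main.o: main.c pubkey.h
    cc -c -o '$@' '$<'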

Includes

To keep things modular we can save our Makefile as Makefile.keys and include it from our main Makefile using:

include Makefile.keys

If we go back to the Sparkle distribution there is also a sign_update script which signs an update using the private key.

We can add this as another goal to our Makefile, e.g. using:

archive.sig: privkey archive.tbz
    openssl dgst -dss1 -sign privkey archive.tbz > '$@'

Here the archive signature depends on both having a private key and an archive. The private key will be generated if it is not already there; for the archive we of course need to add another goal. The archive goal will depend on our actual binary, which will depend on its object files, which will depend on the sources (where one source is likely going to depend on pubkey.h).
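A sketch of that chain, using hypothetical names (app for the binary, plus the main.o rule from the pubkey.h example above):

archive.tbz: app
    tar -cjf '$@' '$<'

app: main.o
    cc -o '$@' '$<'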

Phony Targets

In addition we probably want another goal to construct an RSS feed (or similar) which includes the archive signature, and eventually we will want a deploy goal which depends on the RSS feed and the archive. The action for this goal will likely use scp to copy the files to the server, and the goal itself will not be a file, i.e. when we run make deploy we do not expect an actual deploy file to be generated. While there is little harm in declaring a goal with actions that do not generate the file, we could risk getting a:

make: `deploy' is up to date.

if there actually is a deploy file which is newer than the dependencies of the deploy goal. To avoid this we make the fake goal named .PHONY depend on deploy, similar to what we did with the .INTERMEDIATE goal:

.PHONY: deploy
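A sketch of the deploy rule itself, with hypothetical server and file names (appcast.xml stands in for the RSS feed, which would itself depend on archive.sig):

deploy: appcast.xml archive.tbz
    scp $^ user@example.com:updates/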

Closing Words

This post is just a mild introduction to make. I have deliberately picked an example that does not involve building C sources, to show that make is a versatile tool.

Whenever you have a set of actions that need to run in a specific order, consider whether a Makefile can capture the dependency graph.

When you do write a Makefile, aim for having each rule do only one thing. For example, imagine we are writing a manual and store each chapter as Markdown. Rather than doing something like this:

chapter.html: header.html chapter.mdown footer.html
    { cat header.html; \
      Markdown.pl < chapter.mdown; \
      cat footer.html; } > '$@'

We can instead do:

chapter.html: header.html cache/chapter.html footer.html
    cat > '$@' $^

cache/chapter.html: chapter.mdown
    Markdown.pl < '$<' > '$@'

The new $^ variable expands to all the dependencies.

There are a few reasons to favor this approach. In this concrete example we have the advantage of not needing to run the chapters through Markdown.pl again if we only change the header or footer. But in general it makes things more flexible: goals are easier to re-use, a failed build is faster to restart, more jobs may be able to run in parallel, etc.
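As an aside, with GNU make the cache/chapter.html rule generalizes to any number of chapters via a pattern rule (a sketch; it assumes the cache directory already exists):

cache/%.html: %.mdown
    Markdown.pl < '$<' > '$@'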

[by Allan Odgaard]


6 Responses to “Build Automation Part 1”

  1. Nico Says:
    January 22nd, 2010 at 16:49

Have you seen http://code.google.com/p/gyp/ ? It generates xcodeproj files, Makefiles, and Visual Studio projects from its own syntax. It's what Chromium uses as its build system.

  2. Allan Odgaard Says:
    January 23rd, 2010 at 18:21

    Nico: I wrote part 2 sort of as reply to your comment.

Basically the benefit of make is that it is a system for linking actions together and having make figure out when it is time to run which actions. The actions to run can be anything.

I realize that calling these posts ‘Build Automation’ is probably a misnomer, because what this really is about is automation. Building projects I will talk about in part 4. Make itself is not a good abstraction for building projects when we know the domain, but it is still a very good engine. So comparing gyp to make is akin to comparing an abstraction to an engine (although make itself offers an abstraction over the lower-level task of executing interdependent actions).

    As for gyp itself as a good abstraction for building projects: I don’t think so ;)

  3. hauk Says:
    May 24th, 2010 at 01:06

Mandatory reading when studying make and Makefiles: http://miller.emu.id.au/pmiller/books/rmch/ Many, if not most, build systems that use make use recursive Makefiles. The linked paper shows why this is a bad approach: it fragments the dependency graph and does not give make a correct picture of the system to build.

  4. Vitaly Says:
    July 24th, 2010 at 10:44

The other side of splitting a source tree into fragments is faster incremental builds within local folders, which is important in everyday work. There is a very close alternative to GNU make: http://fastmake.org. It can make life with make much easier. And it is alive.

  5. Allan Odgaard Says:
    July 24th, 2010 at 12:13

    Vitaly: Thanks for the link to your project, it is always interesting to learn about new projects around build automation.

    I do however think that the main problem Fastmake tries to address (build system overhead) is more noticeable on Windows than on other systems.

I base this on Mercurial, which I believe will use directory iterators on Windows instead of stat'ing each tracked file, since the latter is too slow with big repositories, but is not a problem on Linux.

Though I haven't done my own tests, I have found that on OS X using scandir seems orders of magnitude faster than individual calls to stat, so there might be an advantage in running scandir recursively on the source directory if the majority of files are believed to be part of the dependency tree.

That said, my own project consists of more than 250 included Makefiles (one auto-generated Makefile per source file, listing that source file's dependencies); it has around the same number of sources and also a bunch of resources (also part of the dependency graph).

    The build when everything is up-to-date takes 0.1 second (warm cache).

I should however mention that I am using an SSD drive, 12 GB of RAM, running 8 parallel build jobs, and have disabled make's built-in rules (these are really slow).

Btw: In your acknowledgements section it would be nice to have links to the components you use. The nFS library proved particularly hard to find (in fact, I never did find the official page, only an article in Google's web cache from codeproject.com).

  6. Vitaly Says:
    July 26th, 2010 at 11:14

Allan, as you suggested, I have documented the components thoroughly at http://fastmake.org/acknowledgements.html. Thank you for visiting.

As to scandir/readdir: POSIX guarantees only inode and name information from these functions. I ran into this on Mac OS X Snow Leopard, where I had to call 'stat' to obtain file type info. On Windows I use the direct WinAPI call GetFileAttributesEx, which is faster than stat.

You are right that on Linux/Mac OS subprocesses run much faster than on Windows. But speed is not the only problem. As we know, the latest version of GNU make, 3.81, was released in 2006. There are many complaints about GNU make's functionality and syntax on the Internet (e.g. http://www.conifersystems.com/whitepapers/gnu-make/) that are resolved in Fastmake. Being the closest to GNU make, Fastmake can be treated as a continuation of GNU make with live feedback.

