<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SIGPIPE 13</title>
	<atom:link href="http://sigpipe.macromates.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://sigpipe.macromates.com</link>
	<description>Programming and using OS X</description>
	<lastBuildDate>Fri, 10 Aug 2012 22:49:34 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Path Completion (bash)</title>
		<link>http://sigpipe.macromates.com/2012/08/10/path-completion-bash/</link>
		<comments>http://sigpipe.macromates.com/2012/08/10/path-completion-bash/#comments</comments>
		<pubDate>Fri, 10 Aug 2012 21:13:37 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/?p=92</guid>
		<description><![CDATA[If you upgraded to Mountain Lion and often want to cd into ~/Library/Application Support you might be a little annoyed by the new Application Scripts directory that makes the normal ~/Library/Ap⇥ stop at ~/Library/Application S‸ to have you disambiguate the path. To avoid this you can set the FIGNORE variable. From man bash: FIGNORE A colon-separated list [...]]]></description>
				<content:encoded><![CDATA[<p>If you upgraded to Mountain Lion and often want to <code>cd</code> into <code>~/Library/Application Support</code> you might be a little annoyed by the new <code>Application Scripts</code> directory that makes the normal <code>~/Library/Ap⇥</code> stop at <code>~/Library/Application S‸</code> to have you disambiguate the path.</p>

<p>To avoid this you can set the <code>FIGNORE</code> variable. From <code>man bash</code>:</p>

<pre><code>FIGNORE
    A colon-separated list of suffixes to ignore when 
    performing filename completion (see READLINE below). A 
    filename whose suffix matches one of the entries in 
    FIGNORE is excluded from the list of matched file- 
    names. A sample value is ".o:~".
</code></pre>

<p>So if you set this in your bash startup file:</p>

<pre><code>FIGNORE=".o:~:Application Scripts"
</code></pre>

<p>Then it will completely ignore that folder and do the full expansion.</p>
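<p>Here is a small C++ model of the suffix rule quoted from <code>man bash</code> above. It is only a sketch of the documented rule, not bash’s actual implementation, and the function name <code>apply_fignore</code> is made up for illustration. It shows why <code>Application Scripts</code> works as an entry: the rule is plain string suffix matching, so it is not limited to file extensions like <code>.o</code>.</p>

<pre><code>#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical model of the FIGNORE rule from the bash man page:
// a candidate is dropped when it ends with any colon-separated entry.
std::vector&lt;std::string&gt; apply_fignore (std::vector&lt;std::string&gt; const&amp; candidates, std::string const&amp; fignore)
{
    // Split the FIGNORE value on ':' into individual suffixes.
    std::vector&lt;std::string&gt; suffixes;
    std::string::size_type from = 0, to;
    while((to = fignore.find(':', from)) != std::string::npos)
    {
        suffixes.push_back(fignore.substr(from, to - from));
        from = to + 1;
    }
    suffixes.push_back(fignore.substr(from));

    // Keep only the candidates that do not end with any (non-empty) suffix.
    std::vector&lt;std::string&gt; res;
    for(auto const&amp; name : candidates)
    {
        bool ignored = false;
        for(auto const&amp; suffix : suffixes)
        {
            if(!suffix.empty() &amp;&amp; name.size() &gt;= suffix.size() &amp;&amp;
               name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0)
                ignored = true;
        }
        if(!ignored)
            res.push_back(name);
    }
    return res;
}
</code></pre>

<p>With the value from above, the candidates <code>~/Library/Application Scripts</code> and <code>~/Library/Application Support</code> are reduced to just the latter, so completion has a single match to expand.</p>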

<p>Some other useful variables you can set in <code>~/.inputrc</code> that (IMHO) improve the default behavior of filename completion:</p>

<pre><code>completion-ignore-case (Off)
    If set to On, readline performs filename matching and 
    completion in a case-insensitive fashion.

mark-symlinked-directories (Off)
    If set to On, completed names which are symbolic links 
    to directories have a slash appended (subject to the 
    value of mark-directories).

show-all-if-ambiguous (Off)
    This alters the default behavior of the completion 
    functions. If set to On, words which have more than one 
    possible completion cause the matches to be listed 
    immediately instead of ringing the bell.
</code></pre>

<p>So my recommendation is to go with this:</p>

<pre><code>set completion-ignore-case on
set mark-symlinked-directories on
set show-all-if-ambiguous on
</code></pre>

<p>The ignore case allows you to type <code>~/l⇥</code> and still get <code>~/Library/</code>.</p>

<p>Marking symlinked directories is useful for <code>/tmp</code>, <code>/etc</code>, and <code>/var</code>, which on OS X are symbolic links into <code>/private</code>.</p>

<p>Showing all when ambiguous instead of ringing the bell… who came up with these defaults?</p>
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2012/08/10/path-completion-bash/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Beating Binary Search</title>
		<link>http://sigpipe.macromates.com/2010/06/17/beating-binary-search/</link>
		<comments>http://sigpipe.macromates.com/2010/06/17/beating-binary-search/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 18:44:02 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2010/06/17/beating-binary-search/</guid>
		<description><![CDATA[Jay from LinkedIn’s SNA team writes: Quick, what is the fastest way to search a sorted array? Binary search, right? Wrong. There is actually a method called interpolation search]]></description>
				<content:encoded><![CDATA[<p>Jay from LinkedIn’s SNA team <a href="http://sna-projects.com/blog/2010/06/beating-binary-search/">writes</a>:</p>

<blockquote>
  <p>Quick, what is the fastest way to search a sorted array?</p>
  
  <p>Binary search, right?</p>
  
  <p>Wrong. There is actually a method called <a href="http://en.wikipedia.org/wiki/Interpolation_search">interpolation search</a></p>
</blockquote>
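<p>Interpolation search estimates where the key should be from its value instead of always probing the middle. For keys that are roughly uniformly distributed this takes O(log log n) probes on average, though the worst case is O(n). A minimal C++ sketch for a sorted <code>std::vector&lt;int&gt;</code> could look like this (the function name is illustrative):</p>

<pre><code>#include &lt;cstddef&gt;
#include &lt;vector&gt;

// Interpolation search over a sorted vector: returns the index of key, or -1.
long interpolation_search (std::vector&lt;int&gt; const&amp; v, int key)
{
    if(v.empty())
        return -1;

    std::size_t lo = 0, hi = v.size() - 1;
    while(lo &lt;= hi &amp;&amp; key &gt;= v[lo] &amp;&amp; key &lt;= v[hi])
    {
        if(v[hi] == v[lo]) // all remaining keys are equal; avoid dividing by zero
            return v[lo] == key ? long(lo) : -1;

        // Guess the position by linear interpolation between v[lo] and v[hi].
        std::size_t pos = lo + std::size_t((double(key) - v[lo]) * (hi - lo) / (double(v[hi]) - v[lo]));
        if(pos &gt; hi)
            pos = hi; // guard against floating-point rounding

        if(v[pos] == key)
            return long(pos);
        else if(v[pos] &lt; key)
            lo = pos + 1;
        else
            hi = pos - 1; // pos &gt; lo here, so this cannot underflow
    }
    return -1;
}
</code></pre>

<p>For example, in <code>{ 1, 3, 5, 7, 9, 11 }</code> the first probe for <code>7</code> lands directly on index 3.</p>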
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2010/06/17/beating-binary-search/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Accessing Protected Data</title>
		<link>http://sigpipe.macromates.com/2010/05/06/accessing-protected-data/</link>
		<comments>http://sigpipe.macromates.com/2010/05/06/accessing-protected-data/#comments</comments>
		<pubDate>Thu, 06 May 2010 08:41:37 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2010/05/06/accessing-protected-data/</guid>
		<description><![CDATA[Whenever I see something that intrigues me, my mind makes a note of it and then subconsciously works toward finding a use-case for my newfound knowledge. An example is that I recently learned how protected member data (C++) is actually not safe from outside pryers (even in clean code that does not use typecasts). Given [...]]]></description>
				<content:encoded><![CDATA[<p>Whenever I see something that intrigues me, my mind makes a note of it and then subconsciously works toward finding a use-case for my newfound knowledge.</p>

<p>An example is that I recently learned how protected member data (C++) is actually not safe from outside pryers (even in clean code that does not use typecasts).</p>

<p><span id="more-53"></span>
Given a base class:</p>

<pre><code>class Base
{
protected:
    int foo () { return 42; }
};
</code></pre>

<p>We can create a new derived class which changes the visibility of the <code>foo</code> member function to public like this:</p>

<pre><code>class Derived : Base
{
public:
    using Base::foo;
};
</code></pre>

<p>This is not new, perhaps with the exception of the <code>using</code> keyword. This is normally used with <a href="http://www.parashift.com/c++-faq-lite/private-inheritance.html#faq-24.2">private inheritance</a>, where one selectively exposes member functions from the private base class.</p>

<p>The trick is that via <code>Derived</code> we can now obtain a pointer to the previously protected member function (<code>foo</code>) outside of the class:</p>

<pre><code>int(Base::*fn)() = &amp;Derived::foo;
</code></pre>

<p>The type syntax for (member) functions is arcane, but notice that even though we go through <code>Derived</code> to get the pointer, the actual type of the pointer has it as a member function of <code>Base</code>, since <code>Derived</code> doesn’t redeclare the function; it simply re-exposes it (via <code>using</code>).</p>

<p>So <code>fn</code> can be used directly with <code>Base</code> objects via the syntax for calling member functions given a pointer to them (the <code>.*</code> and <code>-&gt;*</code> operators):</p>

<pre><code>Base obj;
printf("%d\n", (obj.*fn)());
</code></pre>

<p>Or without using a variable to hold the member function pointer:</p>

<pre><code>Base obj;
printf("%d\n", (obj.*&amp;Derived::foo)());
</code></pre>

<p>Eureka!</p>

<h2><strike>Unit</strike> Tests</h2>

<p>Generally I write <strike>unit</strike> tests only for the public API. My reasons for this are many:</p>

<ul>
<li><strike>Unit</strike> tests are for me, to a large degree, a way of “documenting” my APIs and ensuring they stay simple.</li>
<li>There are too many private functions; writing unit tests for these is a waste of time, as they are simple and use assertions.</li>
<li>Private functions are those which change regularly, and I don’t want to be discouraged from refactoring because of the double work of also updating unit tests.</li>
</ul>

<p>You may wonder what public API exists in something like a desktop application. What I do is write a module/library/framework whenever I have related functionality. For example, TextMate 2 is presently built from 35 libraries. Each library exposes types or functions related to a particular thing, and that is the public API I write the tests for.</p>

<p>But back to why I need to access protected member data when I only test the public API. The reason is that some public types have private callbacks normally called by the OS. For example, when a file changes on disk, the type for a document will have a private (now protected) callback invoked due to use of <code>kqueue</code>. Exactly when the callback is invoked is undefined, which isn’t ideal for a <strike>unit</strike> test, so I have to cheat and call it myself, and that is why I need to access protected member data.</p>
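<p>Here is a minimal sketch of how such a test could reach the callback. It assumes a hypothetical <code>document_t</code> with a protected <code>file_changed</code> member function. These names are made up and are not TextMate’s actual types.</p>

<pre><code>// Hypothetical document type whose file_changed() would normally be
// invoked from a kqueue notification.
class document_t
{
public:
    document_t () : _reloads(0) { }
    int reloads () const { return _reloads; }

protected:
    void file_changed () { ++_reloads; }

private:
    int _reloads;
};

// Test-only helper that re-exposes the protected callback; it is never instantiated.
struct document_test_access : document_t
{
    using document_t::file_changed;
};

// Simulate the OS notification by calling the callback on a plain document_t.
void simulate_file_change (document_t&amp; doc)
{
    void (document_t::*fn)() = &amp;document_test_access::file_changed;
    (doc.*fn)();
}
</code></pre>

<p>A test can then call <code>simulate_file_change(doc)</code> at a well-defined point and check the document’s state afterwards, instead of waiting for <code>kqueue</code>.</p>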

<p>Sure, I could just make the callback a public function since there are fewer cases than I can count on one hand, but as indicated in the intro, my mind works overtime to apply the knowledge I accumulate ;)</p>

<p><strong>Update:</strong> Corrected my use of <em>‘unit tests’</em> as I am writing <em>‘high-level tests’</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2010/05/06/accessing-protected-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>GCC 4.5 &amp; C++0x</title>
		<link>http://sigpipe.macromates.com/2010/04/15/gcc-4-5-c0x/</link>
		<comments>http://sigpipe.macromates.com/2010/04/15/gcc-4-5-c0x/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 11:27:05 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2010/04/15/gcc-4-5-c0x/</guid>
		<description><![CDATA[GCC 4.5.0 is out and their progress on implementing C++0x features is coming along nicely. If you are on OS X and want to try it out you can install it via MacPorts: sudo port install gcc45 The binary installed is named g++-mp-4.5 and you must use the -std=c++0x argument to enable the new features. [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://gcc.gnu.org/gcc-4.5/">GCC 4.5.0</a> is out and their <a href="http://gcc.gnu.org/gcc-4.5/cxx0x_status.html">progress on implementing C++0x features</a> is coming along nicely.</p>

<p>If you are on OS X and want to try it out you can install it via <a href="http://www.macports.org/">MacPorts</a>:</p>

<pre><code>sudo port install gcc45
</code></pre>

<p>The binary installed is named <code>g++-mp-4.5</code> and you must use the <code>-std=c++0x</code> argument to enable the new features.</p>

<p>Of the supported C++0x features here are some of those that I find the most interesting (for my use of C++).</p>

<p><span id="more-50"></span></p>

<h2>Local and Unnamed Types as Template Arguments</h2>

<p>The most common scenario in which I need this is when declaring a local lookup structure that I need to iterate. I have my own set of <code>beginof</code>/<code>endof</code> functions overloaded for most types (something that C++0x’s <code>std::begin</code>/<code>std::end</code> will make redundant, but which GCC does not yet seem to provide); for example, for the array overload I have:</p>

<pre><code>template &lt;typename T, int N&gt; T* beginof (T (&amp;a)[N]) { return a; }
template &lt;typename T, int N&gt; T* endof (T (&amp;a)[N])   { return a + N; }
</code></pre>

<p>This allows writing a generic <code>foreach</code> macro like this:</p>

<pre><code>#define foreach(i, c) \
   for(decltype(beginof(c)) i = beginof(c); i != endof(c); ++i)
</code></pre>

<p>I am using <code>decltype</code> which is another C++0x feature but prior to this there was the <code>typeof</code> GCC extension.</p>

<p>With the macro we can write code like this:</p>

<pre><code>int xs[] = { 1, 2, 3 };
foreach(x, xs)
    printf("%d\n", *x);
</code></pre>

<p>But prior to C++0x we would get an error for this code:</p>

<pre><code>struct { char const* name; int value; } values[] =
{
    { "foo", 1 },
    { "bar", 2 }
};

foreach(value, values)
    printf("%s\n", value-&gt;name);
</code></pre>

<p>The reason for the error is that the type of <code>values</code> is both local and unnamed, and <code>values</code> is being passed as an argument to two template functions (<code>beginof</code>/<code>endof</code>).</p>

<p>But with C++0x this is now allowed!</p>

<h2>Initializer Lists</h2>

<p>Basically <code>std::initializer_list&lt;T&gt;</code> is the type given to “values in braces”. This means “values in braces” is now a type we can work with, e.g. receive as a constructor argument.</p>
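<p>As a small illustration of receiving “values in braces” as a constructor argument, here is a hypothetical <code>int_set_t</code> (not a standard type) whose constructor takes a <code>std::initializer_list&lt;int&gt;</code>:</p>

<pre><code>#include &lt;cstddef&gt;
#include &lt;initializer_list&gt;
#include &lt;set&gt;

// Hypothetical wrapper type: the braced list in "int_set_t s = { 3, 1, 3 };"
// arrives in the constructor as a std::initializer_list&lt;int&gt;.
struct int_set_t
{
    int_set_t (std::initializer_list&lt;int&gt; values) : _values(values.begin(), values.end()) { }

    std::size_t size () const   { return _values.size(); }
    bool contains (int v) const { return _values.count(v) != 0; }

private:
    std::set&lt;int&gt; _values; // duplicates collapse, so { 3, 1, 3 } has size 2
};
</code></pre>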

<p>Looking at the code above, my local unnamed type was really a map. I used a custom struct mainly because I can declare the values in one go (without the overhead of calling functions). But now that “values in braces” has a type, <code>std::map</code> can be initialized from it:</p>

<pre><code>std::map&lt;std::string, int&gt; values =
{
    { "foo", 1 },
    { "bar", 2 }
};
</code></pre>

<h2>Type Inference</h2>

<p>If we continue with the example above we may want to search our <code>values</code> map using the <code>find</code> member function. The result of this is an iterator, the type of that is <code>std::map&lt;std::string, int&gt;::[const_]iterator</code>.</p>

<p>Starting with C++0x we can use <code>auto</code> instead, e.g.:</p>

<pre><code>auto foo = values.find("foo");
if(foo != values.end())
    printf("foo's value is %d\n", foo-&gt;second);
</code></pre>

<p>Many advocate dynamic typing because they think static typing automatically requires manifest typing. With the <code>auto</code> keyword and use of template functions, C++ is moving further and further away from that dreadful paradigm :)</p>

<h2>Lambda Functions</h2>

<p>This is probably what I am most excited about, though I am not sure how much I will actually use it.</p>

<p>It is, however, painful to have to define a new function (outside the current scope) whenever using a standard library algorithm that takes a function argument, especially since many of the algorithms are effectively just saving me the loop, e.g. <code>std::find_if</code> can be written in two lines with the actual comparison included in those two lines.</p>

<p>Following the style of this post, let me give an example of using <code>std::find_if</code> with a lambda:</p>

<pre><code>it = std::find_if(it, last, [](char ch){ return !isalnum(ch) &amp;&amp; ch != '_'; });
</code></pre>

<p>Here we advance the iterator (<code>it</code>) past alphanumeric characters and underscores, stopping at the first character that is neither.</p>

<p>The lambda can capture one or more variables from the current scope either by value or reference. This is declared inside the square brackets. Use <code>&amp;</code> to capture everything by reference, <code>=</code> to capture everything by value, or provide a list of variables that should be captured (with <code>&amp;</code> as prefix if by reference).</p>
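<p>To illustrate the capture list, here is a sketch that captures a threshold by value and a counter by reference (the function and variable names are made up for the example):</p>

<pre><code>#include &lt;algorithm&gt;
#include &lt;vector&gt;

// Count the elements of xs that are greater than threshold.
int count_above (std::vector&lt;int&gt; const&amp; xs, int threshold)
{
    int count = 0;
    // threshold is copied into the lambda, count is captured by reference,
    // so the increments are visible after std::for_each returns.
    std::for_each(xs.begin(), xs.end(), [threshold, &amp;count](int x){
        if(x &gt; threshold)
            ++count;
    });
    return count;
}
</code></pre>

<p>With <code>[&amp;]</code> both variables would be captured by reference, which also works. With <code>[=]</code> the lambda would get a read-only copy of <code>count</code>, so <code>++count</code> would not compile.</p>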

<h2>Explicit Conversion</h2>

<p>One thing I love about C++ is its ability to do implicit conversions.</p>

<p>For example I can define this type:</p>

<pre><code>struct my_type_t
{
    my_type_t ()         : initialized(false)          { }
    my_type_t (size_t i) : initialized(true), value(i) { }

    operator bool () const { return initialized; }

    my_type_t operator+ (my_type_t const&amp; rhs) const
    {
        return my_type_t(value + rhs.value);
    }

private:
    bool initialized;
    size_t value;
};
</code></pre>

<p>And then this function:</p>

<pre><code>my_type_t foo (my_type_t const&amp; arg)
{
    if(!arg)
        abort();
    return arg + 8;
}
</code></pre>

<p>Here I rely on implicit construction of <code>my_type_t</code> from <code>8</code> but that will actually fail. The reason is that the compiler could also convert <code>arg</code> to <code>bool</code> (as we make use of in the <code>if</code>) and then add together a boolean and integer.</p>

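<p>To avoid this problem we prefix the <code>operator bool</code> with <code>explicit</code>, which also lets us drop the alternative workarounds for this problem (such as the “safe bool” idiom of converting to a member pointer type instead of <code>bool</code>).</p>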
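<p>Here is a minimal sketch of the corrected type. With <code>explicit</code>, <code>if(!arg)</code> still compiles because the operand of <code>!</code> is contextually converted to <code>bool</code>. Meanwhile <code>arg + 8</code> no longer has the built-in addition as a competing candidate. The <code>get</code> accessor and the zero-initialization of <code>value</code> are added only so the result can be inspected.</p>

<pre><code>#include &lt;cstddef&gt;
#include &lt;cstdlib&gt;

struct my_type_t
{
    my_type_t ()              : initialized(false), value(0) { }
    my_type_t (std::size_t i) : initialized(true), value(i)  { }

    // explicit: still usable in if/!/&amp;&amp;, but no longer an implicit conversion
    explicit operator bool () const { return initialized; }

    my_type_t operator+ (my_type_t const&amp; rhs) const
    {
        return my_type_t(value + rhs.value);
    }

    std::size_t get () const { return value; } // added for inspection only

private:
    bool initialized;
    std::size_t value;
};

my_type_t foo (my_type_t const&amp; arg)
{
    if(!arg)
        std::abort();
    return arg + 8; // 8 now unambiguously becomes my_type_t via the size_t constructor
}
</code></pre>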

<p>Slightly related is the ability to delete functions. Say we are very strict about the API usage and we only want the user to construct <code>my_type_t</code> from <code>size_t</code> as opposed to <code>int</code>. The way to enforce this is to add the following constructor signature:</p>

<pre><code>my_type_t (int) = delete;
</code></pre>

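<p>An alternative to <code>delete</code> is <code>default</code>, which gives us the compiler-generated implementation (this applies to special member functions such as constructors and assignment operators).</p>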
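<p>Both keywords appear in the small sketch below. The type name <code>strict_t</code> is made up, and the <code>static_assert</code>s check at compile time which constructions are accepted. The <code>std::is_constructible</code> trait comes from the C++0x <code>&lt;type_traits&gt;</code> header, which older library versions may lack.</p>

<pre><code>#include &lt;cstddef&gt;
#include &lt;type_traits&gt;

// Hypothetical type that may only be constructed from size_t, never from int.
struct strict_t
{
    strict_t () = default;                   // compiler-generated default constructor
    strict_t (std::size_t i) : value(i) { }
    strict_t (int) = delete;                 // strict_t(8) is now a compile-time error

    std::size_t value;                       // strict_t() value-initializes this to 0
};

static_assert(std::is_constructible&lt;strict_t, std::size_t&gt;::value, "size_t is accepted");
static_assert(!std::is_constructible&lt;strict_t, int&gt;::value, "int is rejected");
</code></pre>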

<h2>Scoped Enumerations</h2>

<p>I often declare enumerations like this:</p>

<pre><code>namespace color { enum type { red, green, blue }; }
⋮
color::type c = color::red;
</code></pre>

<p>This, however, is not possible with enumerations declared inside a class (as we can’t nest a namespace inside a class). This means the enumeration constants are declared in the scope of the class, which can cause a problem, e.g.:</p>

<pre><code>class consumer_t
{
    enum state_t { active, done } state;
public:
    bool done () const; // error: we already declared ‘done’
};
</code></pre>

<p>For this reason I have changed my enumeration convention to:</p>

<pre><code>enum state_t { kActive, kDone };
</code></pre>

<p>While this avoids most clashes the constants are still exported into too big a scope. C++0x has a new <code>enum class</code> that avoids this:</p>

<pre><code>class consumer_t
{
    enum class state_t { active, done } state;
public:
    bool done () const { return state == state_t::done; }
};
</code></pre>

<h2>Closing Words</h2>

<p>There is still lots of cool stuff to come: range-based for, delegating/inheriting constructors, extensible literals, move semantics, all the stuff about threading, etc.</p>

<p>Unfortunately, if you want to develop for Cocoa then you are out of luck, since Apple’s fork of GCC is not going to incorporate these improvements, as newer GCC releases are licensed under version 3 of the GPL.</p>

<p>I have not looked into building for Cocoa with the GCC included with MacPorts. If you have successful experience with that, let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2010/04/15/gcc-4-5-c0x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel BZip2</title>
		<link>http://sigpipe.macromates.com/2010/04/01/parallel-bzip2/</link>
		<comments>http://sigpipe.macromates.com/2010/04/01/parallel-bzip2/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 09:39:11 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2010/04/01/parallel-bzip2/</guid>
		<description><![CDATA[I ran some benchmarks which included PBZip2, a multi-threaded implementation of BZip2 (which is slow yet effective, so my preferred choice of compressor for basically everything). Running the Burrows–Wheeler transform over the input blocks is a task well suited for being parallelized and the benchmarks show that Jeff Gilchrist did a great job at this: [...]]]></description>
				<content:encoded><![CDATA[<p>I ran some benchmarks which included <a href="http://compression.ca/pbzip2/" title="Parallel BZIP2 (PBZIP2)">PBZip2</a>, a multi-threaded implementation of BZip2 (which is slow yet effective, so my preferred choice of compressor for basically everything).</p>

<p>Running the <a href="http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform">Burrows–Wheeler transform</a> over the input blocks is a task well suited for being parallelized and the benchmarks show that Jeff Gilchrist did a great job at this:</p>

<table class="graybox" border="0" cellspacing="0" cellpadding="5">
    <tr><th>Compressor</th><th>Time</th>                  <th>Archive Size</th></tr>
    <tr><td>None (cat)</td><td align="right"> 2.3s</td>  <td align="right">50 MB</td></tr>
    <tr><td>GZip</td>      <td align="right"> 4.0s</td>  <td align="right">34 MB</td></tr>
    <tr><td>BZip2</td>     <td align="right">16.3s</td>  <td align="right">29 MB</td></tr>
    <tr><td><a href="http://compression.ca/pbzip2/" title="Parallel BZIP2 (PBZIP2)">PBZip2</a></td>
                           <td align="right"> 3.0s</td>  <td align="right">29 MB</td></tr>
    <tr><td><a href="http://www.nongnu.org/lzip/lzip.html" title="Lzip - A lossless data compressor based on the LZMA algorithm">LZip</a></td>
                           <td align="right">41.8s</td>  <td align="right">24 MB</td></tr>
</table>

<p>The timings were produced by running the code below 4 times and taking the average of the last 3 runs (for each compressor).</p>

<p>This was executed on a 2 × 2.8 GHz Quad Core Mac Pro where <code>PBZip2</code> (correctly) auto-detected 8 cores.</p>

<p>I am running PBZip2 version 1.1.0 from MacPorts (<code>sudo port install pbzip2</code>).</p>

<pre><code>for Z in cat gzip bzip2 pbzip2 lzip; do
   time tar -cf "${Z}.res" --use-compress-prog="${Z}" Avian
done
</code></pre>

<p><strong>Update:</strong> Added test with <a href="http://www.nongnu.org/lzip/lzip.html" title="A lossless data compressor based on the LZMA algorithm">LZip</a> (an LZMA-based compressor). There is a multi-threaded implementation of this (<a href="http://www.nongnu.org/lzip/plzip.html" title="A massively parallel lossless data compressor based on the LZMA algorithm"><code>plzip</code></a>) but a quick <code>./configure &amp;&amp; make</code> did not cut it.</p>
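<p>The Burrows–Wheeler transform parallelizes well because bzip2 works on independent blocks (at most 900 kB each), so each block can be transformed on its own core. The sketch below shows only that structure, using <code>std::async</code>. It is not how PBZip2 is written: it sorts rotations naively instead of using the suffix sorting real implementations use. It also leaves out the other bzip2 stages, such as move-to-front and Huffman coding. The function names are made up.</p>

<pre><code>#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;future&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

typedef std::pair&lt;std::string, std::size_t&gt; bwt_result_t; // last column, primary index

// Naive BWT of one block: sort all rotations, keep the last column and the row
// holding the original block (needed later to invert the transform).
bwt_result_t bwt_block (std::string const&amp; block)
{
    std::size_t const n = block.size();
    std::vector&lt;std::size_t&gt; rotations(n);
    for(std::size_t i = 0; i &lt; n; ++i)
        rotations[i] = i;

    std::sort(rotations.begin(), rotations.end(), [&amp;](std::size_t a, std::size_t b){
        for(std::size_t k = 0; k &lt; n; ++k)
        {
            unsigned char ca = block[(a + k) % n], cb = block[(b + k) % n];
            if(ca != cb)
                return ca &lt; cb;
        }
        return false; // identical rotations
    });

    std::string last(n, '\0');
    std::size_t primary = 0;
    for(std::size_t row = 0; row &lt; n; ++row)
    {
        last[row] = block[(rotations[row] + n - 1) % n];
        if(rotations[row] == 0)
            primary = row;
    }
    return bwt_result_t(last, primary);
}

// Split the input into fixed-size blocks (block_size must be non-zero)
// and transform them concurrently.
std::vector&lt;bwt_result_t&gt; parallel_bwt (std::string const&amp; input, std::size_t block_size)
{
    std::vector&lt;std::future&lt;bwt_result_t&gt;&gt; jobs;
    for(std::size_t off = 0; off &lt; input.size(); off += block_size)
        jobs.push_back(std::async(std::launch::async, bwt_block, input.substr(off, block_size)));

    std::vector&lt;bwt_result_t&gt; res;
    for(auto&amp; job : jobs)
        res.push_back(job.get()); // collected in block order
    return res;
}
</code></pre>

<p>For example, <code>bwt_block("banana")</code> yields <code>"nnbaaa"</code> with primary index 3, and <code>parallel_bwt("bananabanana", 6)</code> returns that same result for each of its two blocks.</p>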
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2010/04/01/parallel-bzip2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Search Path for CD</title>
		<link>http://sigpipe.macromates.com/2010/03/28/search-path-for-cd/</link>
		<comments>http://sigpipe.macromates.com/2010/03/28/search-path-for-cd/#comments</comments>
		<pubDate>Sun, 28 Mar 2010 20:03:00 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2010/03/28/search-path-for-cd/</guid>
		<description><![CDATA[I just learned this neat thing about the cd shell command: The variable CDPATH defines the search path for the directory containing «dir». Alternative directory names in CDPATH are separated by a colon (:). A null directory name is the same as the current directory. If «dir» begins with a slash (/), then CDPATH is [...]]]></description>
				<content:encoded><![CDATA[<p>I just learned this neat thing about the <code>cd</code> shell command:</p>

<blockquote>
  <p>The variable <code>CDPATH</code> defines the search path for the directory containing <em>«dir»</em>. Alternative directory names in <code>CDPATH</code> are separated by a colon (<code>:</code>). A null directory name is the same as the current directory. If <em>«dir»</em> begins with a slash (<code>/</code>), then <code>CDPATH</code> is not used.</p>
</blockquote>

<p>For example:</p>

<pre><code>% export CDPATH=$HOME/Source:$HOME/Library/Application\ Support/TextMate
% cd Avian/
/Users/duff/Source/Avian
% cd Bundles/
/Users/duff/Library/Application Support/TextMate/Bundles
% cd Support/lib/
/Users/duff/Library/Application Support/TextMate/Support/lib
% cd Avian/Frameworks/
/Users/duff/Source/Avian/Frameworks
</code></pre>

<p>This works with tab completion (using bash 4.1.2) so regardless of the current directory, I can generally do <code>cd Av⇥↩</code> to reach <code>~/Source/Avian</code>.</p>
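<p>The lookup rule from the quote can be modelled in a few lines. This is a hypothetical sketch, not the shell’s code. <code>resolve_cd</code> returns the first candidate for which the supplied <code>is_dir</code> predicate holds. It treats an empty <code>CDPATH</code> entry as the current directory, skips the search for absolute paths, and falls back to the name relative to the current directory. The predicate is passed in so the example does not touch the file system.</p>

<pre><code>#include &lt;functional&gt;
#include &lt;string&gt;

// Hypothetical model of how cd resolves a directory name using CDPATH.
std::string resolve_cd (std::string const&amp; dir, std::string const&amp; cdpath,
                        std::function&lt;bool(std::string const&amp;)&gt; const&amp; is_dir)
{
    if(!dir.empty() &amp;&amp; dir[0] == '/')
        return dir; // absolute path: CDPATH is not used

    std::string::size_type from = 0;
    while(true)
    {
        std::string::size_type to = cdpath.find(':', from);
        std::string entry = to == std::string::npos ? cdpath.substr(from) : cdpath.substr(from, to - from);

        // An empty entry (e.g. a leading or trailing ':') means the current directory.
        std::string candidate = entry.empty() ? dir : entry + "/" + dir;
        if(is_dir(candidate))
            return candidate;

        if(to == std::string::npos)
            break;
        from = to + 1;
    }
    return dir; // no match: fall back to dir relative to the current directory
}
</code></pre>

<p>Suppose <code>is_dir</code> reports only <code>/Users/duff/Source/Avian</code> as existing. Then <code>resolve_cd("Avian", "/Users/duff/Source:/Users/duff/Library/Application Support/TextMate", is_dir)</code> returns <code>/Users/duff/Source/Avian</code>, matching the first <code>cd</code> in the session above.</p>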
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2010/03/28/search-path-for-cd/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Build Automation Part 2</title>
		<link>http://sigpipe.macromates.com/2010/01/23/build-automation-part-2/</link>
		<comments>http://sigpipe.macromates.com/2010/01/23/build-automation-part-2/#comments</comments>
		<pubDate>Sat, 23 Jan 2010 18:00:36 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2010/01/23/build-automation-part-2/</guid>
		<description><![CDATA[This is part 2 of what I think will end up as four parts. This might be a bit of a rehash of the first part, but I skimmed lightly over why it actually is that I am so fond of make compared to most other build systems, so I will elaborate with some examples. [...]]]></description>
				<content:encoded><![CDATA[<p>This is part 2 of what I think will end up as four parts. This might be a bit of a rehash of the <a href="http://sigpipe.macromates.com/2010/01/15/build-automation-part-1/">first part</a>, but I skimmed lightly over why it actually is that I am so fond of <code>make</code> compared to most other build systems, so I will elaborate with some examples.</p>

<p>Part 3 will be a general post about declarative systems, not directly related to build automation. Part 4 should be about auto-generating the make files (which is part of the motivation for writing about declarative systems first).</p>

<p><span id="more-41"></span></p>

<h2>Fundamentals</h2>

<p>The original “insight” of <code>make</code> is that whatever we want executed can be considered a goal and:</p>

<ol>
<li>Each goal is represented by exactly one file.</li>
<li>Each dependency of a goal is itself a goal.</li>
<li>A goal is outdated when the represented file does not exist or is older than at least one of its dependencies.</li>
<li>A goal can be brought up-to-date by one or more shell commands.</li>
</ol>

<p>This is all there is to it. By linking the goals (via dependencies) we get the aforementioned <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">DAG</a>, and with this simple data structure we can model all our processes as long as the four criteria above are met, which they generally are, at least on Unix where “everything is a file” :)</p>
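<p>The four rules translate almost directly into code. Below is a toy C++ model, not how <code>make</code> is implemented. Goals and their timestamps live in a hypothetical in-memory <code>graph_t</code> instead of on disk, and there is no cycle detection (the graph is assumed to be a DAG). “Running the shell commands” is reduced to giving the goal a fresh timestamp.</p>

<pre><code>#include &lt;map&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Toy model of make: each goal is a "file" that may or may not have a timestamp.
struct graph_t
{
    std::map&lt;std::string, std::vector&lt;std::string&gt;&gt; deps; // goal -&gt; its dependencies
    std::map&lt;std::string, long&gt; mtime;                     // no entry = file does not exist
    std::vector&lt;std::string&gt; rebuilt;                      // goals whose commands "ran"
    long clock;                                            // must start above existing timestamps

    graph_t () : clock(100) { }

    // Bring goal up-to-date, updating its dependencies first.
    void update (std::string const&amp; goal)
    {
        bool outdated = mtime.count(goal) == 0;            // the file does not exist

        auto it = deps.find(goal);
        if(it != deps.end())
        {
            for(auto const&amp; dep : it-&gt;second)
            {
                update(dep);
                if(!outdated &amp;&amp; mtime[dep] &gt; mtime[goal])  // older than a dependency
                    outdated = true;
            }
        }

        if(outdated)                                       // "run the shell commands"
        {
            mtime[goal] = ++clock;
            rebuilt.push_back(goal);
        }
    }
};
</code></pre>

<p>Consider <code>app</code> depending on <code>main.o</code> and <code>util.o</code>, where only <code>util.o</code> is older than its source. Then <code>update("app")</code> rebuilds <code>util.o</code> and then <code>app</code>, and a second call rebuilds nothing.</p>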

<h2>Extending the Graph</h2>

<p>One of the reasons I like to view the process as a directed graph is that it becomes easy to see how we need to “patch” it to add our own actions. Yes, I said patch, because we can actually do that, and quite easily, even if we can’t edit the original make file.</p>

<p>Imagine we are building <a href="http://wiki.videolan.org/Lunettes">Lunettes</a> (a new UI for the <a href="http://www.videolan.org/vlc/">VLC media player</a>) which depends on <a href="http://wiki.videolan.org/VLCKit">VLCKit</a>.</p>

<p>Considering the graph, there must be some goal of Lunettes that depends on VLCKit; in Makefile syntax this could simply be:</p>

<pre><code>APP_DST=Lunettes.app/Contents

$(APP_DST)/MacOS/Lunettes: $(APP_DST)/Frameworks/VLCKit.framework
</code></pre>

<p>This syntax establishes a connection (dependency) between the executable and the framework. Here I made it depend on the framework’s root directory; of course it should depend on the actual binary in the framework (but then the code box would overflow).</p>

<p>What this means is that each time the framework is updated, the executable is considered out-of-date and as a result, will be relinked (with the updated framework).</p>

<h3>Unit Tests</h3>

<p>I mentioned the above link between the application and its framework because this is where we want to insert new nodes (goals) in the graph in case we want to add unit tests to the VLCKit framework.</p>

<p>So the scenario is this: we write a bunch of unit tests for the VLCKit framework and want them to run every single time the framework is updated, not only when we feel like it. At the same time, since we probably spend most of our time developing the application itself, we do not want the tests to run each time we do a build.</p>

<p>What we do is mind-bogglingly simple: we introduce a file to represent the unit test goal and touch it each time the test has been successfully run:</p>

<pre><code>vlckit_test: $(APP_DST)/Frameworks/VLCKit.framework
	if «run test»; then touch '$@'; else false; fi
</code></pre>

<p>We can now <code>make vlckit_test</code> to run the test, and if the test has been run (successfully) after the last build of the framework, then it will just tell us that the goal is up-to-date.</p>

<p>To avoid running this manually, we add the following to our make file:</p>

<pre><code>$(APP_DST)/MacOS/Lunettes: vlckit_test
</code></pre>

<p>Now our application depends on having successfully run the unit test for the used framework.</p>

<p>This is all done without touching any of the existing build files, we simply <strong>extend</strong> the build graph with our new actions.</p>

<p>And the result is IMO beautiful in the sense that the unit tests are only run when we actually change the framework, and failed unit tests will cause the entire build to fail.</p>

<p>As a reader exercise, go download the actual build files of the Lunettes / VLCKit project (much of it is in Xcode) and add something similar. What you will end up with is Xcode’s answer to the problem of extensibility: “custom shell script target” which will run every single time you re-build your target, regardless of whether or not there actually is a need for it.</p>

<p>This might be ok if you only have one thing that falls outside what the system was designed to handle, but when you have half a dozen of these…</p>

<h3>Build Numbers</h3>

<p>Another common build action these days is automated build numbers. Say we are going to do nightly builds of Lunettes and want to put the git revision into the <code>CFBundleVersion</code>.</p>

<p>You remember how everything is a file on unix? To my great delight, git conforms quite well to this paradigm and we can find the current revision as <code>.git/HEAD</code>, although this file contains a reference to the symbolic head which likely is <code>.git/refs/heads/master</code>.</p>

<p>For simplicity let us just assume we always stay on master (and we don’t create packs for the heads). The file is updated each time we make a commit, bumping its date, so all we need to do is have our <code>Info.plist</code> depend on <code>.git/refs/heads/master</code> and let the action to bring <code>Info.plist</code> up-to-date insert the current revision as value for the <code>CFBundleVersion</code> key.</p>

<p>Again make’s simple axiomatic system makes it a breeze to do this, and “do it right”, that is, do it in a way that limits computation to the theoretical minimum, rather than updating the <code>Info.plist</code> with every single build or requiring it to be manually updated.</p>

<h3>External Dependencies</h3>

<p>I have used Lunettes as example in this post so let me continue and link to the <a href="http://wiki.github.com/pdherbemont/Glasses/how-to-build">build instructions</a>.</p>

<p>Here you see several steps you have to do in order to get a successful build; additionally, if you look in the <a href="http://github.com/pdherbemont/Glasses/tree/master/Frameworks/">frameworks directory of Lunettes</a> you’ll find that it deep-copied these from other projects.</p>

<p>Since every single person who wants to build this has to go through these steps, we should incorporate it in the build process, and it is actually quite simple (had this project been based on make files), for example we need to clone and build the VLC project which can be done using:</p>

<pre><code>vendor/vlc:
    git clone git://git.videolan.org/vlc.git '$@'
    $(MAKE) -sC '$@'
</code></pre>

<p>So if there is no <code>vendor/vlc</code>, we clone the repository and run <code>make</code> in it afterwards. In theory we could also include the make file from this project to get fine-grained dependencies, but since this is not our project we do not control its make file and can’t fix any potential clashes, so it’s safer to simply call <code>make</code> recursively on the checked-out project.</p>

<p>We need to set up a dependency between Lunettes and <code>vendor/vlc</code> so that the checkout actually happens (without having to run <code>make vendor/vlc</code> manually), but that is just a single line in our make file.</p>

<h3>Other Actions</h3>

<p>If it isn’t clear by now, make files are what drive my own build process when I build TextMate. I run the build from TextMate itself, and the goal I ask make to build relaunches TextMate after a successful build.</p>

<p>This isn’t always desirable, as I may be in the middle of using the application when the build finishes, so what I have done is rather simple and mimics the unit test injection shown above.</p>

<p>Let me start by quoting from my make file:</p>

<pre><code>$(APP_NAME)/run: ask_to_relaunch

ask_to_relaunch: $(APP_PATH)/Contents/MacOS/$(APP_NAME)
    @[[ $$("$$DIALOG" alert …|pl) = *"buttonClicked = 0"* ]]

.PHONY: ask_to_relaunch
</code></pre>

<p>This introduces a new goal (<code>ask_to_relaunch</code>). It is declared “phony”, so it is not backed by a file on disk (and is therefore always considered outdated). It depends on the actual application binary, so it will never be updated before the application has been fully built.</p>

<p>I use phony goals like <code>«app»/run</code>, <code>«app»/debug</code> and similar. When I build from within TextMate it is the <code>«app»/run</code> goal that I build, and I have set this to depend on my (phony) <code>ask_to_relaunch</code> goal.</p>

<p>As this goal is always outdated, make will run the (shell) command to bring it up to date. The shell command opens a dialog (via the <code>"$DIALOG" alert</code> system) which asks whether or not to relaunch. If the user cancels the dialog, the shell command exits with a non-zero status, and <code>make</code> treats that as a failure to update the <code>ask_to_relaunch</code> goal. Since one of its dependencies failed, the <code>«app»/run</code> goal is then never updated, i.e. its (shell) commands are never executed.</p>

<p>Simple yet effective.</p>
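<p>The way a failing dependency stops the goals above it can be modelled in a few lines of C. This is only a sketch under simplifying assumptions: every goal is treated as phony (always outdated), actions return shell-style exit statuses, and <code>update_goal</code>, <code>user_accepts</code> and the other names are hypothetical stand-ins, with <code>user_accepts</code> playing the part of the dialog’s answer.</p>

<pre><code>#include &lt;assert.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

#define MAX_DEPS 4

/* A goal whose action returns a shell-style exit status (0 = success).
   All goals in this model are phony, i.e. always considered outdated. */
struct goal {
	char const* name;
	int (*action)(void);
	struct goal* deps[MAX_DEPS];
	size_t n_deps;
};

/* Update dependencies first; if one fails, return its status without
   running this goal's action, so the failure propagates upwards. */
static int update_goal (struct goal* g)
{
	assert(g-&gt;n_deps &lt;= MAX_DEPS);
	for(size_t i = 0; i &lt; g-&gt;n_deps; ++i)
	{
		int rc = update_goal(g-&gt;deps[i]);
		if(rc != 0)
			return rc;
	}
	return g-&gt;action ? g-&gt;action() : 0;
}

static bool user_accepts   = false; /* stand-in for the dialog's answer */
static int  relaunch_count = 0;

/* Like the shell test in the rule above: status 0 only if the user agreed. */
static int ask_to_relaunch (void) { return user_accepts ? 0 : 1; }
static int relaunch (void)        { ++relaunch_count; return 0; }
</code></pre>

<p>With <code>run</code> depending on <code>ask_to_relaunch</code>, cancelling makes <code>update_goal</code> return a non-zero status without ever calling <code>relaunch</code>, mirroring how <code>make</code> gives up on <code>«app»/run</code>.</p>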

<h2>Conclusion</h2>

<p>This has just been a bunch of examples, but what I hope to have shown is how simple the basic concept of make is, how easy it is to extend an existing build process, and how flexible make is in what it can actually do for us.</p>

<p>Of the many build systems I have looked at, I haven’t seen any that has this simple axiomatic definition while also being as versatile. A lot of build systems have been created because make files are ugly/complex/arcane/etc., and I agree with that sentiment, but many of the replacements seem to be either systems hardcoded for specific purposes, which simplify the boilerplate but make them inflexible, or actual programming languages, which make the build script only marginally better than a custom script. For example, some (but not all) of the systems that take the “programming language route” lack the ability to execute tasks in parallel, which, with 16 cores and counting, is a pretty fatal design limitation.</p>
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2010/01/23/build-automation-part-2/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Build Automation Part 1</title>
		<link>http://sigpipe.macromates.com/2010/01/15/build-automation-part-1/</link>
		<comments>http://sigpipe.macromates.com/2010/01/15/build-automation-part-1/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 22:05:54 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2010/01/15/build-automation-part-1/</guid>
		<description><![CDATA[A blog post about Ant vs. Maven concludes that “the best build tool is the one you write yourself” and the Programmer Competency Matrix has “can setup a script to build the system” as requirement for reaching the higher levels in the “build automation” row. I have looked at a lot of build systems myself, [...]]]></description>
				<content:encoded><![CDATA[<p>A <a href="http://kent.spillner.org/blog/work/2009/11/14/java-build-tools.html">blog post about Ant vs. Maven</a> concludes that <em>“the best build tool is the one you write yourself”</em> and the <a href="http://www.indiangeek.net/wp-content/uploads/Programmer%20competency%20matrix.htm">Programmer Competency Matrix</a> has <em>“can setup a script to build the system”</em> as a requirement for reaching the higher levels in the “build automation” row.</p>

<p>I have looked at a lot of build systems myself, and while I agree that the best build system is the one you create yourself I am also a big fan of <a href="http://www.gnu.org/software/make/manual/make.html"><code>make</code></a> and believe that the best approach is to use generated Makefiles.</p>

<p>This post is a “getting started with <code>make</code>” guide. I plan to follow up with a part 2 about how to handle auto-generated, self-updating Makefiles.</p>

<p><span id="more-38"></span></p>

<h2>Concept</h2>

<p>The <a href="http://www.faqs.org/docs/artu/ch01s06.html">UNIX philosophy</a> is to have small tools (commands) which solve a well defined problem. These can then be combined to build more complex systems.</p>

<p>While each build process is different, the common denominator is that we should be able to represent our target(s) as nodes in a <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph" title="Directed Acyclic Graph">directed acyclic graph</a> where each node represents a file and each edge represents a dependency.</p>

<p>This is what a Makefile captures: it should be a <strong>declaration</strong> of the dependency graph, with an action per node that (re)creates the node’s file on disk if that file is missing or older than its dependencies, i.e. the nodes we can reach via its (directed) edges.</p>

<p>By keeping the dependency information declarative we let <code>make</code> figure out which files are outdated and need to be rebuilt, and we give it the freedom to pick a strategy for rebuilding them, which may include running jobs in parallel.</p>
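<p>As a rough illustration of that strategy, here is a small C model (not <code>make</code>’s actual implementation) that walks a dependency graph depth-first and reruns a node’s action only when its file is missing or older than one of its dependencies. File modification times are simulated with a counter, and the names <code>struct node</code> and <code>update</code> are made up for this sketch.</p>

<pre><code>#include &lt;assert.h&gt;
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

#define MAX_DEPS 4
#define MAX_LOG  16

/* One node in the dependency graph: a file plus the nodes it depends on.
   mtime == 0 means the file does not exist yet. */
struct node {
	char const* name;
	long mtime;
	struct node* deps[MAX_DEPS];
	size_t n_deps;
};

static long clock_now = 0;             /* simulated clock, bumped by each "action" */
static char const* build_log[MAX_LOG]; /* names of nodes whose action ran, in order */
static size_t build_count = 0;

/* Bring `n` up to date: update every dependency first (depth-first), then
   run the node's action if its file is missing or older than a dependency.
   A node shared by several parents is simply found up to date on later
   visits; real make also remembers which goals it has already considered. */
static void update (struct node* n)
{
	assert(n-&gt;n_deps &lt;= MAX_DEPS);
	bool outdated = n-&gt;mtime == 0;
	for(size_t i = 0; i &lt; n-&gt;n_deps; ++i)
	{
		update(n-&gt;deps[i]);
		if(n-&gt;deps[i]-&gt;mtime &gt; n-&gt;mtime)
			outdated = true;
	}

	if(outdated)
	{
		n-&gt;mtime = ++clock_now;        /* "running the action" writes the file */
		if(build_count &lt; MAX_LOG)
			build_log[build_count++] = n-&gt;name;
	}
}
</code></pre>

<p>With the <code>pubkey → privkey → dsa_parameters</code> graph from the example that follows, a first <code>update</code> of <code>pubkey</code> runs all three actions in dependency order, a second one runs none, and touching only <code>privkey</code> afterwards reruns just the <code>pubkey</code> action.</p>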

<h2>Example</h2>

<p>To give an example let us look at the <a href="http://github.com/andymatuschak/Sparkle/blob/master/generate_keys.rb"><code>generate_keys</code></a> script which is part of Sparkle and can generate a public and private key file.</p>

<p>The public key is extracted from the private key and the private key requires a DSA parameter file (we’ll ignore the <code>-genkey</code> flag to <code>dsaparam</code>).</p>

<p>So our (simple) graph looks like this:</p>

<pre><code>pubkey → privkey → dsa_parameters
</code></pre>

<p>A Makefile “rule” is effectively one node in our graph and looks like:</p>

<pre><code>«goal»: «dependencies»
    «action»
</code></pre>

<p>Here <code>«goal»</code> is the node itself, that is, the file it represents. The <code>«dependencies»</code> are the nodes it depends on, and <code>«action»</code> is the command(s) to execute (interpreted by the shell) to generate/update the node/file.</p>

<p>Using the <a href="http://github.com/andymatuschak/Sparkle/blob/master/generate_keys.rb"><code>generate_keys</code></a> script as source our Makefile ends up like this:</p>

<pre><code>pubkey: privkey
    openssl dsa -in '$&lt;' -pubout -out '$@'

privkey: dsa_parameters
    openssl gendsa -out '$@' '$&lt;'

dsa_parameters:
    openssl dsaparam -out '$@' 2048 &lt; /dev/urandom
</code></pre>

<p>In the above I have used two variables. The variable <code>$@</code> expands to the goal (i.e. the file we are generating) and <code>$&lt;</code> expands to the first dependency.</p>

<p>If you save the above as <code>Makefile</code> and run <code>make</code> then it will generate 3 files: <code>pubkey</code>, <code>privkey</code>, and <code>dsa_parameters</code>. By default calling <code>make</code> without arguments will ensure the first goal in the Makefile is up to date. If you re-run <code>make</code> it should say:</p>

<pre><code>make: `pubkey' is up to date.
</code></pre>

<p>You can also run <code>make privkey</code> to ensure (only) <code>privkey</code> is up to date (which then won’t extract the public key).</p>

<h2>Intermediate Files</h2>

<p>The above Makefile reproduces the script, except that we are not removing the temporary <code>dsa_parameters</code> file after having generated the keys. We can fix this by making <code>dsa_parameters</code> a dependency of the fake <code>.INTERMEDIATE</code> goal by adding this line:</p>

<pre><code>.INTERMEDIATE: dsa_parameters
</code></pre>

<p>If we now run <code>make</code> it will automatically remove the <code>dsa_parameters</code> file after it has been used.</p>

<p>We probably want to use our public key from C so let us add another goal (node) namely <code>pubkey.h</code>. This goal will create a C header from the <code>pubkey</code> file, so it will depend on it. This goal can be handled by adding the following rule:</p>

<pre><code>pubkey.h: pubkey
    { echo 'static char const* pubkey ='; \
      sed &lt; '$&lt;' -e $$'s/.*/\t"&amp;\\\\n"/'; \
      echo ';'; } &gt; '$@'
</code></pre>

<p>Perhaps not the nicest way to generate the <code>pubkey.h</code> file but what is nice about this is that whatever application needs to use this header can declare it as a dependency, and it will be generated when needed, including extracting the public key if not already done.</p>
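<p>For reference, the header produced by the rule above should look roughly like the following C sketch. The key body is a made-up placeholder rather than real <code>openssl</code> output, and <code>count_lines</code> is a hypothetical helper; the point is that adjacent string literals are concatenated by the compiler, so <code>pubkey</code> is a single PEM string with one newline per line.</p>

<pre><code>#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

/* Roughly what pubkey.h contains (placeholder key body, not a real key) */
static char const* pubkey =
	"-----BEGIN PUBLIC KEY-----\n"
	"PLACEHOLDERBASE64DATA\n"
	"-----END PUBLIC KEY-----\n"
;

/* Hypothetical helper: count the newline-terminated lines in a string. */
static size_t count_lines (char const* s)
{
	assert(s != NULL);
	size_t lines = 0;
	for(; *s; ++s)
	{
		if(*s == '\n')
			++lines;
	}
	return lines;
}
</code></pre>

<p>Any source file that includes <code>pubkey.h</code> sees the three lines as one string, which is why the <code>sed</code> expression appends a literal <code>\n</code> to every line.</p>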

<h2>Includes</h2>

<p>To keep things modular we can save our Makefile as <code>Makefile.keys</code> and include it from our main Makefile using:</p>

<pre><code>include Makefile.keys
</code></pre>

<p>If we go back to the Sparkle distribution there is also a <code>sign_update</code> script which signs an update using the private key.</p>

<p>We can add this as another goal to our Makefile, e.g. using:</p>

<pre><code>archive.sig: privkey archive.tbz
    openssl dgst -dss1 -sign privkey -out '$@' archive.tbz
</code></pre>

<p>Here the archive signature depends on both having a private key and an archive. The private key will be generated if not already there; for the archive we of course need to add another goal. The archive goal will depend on our actual binary, which will depend on its object files, which will depend on the sources (where one source is likely going to depend on <code>pubkey.h</code>).</p>

<h2>Phony Targets</h2>

<p>In addition we probably want to add another goal to construct an RSS feed (or similar) which includes the archive signature, and eventually we will want a deploy goal which depends on the RSS feed and the archive. The action for this goal will likely use <code>scp</code> to copy the files to the server, and the goal itself will not be a file, i.e. when we run <code>make deploy</code> we do not expect an actual <code>deploy</code> file to be generated. While there is little harm in declaring a goal whose actions do not generate the file, we risk getting:</p>

<pre><code>make: `deploy' is up to date.
</code></pre>

<p>If there actually is a <code>deploy</code> file which is newer than the dependencies of the <code>deploy</code> goal. To avoid this, we make the fake goal named <code>.PHONY</code> depend on <code>deploy</code>, similar to what we did with the <code>.INTERMEDIATE</code> goal:</p>

<pre><code>.PHONY: deploy
</code></pre>

<h2>Closing Words</h2>

<p>This post is just a mild introduction to <code>make</code>. I have deliberately picked something that does not involve building C sources as the example to show that <code>make</code> is a versatile tool.</p>

<p>Whenever you have a set of actions that need to be run in a specific order then consider if a Makefile can capture the dependency graph.</p>

<p>When you do write a Makefile, aim for having each rule do only one thing. For example, imagine we are writing a manual and store each chapter as Markdown. Rather than do something like this:</p>

<pre><code>chapter.html: header.html chapter.mdown footer.html
    { cat header.html; \
      Markdown.pl &lt; chapter.mdown; \
      cat footer.html; } &gt; '$@'
</code></pre>

<p>We can instead do:</p>

<pre><code>chapter.html: header.html cache/chapter.html footer.html
    cat &gt; '$@' $^

cache/chapter.html: chapter.mdown
    Markdown.pl &lt; '$&lt;' &gt; '$@'
</code></pre>

<p>The new <code>$^</code> variable expands to all the dependencies.</p>

<p>There are a few reasons to favor this approach. In this concrete example we have the advantage of not needing to pipe all the chapters through <code>Markdown.pl</code> if we change the header or footer. But in general it just makes things more flexible: it is easier to re-use goals, faster to restart a failed build, it may increase the number of jobs that can run in parallel, etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2010/01/15/build-automation-part-1/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Self-balancing Trees</title>
		<link>http://sigpipe.macromates.com/2009/08/22/self-balancing-trees/</link>
		<comments>http://sigpipe.macromates.com/2009/08/22/self-balancing-trees/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 20:02:48 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2009/08/22/self-balancing-trees/</guid>
		<description><![CDATA[In a previous blog post I describe a data structure which require the use of a self-balancing binary search tree. Few need to implement their own self-balancing trees, but since two previous comments referred to AVL and red/black trees respectively, I should give a shout-out to Arne Andersson and his paper titled Binary Search Trees [...]]]></description>
				<content:encoded><![CDATA[<p>In a <a href="http://sigpipe.macromates.com/2009/08/13/maintaining-a-layout/">previous blog post</a> I describe a data structure which requires the use of a <a href="http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree">self-balancing binary search tree</a>.</p>

<p><span id="more-37"></span>
Few need to implement their own self-balancing trees, but since two previous comments referred to AVL and red/black trees respectively, I should give a shout-out to <a href="http://user.it.uu.se/~arnea/">Arne Andersson</a> and his paper titled <a href="http://user.it.uu.se/~arnea/ps/simp.pdf">Binary Search Trees Made Simple</a> (PDF).</p>

<p>The paper introduces <a href="http://en.wikipedia.org/wiki/AA_tree">AA trees</a>, which are simple to implement, although the logic for when to skew and split is not made clear by the paper. Julienne Walker filled that hole with a great <a href="http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_andersson.aspx">tutorial about AA trees</a> and how they (<a href="http://en.wikipedia.org/wiki/Red-black_tree#Analogy_to_B-trees_of_order_4">like red/black trees</a>) stem from <a href="http://www.cs.ucr.edu/cs14/cs14_06win/slides/2-3_trees_covered.pdf" title="PDF Showing insert/delete for 2-3 trees">B-trees of order 3</a> (PDF).</p>
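<p>To give an idea of how little code is involved, here is a C sketch of AA tree insertion following the skew/split formulation: <code>skew</code> removes a left horizontal link with a right rotation, and <code>split</code> removes two consecutive right horizontal links with a left rotation, promoting the middle node. The names are illustrative, deletion is omitted, and <code>aa_valid</code> is a hypothetical checker for the level invariants.</p>

<pre><code>#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;

struct aa_node {
	int key;
	int level;                 /* leaves are at level 1 */
	struct aa_node* left;
	struct aa_node* right;
};

/* skew: a left child on the same level is a left horizontal link;
   remove it with a right rotation. */
static struct aa_node* skew (struct aa_node* t)
{
	if(t &amp;&amp; t-&gt;left &amp;&amp; t-&gt;left-&gt;level == t-&gt;level)
	{
		struct aa_node* l = t-&gt;left;
		t-&gt;left  = l-&gt;right;
		l-&gt;right = t;
		return l;
	}
	return t;
}

/* split: two consecutive right horizontal links; remove them with a
   left rotation and promote the middle node one level. */
static struct aa_node* split (struct aa_node* t)
{
	if(t &amp;&amp; t-&gt;right &amp;&amp; t-&gt;right-&gt;right &amp;&amp; t-&gt;right-&gt;right-&gt;level == t-&gt;level)
	{
		struct aa_node* r = t-&gt;right;
		t-&gt;right = r-&gt;left;
		r-&gt;left  = t;
		r-&gt;level += 1;
		return r;
	}
	return t;
}

/* Ordinary BST insertion, followed by skew then split on the way back up. */
static struct aa_node* aa_insert (struct aa_node* t, int key)
{
	if(!t)
	{
		struct aa_node* n = malloc(sizeof *n);
		assert(n != NULL);
		n-&gt;key   = key;
		n-&gt;level = 1;
		n-&gt;left  = n-&gt;right = NULL;
		return n;
	}

	if(key &lt; t-&gt;key)
		t-&gt;left = aa_insert(t-&gt;left, key);
	else if(key &gt; t-&gt;key)
		t-&gt;right = aa_insert(t-&gt;right, key);
	else
		return t;                  /* duplicate key: nothing to do */

	return split(skew(t));
}

/* Hypothetical checker for the AA level invariants. */
static int aa_valid (struct aa_node const* t)
{
	if(!t)
		return 1;
	if(!t-&gt;left &amp;&amp; !t-&gt;right &amp;&amp; t-&gt;level != 1)
		return 0;                  /* leaves are at level 1 */
	if(t-&gt;left &amp;&amp; t-&gt;left-&gt;level != t-&gt;level - 1)
		return 0;                  /* left child is exactly one level down */
	if(t-&gt;right &amp;&amp; t-&gt;right-&gt;level != t-&gt;level &amp;&amp; t-&gt;right-&gt;level != t-&gt;level - 1)
		return 0;                  /* right child is on the same level or one down */
	if(t-&gt;right &amp;&amp; t-&gt;right-&gt;right &amp;&amp; t-&gt;right-&gt;right-&gt;level &gt;= t-&gt;level)
		return 0;                  /* no two consecutive right horizontal links */
	if(t-&gt;level &gt; 1 &amp;&amp; (!t-&gt;left || !t-&gt;right))
		return 0;                  /* nodes above level 1 have two children */
	return aa_valid(t-&gt;left) &amp;&amp; aa_valid(t-&gt;right);
}
</code></pre>

<p>Inserting keys in sorted order, which would degenerate a plain binary search tree into a list, still leaves all the level invariants intact.</p>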
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2009/08/22/self-balancing-trees/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Cuckoo Hashing</title>
		<link>http://sigpipe.macromates.com/2009/08/18/cuckoo-hashing/</link>
		<comments>http://sigpipe.macromates.com/2009/08/18/cuckoo-hashing/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 20:23:55 +0000</pubDate>
		<dc:creator>Allan Odgaard</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sigpipe.macromates.com/2009/08/18/cuckoo-hashing/</guid>
		<description><![CDATA[The Achilles’ heel of hashing is collision: When we want to insert a new value into the hash table and the slot is already filled, we use a fallback strategy to find another slot, for example linear probing. The fallback strategy can affect lookup time since we need to do the same probing when a [...]]]></description>
				<content:encoded><![CDATA[<p>The Achilles’ heel of hashing is collision: When we want to insert a new value into the hash table and the slot is already filled, we use a fallback strategy to find another slot, for example <a href="http://en.wikipedia.org/wiki/Linear_probing">linear probing</a>.</p>

<p>The fallback strategy can affect lookup time since we need to do the same probing when a lookup results in an entry with the wrong key, turning the nice <em>O(1)</em> time complexity into (worst case) <em>O(n)</em>.</p>
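<p>A small C sketch makes the worst case visible: with linear probing, a lookup repeats the probe sequence until it finds the key or an empty slot, so keys that all land in the same slot turn lookups into a linear scan. The table size and the deliberately poor hash function are assumptions chosen to provoke collisions, and the function names are hypothetical.</p>

<pre><code>#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

#define TABLE_SIZE 8   /* power of two, so masking with TABLE_SIZE - 1 wraps around */

struct slot {
	int used;
	unsigned key;
	int value;
};

/* Deliberately poor hash: keys that differ only in higher bits collide. */
static size_t slot_for (unsigned key) { return key &amp; (TABLE_SIZE - 1); }

/* Insert: try the home slot, then the following slots, wrapping around. */
static int lp_insert (struct slot* table, unsigned key, int value)
{
	assert(table != NULL);
	for(size_t i = 0; i &lt; TABLE_SIZE; ++i)
	{
		struct slot* s = &amp;table[(slot_for(key) + i) &amp; (TABLE_SIZE - 1)];
		if(!s-&gt;used || s-&gt;key == key)
		{
			s-&gt;used  = 1;
			s-&gt;key   = key;
			s-&gt;value = value;
			return 1;
		}
	}
	return 0; /* table is full */
}

/* Lookup repeats the same probe sequence; *probes reports how many slots
   were inspected before the key (or an empty slot) was found. */
static int lp_lookup (struct slot const* table, unsigned key, int* value, size_t* probes)
{
	assert(table != NULL &amp;&amp; value != NULL &amp;&amp; probes != NULL);
	for(size_t i = 0; i &lt; TABLE_SIZE; ++i)
	{
		struct slot const* s = &amp;table[(slot_for(key) + i) &amp; (TABLE_SIZE - 1)];
		*probes = i + 1;
		if(!s-&gt;used)
			return 0;
		if(s-&gt;key == key)
		{
			*value = s-&gt;value;
			return 1;
		}
	}
	return 0;
}
</code></pre>

<p>Inserting the keys 0, 8, 16, 24 and 32 puts them all in slot 0’s probe sequence, so finding 32 inspects five slots and a miss for 40 inspects six.</p>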

<p><span id="more-36"></span>
Of course the <em>O(n)</em> time is pessimistic as we will rehash to a larger table size when we reach a certain threshold, though from a theoretical point of view an intriguing approach to handling collisions is <a href="http://en.wikipedia.org/wiki/Cuckoo_hashing">cuckoo hashing</a> which guarantees <em>O(1)</em> lookup time (insertion can still be worse).</p>

<p>Quoting the <a href="http://en.wikipedia.org/wiki/Cuckoo_hashing">Wikipedia page</a>:</p>

<blockquote>
  <p>The basic idea is to use two hash functions instead of only one. This provides two possible locations in the hash table for each key.</p>
  
  <p>When a new key is inserted, a greedy algorithm is used: The new key is inserted in one of its two possible locations, “kicking out”, that is, displacing, any key that might already reside in this location. This displaced key is then inserted in its alternative location, again kicking out any key that might reside there, until a vacant position is found, or the procedure enters an infinite loop. In the latter case, the hash table is rebuilt in-place using new hash functions.</p>
</blockquote>

<p>This means that no additional probing is required during lookup as an element will always be in one of the two slots given by the hash functions (if it is in the table).</p>
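<p>The insertion loop from the quote and the two-slot lookup can be sketched in C as follows. The table size, the two hash functions and the kick limit are arbitrary assumptions for illustration, and the function names are hypothetical. On failure the sketch simply reports the key that was left without a slot, where a real implementation would rebuild the table with new hash functions as described above.</p>

<pre><code>#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

#define CUCKOO_SIZE 11   /* arbitrary small table size */
#define MAX_KICKS   32   /* give up (and rebuild) after this many displacements */

struct cuckoo {
	int used[CUCKOO_SIZE];
	unsigned keys[CUCKOO_SIZE];
};

/* Two illustrative hash functions; real code would use better ones. */
static size_t h1 (unsigned key) { return key % CUCKOO_SIZE; }
static size_t h2 (unsigned key) { return (key / CUCKOO_SIZE) % CUCKOO_SIZE; }

/* A key can only ever be in one of two slots, so lookup probes at most two. */
static int cuckoo_contains (struct cuckoo const* t, unsigned key)
{
	assert(t != NULL);
	size_t a = h1(key), b = h2(key);
	return (t-&gt;used[a] &amp;&amp; t-&gt;keys[a] == key) || (t-&gt;used[b] &amp;&amp; t-&gt;keys[b] == key);
}

/* Greedy insertion: put the key in its first slot, kicking out any occupant,
   which then moves to its alternative slot, and so on. Returns 1 on success.
   After MAX_KICKS displacements it returns 0 and stores the key that is left
   without a slot in *pending; a real table would now rebuild with new hash
   functions and reinsert that key (not shown). */
static int cuckoo_insert (struct cuckoo* t, unsigned key, unsigned* pending)
{
	assert(t != NULL &amp;&amp; pending != NULL);
	if(cuckoo_contains(t, key))
		return 1;

	size_t pos = h1(key);
	for(int kicks = 0; kicks &lt; MAX_KICKS; ++kicks)
	{
		if(!t-&gt;used[pos])
		{
			t-&gt;used[pos] = 1;
			t-&gt;keys[pos] = key;
			return 1;
		}
		unsigned evicted = t-&gt;keys[pos]; /* swap our key in, the occupant out */
		t-&gt;keys[pos] = key;
		key = evicted;
		pos = pos == h1(key) ? h2(key) : h1(key); /* evicted key's other slot */
	}
	*pending = key;
	return 0;
}
</code></pre>

<p>However long an insertion takes, <code>cuckoo_contains</code> never looks at more than the two slots given by <code>h1</code> and <code>h2</code>, which is the constant lookup time mentioned above.</p>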

<p>In practice linear probing with proper thresholds and a good hash function may perform better (due to locality of reference), plus insertion using cuckoo hashing can be more expensive (as we do more memory writes on collisions than with linear probing). Still, I love the theoretical property of this collision strategy :)</p>
]]></content:encoded>
			<wfw:commentRss>http://sigpipe.macromates.com/2009/08/18/cuckoo-hashing/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
