SIGPIPE 13

Programming, automation, algorithms, macOS, and more.

Posted by Allan Odgaard

Creating a Faster Jekyll

Jekyll is a static site generator that we recently adopted for most of https://macromates.com, motivated by its nice design and large user base.

We did, however, run into performance issues, so we wrote a replacement that is semi-compatible with Jekyll but faster and with some additional features.

Read the rest of the post »


Run Command Every Other Week

I run a few things via cron, and some of them need to run at intervals that cannot be expressed in a crontab, for example biweekly or every 8th month.

As a general solution I created the every command available here.

The supported usage is:

every [-n number] command [argument ...]

This runs command every number-th time it is invoked. For example, to send an email the third, sixth, ninth, etc. time we call it, use:

every -n3 mail -s"Water the plants" me@example.org <<< "It’s time again!"

Using this in a crontab to remind us every second Wednesday could be done as:

# <<< is bash syntax, and cron uses /bin/sh by default
SHELL=/bin/bash
#  m  h dom mon dow command
  00 12   *   * wed every -n2 mail -s"Water the plants" me@example.org <<< "It’s time again!"

How it Works

The command uses a guard file written to $XDG_DATA_HOME/every. If XDG_DATA_HOME is unset then it defaults to $HOME/.local/share.

The name of the guard file is derived from the arguments passed to every (using SHA-1), and the content of the guard file is a counter that keeps track of how many times we have been called. As a convenience we also write the command to the guard file.

Once the counter reaches the value given via -n then every will remove the guard file and exec your command.

The command is implemented as a bash script and should work on both OS X and GNU/Linux.
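The counter logic can be sketched in C++ to make the mechanics concrete. This is a minimal sketch, not the bash script itself: every_nth_call is a hypothetical name, and it uses std::hash in place of SHA-1 to name the guard file (std::hash is neither cryptographic nor stable across standard library implementations), so the file names will not match those written by every.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns true on every n-th call with the same argument list, mirroring the
// guard-file counter described above. Assumption: std::hash stands in for
// SHA-1 when naming the guard file.
bool every_nth_call(fs::path const& guard_dir, std::vector<std::string> const& args, int n) {
    assert(n > 0);
    std::string joined;
    for (auto const& a : args) { joined += a; joined += '\0'; }  // unambiguous join
    fs::path const guard = guard_dir / std::to_string(std::hash<std::string>{}(joined));

    int counter = 0;
    if (std::ifstream in{guard}) in >> counter;  // a missing guard file means 0
    ++counter;

    if (counter >= n) {
        fs::remove(guard);  // reset, so the next cycle starts from scratch
        return true;        // the real tool would now exec the command
    }
    fs::create_directories(guard_dir);
    std::ofstream out{guard};
    out << counter << '\n';
    for (auto const& a : args) out << a << ' ';  // convenience copy of the command
    out << '\n';
    return false;
}
```

A caller would run the command only when the function returns true; with n = 3 that happens on the third, sixth, ninth, etc. call.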

Alternative Solution

If the external guard file is undesirable, or readability is not a concern, an alternative approach is to use modular arithmetic on the UNIX epoch returned by date +%s. For an example, see this post.
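The arithmetic itself is small: divide the epoch seconds by the length of a week (604,800 seconds) and only proceed when the resulting week number is divisible by the interval. A minimal C++ sketch with a hypothetical function name; note that 1970-01-01 was a Thursday, so these week boundaries fall on Thursdays at 00:00 UTC, and a job scheduled close to that moment may land in an unexpected week.

```cpp
#include <cassert>
#include <cstdint>

// True when epoch_seconds falls in a week whose number (counted from the
// epoch) is divisible by every_n_weeks. A job cron starts every Wednesday
// can call this and exit early, so it effectively runs every other week.
bool in_scheduled_week(std::int64_t epoch_seconds, int every_n_weeks = 2) {
    assert(every_n_weeks > 0);
    std::int64_t const seconds_per_week = 7 * 24 * 60 * 60;  // 604800
    std::int64_t const week = epoch_seconds / seconds_per_week;
    return week % every_n_weeks == 0;
}
```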


Posted by Allan Odgaard

Path Completion (bash)

If you upgraded to Mountain Lion and often want to cd into ~/Library/Application Support, you might be a little annoyed by the new Application Scripts directory, which makes the normal ~/Library/Ap⇥ stop at ~/Library/Application S‸ so that you have to disambiguate the path.

To avoid this you can set the FIGNORE variable. From man bash:

FIGNORE
    A colon-separated list of suffixes to ignore when performing
    filename completion (see READLINE below). A filename whose
    suffix matches one of the entries in FIGNORE is excluded from
    the list of matched filenames. A sample value is ".o:~".

So if you set this in your bash startup file:

FIGNORE=".o:~:Application Scripts"

Then completion ignores that folder entirely and expands the full path.
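The trick works because a whole name is also a suffix of itself. A minimal C++ sketch of the filtering rule as the man page describes it (apply_fignore is a hypothetical helper following the quoted text, not bash's implementation):

```cpp
#include <string>
#include <vector>

// Drop completion candidates whose name ends with any of the colon-separated
// suffixes in fignore; empty entries are ignored.
std::vector<std::string> apply_fignore(std::vector<std::string> const& candidates,
                                       std::string const& fignore) {
    std::vector<std::string> suffixes;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type const colon = fignore.find(':', start);
        std::string const entry = fignore.substr(
            start, colon == std::string::npos ? std::string::npos : colon - start);
        if (!entry.empty()) suffixes.push_back(entry);
        if (colon == std::string::npos) break;
        start = colon + 1;
    }
    auto ends_with = [](std::string const& name, std::string const& suffix) {
        return name.size() >= suffix.size() &&
               name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
    std::vector<std::string> kept;
    for (auto const& name : candidates) {
        bool ignored = false;
        for (auto const& suffix : suffixes)
            if (ends_with(name, suffix)) { ignored = true; break; }
        if (!ignored) kept.push_back(name);
    }
    return kept;
}
```

With ".o:~:Application Scripts", the candidates Application Scripts and Application Support reduce to the single Application Support, so the completion is no longer ambiguous.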

Some other useful variables you can set in ~/.inputrc that (IMHO) improve the default behavior of filename completion:

completion-ignore-case (Off)
    If set to On, readline performs filename matching and 
    completion in a case-insensitive fashion.

mark-symlinked-directories (Off)
    If set to On, completed names which are symbolic links 
    to directories have a slash appended (subject to the 
    value of mark-directories).

show-all-if-ambiguous (Off)
    This alters the default behavior of the completion 
    functions. If set to On, words which have more than one 
    possible completion cause the matches to be listed 
    immediately instead of ringing the bell.

So my recommendation is to go with this:

set completion-ignore-case on
set mark-symlinked-directories on
set show-all-if-ambiguous on

Ignoring case lets you type ~/l⇥ and still get ~/Library/.

Marking symlinked directories is useful for /tmp, /etc, and /var, which on OS X are symbolic links into /private.

Showing all when ambiguous instead of ringing the bell… who came up with these defaults?


Posted by Allan Odgaard

Beating Binary Search

Jay from LinkedIn’s SNA team writes:

Quick, what is the fastest way to search a sorted array?

Binary search, right?

Wrong. There is actually a method called interpolation search.
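For reference, a minimal C++ sketch of interpolation search (my own illustration, not code from the quoted article). Instead of probing the middle of the remaining range, it estimates the position from the key's value; this pays off when keys are roughly uniformly distributed (an expected O(log log n) probes), but it can degrade to O(n) on skewed data.

```cpp
#include <cstddef>
#include <vector>

// Interpolation search over a sorted vector. Returns the index of key, or -1.
// Instead of probing the middle (as binary search does), it estimates where
// key should be by linear interpolation between the window's end values.
long interpolation_search(std::vector<long> const& a, long key) {
    std::size_t lo = 0, hi = a.size();  // current window is [lo, hi)
    while (lo < hi && key >= a[lo] && key <= a[hi - 1]) {
        std::size_t pos = lo;  // used as-is when all values in the window are equal
        if (a[hi - 1] != a[lo]) {
            double const fraction = (static_cast<double>(key) - a[lo]) /
                                    (static_cast<double>(a[hi - 1]) - a[lo]);
            pos = lo + static_cast<std::size_t>(fraction * (hi - 1 - lo));  // in [lo, hi - 1]
        }
        if (a[pos] == key) return static_cast<long>(pos);
        if (a[pos] < key) lo = pos + 1;  // key lies to the right of the probe
        else hi = pos;                   // key lies to the left of the probe
    }
    return -1;  // empty window, or key outside the window's value range
}
```

For {1, 3, 5, 7, 9, 11, 13} and key 7, the first probe already lands on index 3, where binary search would also start; the difference shows on large, evenly spread arrays, where the estimate is typically much closer than the midpoint.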


Accessing Protected Data

Whenever I see something that intrigues me, my mind makes a note of it and then subconsciously works toward finding a use-case for my newfound knowledge.

An example is that I recently learned how protected member data (C++) is actually not safe from outside pryers (even in clean code that does not use typecasts).
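One well-known way this can happen (not necessarily the one the full post describes): a derived class may form a pointer to an inherited protected member, and because the member is declared in the base class, the resulting pointer-to-member has the base class type and can be applied to any base-class object. The class and function names below are hypothetical.

```cpp
// Hypothetical class with a protected data member.
class Widget {
public:
    explicit Widget(int value) : secret(value) {}
protected:
    int secret;
};

// A derived class may name &Pryer::secret. Because secret is declared in
// Widget, the resulting pointer-to-member has type int Widget::*, and such a
// pointer can be applied to any Widget, not just to Pryer objects.
struct Pryer : Widget {
    static int Widget::* secret_member() { return &Pryer::secret; }
};

// Reads the protected member of an arbitrary Widget without any cast.
int peek(Widget const& w) {
    return w.*Pryer::secret_member();
}
```

Pryer is never instantiated; it exists only to be allowed to form the pointer.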

Read the rest of the post »


Posted by Allan Odgaard

GCC 4.5 & C++0x

GCC 4.5.0 is out, and its implementation of C++0x features is coming along nicely.

If you are on OS X and want to try it out you can install it via MacPorts:

sudo port install gcc45

The installed binary is named g++-mp-4.5, and you must pass the -std=c++0x argument to enable the new features.

Here are some of the supported C++0x features that I find the most interesting (for my use of C++).
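A small taste of what compiles with g++-mp-4.5 -std=c++0x (my own example, not one from the full post): auto and initializer lists have been available since GCC 4.4, and lambdas are new in 4.5. It deliberately avoids range-based for, which GCC only gained in 4.6.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sort words by length, breaking ties alphabetically, using a lambda
// as the comparison function (lambdas are new in GCC 4.5).
std::vector<std::string> by_length(std::vector<std::string> words) {
    std::sort(words.begin(), words.end(),
              [](std::string const& a, std::string const& b) {
                  return a.size() != b.size() ? a.size() < b.size() : a < b;
              });
    return words;
}

// Count words longer than n; the lambda captures n by value and
// auto deduces the difference type returned by count_if.
std::size_t count_longer_than(std::vector<std::string> const& words, std::size_t n) {
    auto const count = std::count_if(words.begin(), words.end(),
                                     [n](std::string const& w) { return w.size() > n; });
    return static_cast<std::size_t>(count);
}
```

Combined with an initializer list such as std::vector<std::string> fruit = {"pear", "fig", "banana", "kiwi"};, by_length(fruit) yields fig, kiwi, pear, banana.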

Read the rest of the post »


Posted by Allan Odgaard

Parallel BZip2

I ran some benchmarks which included PBZip2, a multi-threaded implementation of BZip2 (slow yet effective, and my preferred compressor for basically everything).

Running the Burrows–Wheeler transform over the input blocks is well suited to parallelization, and the benchmarks show that Jeff Gilchrist did a great job with this:

Compressor    Time    Archive size
None (cat)     2.3 s  50 MB
GZip           4.0 s  34 MB
BZip2         16.3 s  29 MB
PBZip2         3.0 s  29 MB
LZip          41.8 s  24 MB

The timings were produced by running the code below 4 times and taking the average of the last 3 runs (for each compressor).

This was executed on a 2 × 2.8 GHz Quad Core Mac Pro where PBZip2 (correctly) auto-detected 8 cores.

I am running PBZip2 version 1.1.0 from MacPorts (sudo port install pbzip2).

for Z in cat gzip bzip2 pbzip2 lzip; do
   time tar -cf "${Z}.res" --use-compress-prog="${Z}" Avian
done

Update: Added a test with LZip (an LZMA-based compressor). There is a multi-threaded implementation of it (plzip), but a quick ./configure && make did not cut it.
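To illustrate why the transform parallelizes so well: each block is transformed independently, so blocks can simply be handed to separate threads. The C++ sketch below is not PBZip2's code; it uses a naive O(n² log n) rotation sort (real bzip2 uses far more efficient sorting and adds further stages such as move-to-front and Huffman coding), and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <string>
#include <vector>

struct BwtBlock {
    std::string last_column;    // last character of each sorted rotation
    std::size_t primary_index;  // row at which the unrotated block ended up
};

// Naive Burrows–Wheeler transform of a single block (no end-of-block sentinel):
// sort all rotations of s and keep the last character of each.
BwtBlock bwt(std::string const& s) {
    std::size_t const n = s.size();
    std::vector<std::size_t> rotations(n);
    std::iota(rotations.begin(), rotations.end(), std::size_t{0});
    std::sort(rotations.begin(), rotations.end(), [&](std::size_t a, std::size_t b) {
        for (std::size_t k = 0; k < n; ++k) {
            unsigned char const ca = s[(a + k) % n];
            unsigned char const cb = s[(b + k) % n];
            if (ca != cb) return ca < cb;
        }
        return false;  // identical rotations (periodic block)
    });
    BwtBlock out{std::string(n, '\0'), 0};
    for (std::size_t row = 0; row < n; ++row) {
        out.last_column[row] = s[(rotations[row] + n - 1) % n];
        if (rotations[row] == 0) out.primary_index = row;
    }
    return out;
}

// Split the input into fixed-size blocks and transform them concurrently.
// Blocks share no state, which is what makes the work easy to parallelize.
std::vector<BwtBlock> parallel_bwt(std::string const& input, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<std::future<BwtBlock>> jobs;
    for (std::size_t offset = 0; offset < input.size(); offset += block_size)
        jobs.push_back(std::async(std::launch::async, bwt, input.substr(offset, block_size)));
    std::vector<BwtBlock> blocks;
    for (auto& job : jobs) blocks.push_back(job.get());  // keeps the original block order
    return blocks;
}
```

This starts one thread per block for simplicity; a real implementation would rather use a fixed number of worker threads matched to the core count (8 on the machine above).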


Posted by Allan Odgaard

Search Path for CD

I just learned this neat thing about the cd shell command:

The variable CDPATH defines the search path for the directory containing «dir». Alternative directory names in CDPATH are separated by a colon (:). A null directory name is the same as the current directory. If «dir» begins with a slash (/), then CDPATH is not used.

For example:

% export CDPATH=$HOME/Source:$HOME/Library/Application\ Support/TextMate
% cd Avian/
/Users/duff/Source/Avian
% cd Bundles/
/Users/duff/Library/Application Support/TextMate/Bundles
% cd Support/lib/
/Users/duff/Library/Application Support/TextMate/Support/lib
% cd Avian/Frameworks/
/Users/duff/Source/Avian/Frameworks

This also works with tab completion (in bash 4.1.2), so regardless of the current directory I can generally type cd Av⇥↩ to reach ~/Source/Avian.
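The lookup rule in the quote is simple enough to sketch in C++. This is an illustration using std::filesystem with a hypothetical resolve_cd function, not how bash implements it; real shells have a few extra rules beyond the quoted ones.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Resolve the argument of cd against a colon-separated CDPATH as the quoted
// text describes: absolute names bypass CDPATH, an empty entry means the
// current directory, and the first entry containing dir wins.
std::optional<fs::path> resolve_cd(std::string const& dir, std::string const& cdpath) {
    if (!dir.empty() && dir.front() == '/')
        return fs::path(dir);
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type const colon = cdpath.find(':', start);
        std::string const entry = cdpath.substr(
            start, colon == std::string::npos ? std::string::npos : colon - start);
        fs::path const candidate = (entry.empty() ? fs::path(".") : fs::path(entry)) / dir;
        std::error_code ec;  // missing or unreadable entries are simply skipped
        if (fs::is_directory(candidate, ec))
            return candidate;
        if (colon == std::string::npos)
            return std::nullopt;
        start = colon + 1;
    }
}
```

With the CDPATH from the example, resolve_cd("Bundles", cdpath) skips $HOME/Source (no Bundles there) and returns the path under Application Support/TextMate.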


Posted by Allan Odgaard

Build Automation Part 2

This is part 2 of what I think will end up as four parts. It may be a bit of a rehash of the first part, but there I only skimmed over why I am so fond of make compared to most other build systems, so here I will elaborate with some examples.

Part 3 will be a general post about declarative systems, not directly related to build automation. Part 4 should be about auto-generating the Makefiles (which is part of the motivation for writing about declarative systems first).

Read the rest of the post »


Posted by Allan Odgaard

Build Automation Part 1

A blog post about Ant vs. Maven concludes that “the best build tool is the one you write yourself”, and the Programmer Competency Matrix has “can setup a script to build the system” as a requirement for reaching the higher levels in the “build automation” row.

I have looked at a lot of build systems myself, and while I agree that the best build system is the one you create yourself, I am also a big fan of make and believe that the best approach is to use generated Makefiles.

This post is a “getting started with make”. I plan to follow up with a part 2 about how to handle auto-generated self-updating Makefiles.

Read the rest of the post »


Posted by Allan Odgaard

Self-balancing Trees

In a previous blog post I describe a data structure which requires the use of a self-balancing binary search tree.

Read the rest of the post »


Posted by Allan Odgaard

Cuckoo Hashing

The Achilles’ heel of hashing is collisions: when we want to insert a new value into the hash table and the slot is already filled, we use a fallback strategy to find another slot, for example linear probing.

The fallback strategy can affect lookup time, since we need to do the same probing when a lookup finds an entry with the wrong key, turning the nice O(1) time complexity into (worst case) O(n).
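Cuckoo hashing sidesteps this by giving every key exactly two possible slots, one in each of two tables: insertion may kick existing keys back and forth between their alternative slots, but a lookup never has to check more than two places. A minimal C++ sketch for int keys (my own illustration, not the full post's code; the second hash function and the eviction limit are arbitrary choices).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Minimal cuckoo hash set of ints. Each key has exactly one candidate slot in
// each of two tables, so a lookup probes at most two slots: O(1) worst case.
class CuckooSet {
public:
    explicit CuckooSet(std::size_t capacity = 8) : t1_(capacity), t2_(capacity) {
        assert(capacity > 0);
    }

    bool contains(int key) const {
        auto const& a = t1_[h1(key)];
        auto const& b = t2_[h2(key)];
        return (a && *a == key) || (b && *b == key);
    }

    void insert(int key) {
        if (contains(key)) return;
        if (std::optional<int> homeless = place(key)) {
            // Eviction chain too long (likely a cycle): grow and rehash.
            std::vector<int> keys = all_keys();
            keys.push_back(*homeless);
            rehash(keys, t1_.size() * 2);
        }
    }

private:
    using Slot = std::optional<int>;
    static constexpr int kMaxKicks = 32;  // arbitrary bound on the eviction chain
    std::vector<Slot> t1_, t2_;

    std::size_t h1(int k) const { return std::hash<int>{}(k) % t1_.size(); }
    std::size_t h2(int k) const {
        // A second, different hash: multiplicative hashing with a 32-bit constant.
        unsigned const x = static_cast<unsigned>(k) * 2654435761u;
        return (x >> 16) % t2_.size();
    }

    // Insert by evicting occupants back and forth between the two tables.
    // Returns the key left without a slot, or std::nullopt on success.
    std::optional<int> place(int key) {
        for (int kick = 0; kick < kMaxKicks; ++kick) {
            Slot& a = t1_[h1(key)];
            if (!a) { a = key; return std::nullopt; }
            std::swap(key, *a);  // kick the occupant out of table 1
            Slot& b = t2_[h2(key)];
            if (!b) { b = key; return std::nullopt; }
            std::swap(key, *b);  // kick the occupant out of table 2
        }
        return key;
    }

    std::vector<int> all_keys() const {
        std::vector<int> keys;
        for (Slot const& s : t1_) if (s) keys.push_back(*s);
        for (Slot const& s : t2_) if (s) keys.push_back(*s);
        return keys;
    }

    void rehash(std::vector<int> const& keys, std::size_t capacity) {
        for (;;) {
            t1_.assign(capacity, std::nullopt);
            t2_.assign(capacity, std::nullopt);
            bool ok = true;
            for (int k : keys)
                if (place(k)) { ok = false; break; }
            if (ok) return;
            capacity *= 2;  // still stuck: try larger tables
        }
    }
};
```

The cost moves to insertion, which may occasionally trigger a rehash; lookups stay at two probes regardless.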

Read the rest of the post »


Posted by Allan Odgaard

Maintaining a Layout

TextMate works with fixed-width fonts, both for simplicity and because fixed-width text is the immediate difference between a plain text editor and a word processor.

For version 2.0, though, I want it to do a richer layout, e.g. larger headings in markup languages, indented soft wrap, proper Unicode support, etc. So I had to bite the bullet and figure out how to allow this with reasonable performance. This article explains the problem and the data structure I picked.

Read the rest of the post »


Blog Spam Filtering Ideas

I have previously detailed how I fight comment spam using a JavaScript challenge.

I host two blogs, a wiki, and a ticket system, all targets for spam, so I have since generalized the system: mod_rewrite redirects every POST without a cookie to a page which uses JavaScript to set this cookie and resubmit the request (which mod_rewrite then no longer catches, since the cookie is now set). This means “blocking” spam doesn’t require a plug-in written specifically for each web application.

Despite this JS challenge some spam still gets through, and that’s what this post is about.

Read the rest of the post »


Get OS Version From Scripts

It is sometimes useful to have a script check the OS version. For example, the user’s full name used to be obtained with niutil, but Apple removed that command in Leopard (it can now be done using dscl).

Read the rest of the post »


Optimizing Path Normalization

One of my path functions is normalize. It removes redundant slashes and references to the directory meta entries (. for the current directory and .. for the parent directory).

A lot of other path functions use or rely on normalize, for example my version of dirname() is simply: return normalize(path + "/..");.

I was recently tasked with rewriting normalize to be more efficient and it proved to be a bit of a challenge, so I’ll share what I came up with.
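As a baseline, here is a straightforward C++ sketch of a purely lexical normalize (symbolic links are not consulted), together with the dirname built on top of it as described above. It is not the optimized version the post arrives at: it splits the path into components and rebuilds it, which costs extra allocations. It assumes that .. at the root stays at the root and that an empty result means the current directory (.).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collapse repeated slashes, drop "." components and fold "name/.." pairs.
std::string normalize(std::string const& path) {
    bool const absolute = !path.empty() && path[0] == '/';
    std::vector<std::string> parts;
    std::string component;
    auto flush = [&]() {
        if (component.empty() || component == ".") {
            // from "//" or a current-directory reference: nothing to keep
        } else if (component == "..") {
            if (!parts.empty() && parts.back() != "..")
                parts.pop_back();       // "name/.." cancels out
            else if (!absolute)
                parts.push_back("..");  // relative path climbing above its start
            // absolute path: ".." at the root stays at the root
        } else {
            parts.push_back(component);
        }
        component.clear();
    };
    for (char c : path) {
        if (c == '/') flush();
        else component += c;
    }
    flush();

    std::string result = absolute ? "/" : "";
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) result += '/';
        result += parts[i];
    }
    return result.empty() ? "." : result;
}

// dirname expressed through normalize, as described above.
std::string dirname(std::string const& path) {
    return normalize(path + "/..");
}
```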

Read the rest of the post »


Worker Thread Protocol

When two components, let’s call them A and B, are used together, it is a good approach to figure out who is using whom: if A is using B, then B should not know about A (and vice versa).

This rule of thumb lowers complexity and makes both refactoring and re-use of code easier.

One scenario where it might be appealing to ignore this rule is when outsourcing computation to a worker thread, but here it is actually more important to stick with it.
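A minimal C++ sketch of one way to keep the dependency one-directional (my own illustration with hypothetical names, not the protocol from the full post): the worker is a function that takes its input by value and returns a std::future, and only the client holds on to that future.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>

// B: the worker. It receives plain input and hands back a future; it has no
// idea who asked, so it can be reused and tested on its own.
std::future<std::size_t> count_words_async(std::string text) {
    return std::async(std::launch::async, [text = std::move(text)] {
        std::size_t words = 0;
        bool in_word = false;
        for (char c : text) {
            bool const space = c == ' ' || c == '\t' || c == '\n';
            if (!space && !in_word) ++words;  // a word starts here
            in_word = !space;
        }
        return words;
    });
}

// A: the client. It starts the job and collects the result itself, so the
// dependency only points from A to B.
class Document {
public:
    explicit Document(std::string text) : text_(std::move(text)) {}
    void start_word_count() { pending_ = count_words_async(text_); }
    std::size_t word_count() {
        if (pending_.valid()) count_ = pending_.get();  // blocks until B is done
        return count_;
    }
private:
    std::string text_;
    std::future<std::size_t> pending_;
    std::size_t count_ = 0;
};
```

The worker gets a copy of the text, so it never touches the Document while the client keeps working.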

Read the rest of the post »


Simplifying Boolean Expressions

I recently had a boolean expression of the following form:

a || (x && b) || (x && y && c) || (x && y && z && d)

It looked redundant with 10 instances of only 7 different variables.
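Factoring out the shared prefixes gives one equivalent form, a || (x && (b || (y && (c || (z && d))))), which uses each of the 7 variables exactly once. (This is my own factoring; the full post may arrive at something different.) Since there are only 2^7 = 128 assignments, an exhaustive C++ check confirms the equivalence:

```cpp
// The expression as written: 10 variable occurrences.
bool original(bool a, bool b, bool c, bool d, bool x, bool y, bool z) {
    return a || (x && b) || (x && y && c) || (x && y && z && d);
}

// Shared prefixes x and y factored out: 7 occurrences.
bool factored(bool a, bool b, bool c, bool d, bool x, bool y, bool z) {
    return a || (x && (b || (y && (c || (z && d)))));
}

// Compare both forms for all 2^7 = 128 assignments of the seven variables.
bool equivalent() {
    for (unsigned bits = 0; bits < 128; ++bits) {
        auto v = [bits](unsigned i) { return ((bits >> i) & 1u) != 0; };
        if (original(v(0), v(1), v(2), v(3), v(4), v(5), v(6)) !=
            factored(v(0), v(1), v(2), v(3), v(4), v(5), v(6)))
            return false;
    }
    return true;
}
```

The factored form also preserves short-circuit order: y is only evaluated when x holds, and z only when x and y hold.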

Read the rest of the post »


Posted by Allan Odgaard

UTI Problems

I was excited to use the “new” Universal Type Identifiers, but the excitement turned to confusion and a bit of disappointment. I will share my findings in this article.

Read the rest of the post »


Posted by Allan Odgaard

Automatic Storage

One of the things I like about C++ is the ability to have the compiler create code for me that does actual work.

What do I mean? I am thinking about implicit conversions (wrapping) of data types, and about objects being constructed and destructed as they go in and out of scope.

In this post I will focus on the latter, showing how it can be used with Objective-C and how it can be used to track leaks in C++ code.
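As a small taste of the leak-tracking idea (a hypothetical helper, not the code from the post): a base class whose constructors and destructor update a per-type counter, so every object going in or out of scope is accounted for automatically, and a non-zero count at program exit points to a leak.

```cpp
#include <cstddef>

// Hypothetical helper: deriving from InstanceCounter<T> makes the compiler
// maintain a count of live T objects through constructors and destructors.
template <typename T>
class InstanceCounter {
public:
    InstanceCounter() { ++live_; }
    InstanceCounter(InstanceCounter const&) { ++live_; }  // copies are objects too
    InstanceCounter& operator=(InstanceCounter const&) = default;  // count unchanged
    ~InstanceCounter() { --live_; }
    static std::size_t live() { return live_; }
private:
    static inline std::size_t live_ = 0;  // one counter per T (C++17 inline variable)
};

struct Node : InstanceCounter<Node> {
    int value = 0;
};
```

No explicit bookkeeping appears in Node itself; the compiler-generated constructors and destructor of Node call the ones in the base.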

Read the rest of the post »


Posted by Allan Odgaard

Objective-C++ Tips

C++ Objects as Instance Data

Say you create a custom view with arbitrarily many tracking rectangles (i.e. dynamically added).

Each time you add a rectangle you get back an identifier for it, which can’t be stored in an NSArray as-is, since it is of the primitive type NSTrackingRectTag (an integer).

If you use Objective-C++ then you can use a std::vector<NSTrackingRectTag> to avoid having to box/unbox your identifiers, but if you have tried to put non-POD types in the interface declaration of your Objective-C class, you have probably seen that gcc does not like that.

Well, starting with 10.4 (so actually, some time ago) Apple added a switch to gcc which allows C++ objects as part of the instance data, and it will call both the constructor and the destructor for your C++ objects when allocating/deallocating the Objective-C object.

The flag you need to set is -fobjc-call-cxx-cdtors.

C++ Objects as Method Arguments

Occasionally it is convenient to pass a C++ object to an Objective-C method. For example, I have an NSString initializer that takes a std::string as argument.

This works as long as you pass the object by reference (i.e. pass a pointer), but instead of taking the address at the call site you can declare the parameter as a C++ reference (&) in the method signature. By using a const reference it will also work for temporary/implicit objects.

So with the following method:

+ (NSString*)stringWithCxxString:(std::string const&)cxxString
{
   return [[[NSString alloc] initWithBytes:cxxString.data()
                                    length:cxxString.size()
                                  encoding:NSUTF8StringEncoding] autorelease];
}

We can have code like this:

std::string dir  = get_some_dir();
std::string file = get_some_file();

NSString* str    = [NSString stringWithCxxString:dir + file];

Message Catalogs on Darwin

I wanted to localize a shell command to give Danish output and decided to look into the message catalog functions described in/by XPG4.
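The XPG4 interface consists of catopen, catgets, and catclose, declared in <nl_types.h>; catalogs are compiled from source message files with gencat. A minimal C++ sketch of a lookup with a built-in fallback string (the helper name localized and the catalog name used in the usage note are hypothetical):

```cpp
#include <cassert>
#include <nl_types.h>
#include <string>

// Fetch message msg_id from set set_id of the catalog called name, falling back
// to the built-in text when no catalog exists for the current locale.
// Hypothetical helper; catopen/catgets/catclose are the POSIX XSI functions.
std::string localized(char const* name, int set_id, int msg_id, char const* fallback) {
    assert(fallback != nullptr);
    nl_catd const catalog = catopen(name, NL_CAT_LOCALE);
    if (catalog == (nl_catd)-1)  // POSIX spells the failure value (nl_catd) -1
        return fallback;
    // catgets itself returns the fallback when the message is missing.
    std::string const message = catgets(catalog, set_id, msg_id, fallback);
    catclose(catalog);
    return message;
}
```

A call such as localized("mytool", 1, 1, "Done.") keeps working when no catalog is installed and returns the translated string once one is available for the user's locale.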

Read the rest of the post »


Clipboard Access From Shell (UTF-8)

Update 2011-01-27: Recent versions of Mac OS X make these replacements obsolete.

Two very nice shell commands that Apple has given us are pbcopy and pbpaste. They copy stdin to the clipboard and write the clipboard to stdout, respectively.

Unfortunately the commands seem to use a combination of MacRoman and question marks for non-ASCII characters, which often makes them unusable for me, since I work with non-ASCII characters.

So today I decided to write a replacement for the two commands (yes, I did also file an enhancement report). You can download them here.

There is just one source file; it compiles to a single command which works as pbcopy when called under that name, and as pbpaste otherwise.

What I’ve done is place the command in ~/bin and add a symbolic link from pbpaste to pbcopy, like this:

  ln -s pbcopy ~/bin/pbpaste

In addition, I ensured that my PATH contains ~/bin before anything else, e.g. by placing the following in my ~/.bash_profile (well, actually ~/.zshrc):

  export PATH="$HOME/bin:/opt/local/bin:$PATH:/Developer/Tools"

The source is included in the archive, and it is very simple: no usage instructions etc. It links with the Application Kit, since NSPasteboard belongs to that framework and not to Foundation.
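The single-binary trick relies on the program looking at the name it was invoked under (argv[0]). A minimal C++ sketch of that dispatch, with hypothetical names (the downloadable source may do it differently):

```cpp
#include <string>

enum class Mode { Copy, Paste };

// Decide behaviour from the invocation name: "pbcopy" copies, anything else
// (such as the pbpaste symlink) pastes. Only the last path component counts,
// so "/Users/me/bin/pbcopy" and "pbcopy" behave the same.
Mode mode_for(std::string const& argv0) {
    std::string::size_type const slash = argv0.rfind('/');
    std::string const name = slash == std::string::npos ? argv0 : argv0.substr(slash + 1);
    return name == "pbcopy" ? Mode::Copy : Mode::Paste;
}
```

In main this would be called as mode_for(argv[0]); the symlink keeps its own name in argv[0], which is why the ln -s above is all it takes to get both commands.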


Progress Indicator for Unarchiving

I added a software updater to my application, and one of the steps is uncompressing the archive (after downloading it). Since the archive is a few megabytes and compressed with bzip2, this step takes a few seconds, so I want to show a determinate progress indicator while it runs.

Read the rest of the post »


Posted by Allan Odgaard

Fighting Comment Spam

Update 2007-07-17: Since I installed the JS challenge almost two years ago, it has blocked 83,837 POSTs. Roughly a dozen spam POSTs did defeat the challenge. Looking at the access log, these do seem to come from actual humans (judging by the initial hit having a Google referrer, all resources, i.e. CSS and images, being fetched, and the delay from the last GET to the POST), but it could also be a cleverly scripted browser (though I am not sure of the “economy” of either).

Recently I have received a lot of comment spam, i.e. fake comments posted to a blog or wiki (for me, once every hour) with the purpose of increasing a website’s page rank.

Looking at the comment spam I have received, I see that more than 90% of the IP addresses are unique (infected Windows machines used as proxies?), so for the challenge I decided to run SHA-1 on the visitor’s IP (plus a constant) and ask for that value back when the form is submitted.

Read the rest of the post »