Beating Binary Search

June 17th, 2010

Jay from LinkedIn’s SNA team writes:

Quick, what is the fastest way to search a sorted array?

Binary search, right?

Wrong. There is actually a method called interpolation search

Accessing Protected Data

May 6th, 2010

Whenever I see something that intrigues me, my mind makes a note of it and then subconsciously works toward finding a use-case for my newfound knowledge.

An example is that I recently learned how protected member data (C++) is actually not safe from outside pryers (even in clean code that does not use typecasts).

Read the rest of this entry »

GCC 4.5 & C++0x

April 15th, 2010

GCC 4.5.0 is out and their progress on implementing C++0x features is coming along nicely.

If you are on OS X and want to try it out you can install it via MacPorts:

sudo port install gcc45

The binary installed is named g++-mp-4.5 and you must use the -std=c++0x argument to enable the new features.

Of the supported C++0x features here are some of those that I find the most interesting (for my use of C++).

Read the rest of this entry »

Parallel BZip2

April 1st, 2010

I ran some benchmarks which included PBZip2, a multi-threaded implementation of BZip2 (which is slow yet effective, so my preferred choice of compressor for basically everything).

Running the Burrows–Wheeler transform over the input blocks is a task well suited for being parallelized and the benchmarks show that Jeff Gilchrist did a great job at this:

CompressorTime Archive Size
None (cat) 2.3s 50 MB
GZip 4.0s 34 MB
BZip2 16.3s 29 MB
PBZip2 3.0s 29 MB
LZip 41.8s 24 MB

The timings were produced by running the code below 4 times and taking the average of the last 3 runs (for each compressor).

This was executed on a 2 × 2.8 GHz Quad Core Mac Pro where PBZip2 (correctly) auto-detected 8 cores.

I am running PBZip2 version 1.1.0 from MacPorts (sudo port install pbzip2).

for Z in cat gzip bzip2 pbzip2 lzip; do
   time tar -cf "${Z}.res" --use-compress-prog="${Z}" Avian
done

Update: Added test with LZip (an LZMA based compresser). There is a multi-threaded implementation of this (plzip) but a quick ./configure && make did not cut it.

Search Path for CD

March 28th, 2010

I just learned this neat thing about the cd shell command:

The variable CDPATH defines the search path for the directory containing «dir». Alternative directory names in CDPATH are separated by a colon (:). A null directory name is the same as the current directory. If «dir» begins with a slash (/), then CDPATH is not used.

For example:

% export CDPATH=$HOME/Source:$HOME/Library/Application\ Support/TextMate
% cd Avian/
/Users/duff/Source/Avian
% cd Bundles/
/Users/duff/Library/Application Support/TextMate/Bundles
% cd Support/lib/
/Users/duff/Library/Application Support/TextMate/Support/lib
% cd Avian/Frameworks/
/Users/duff/Source/Avian/Frameworks

This works with tab completion (using bash 4.1.2) so regardless of the current directory, I can generally do cd Av⇥↩ to reach ~/Source/Avian.

Build Automation Part 2

January 23rd, 2010

This is part 2 of what I think will end up as four parts. This might be a bit of a rehash of the first part, but I skimmed lightly over why it actually is that I am so fond of make compared to most other build systems, so I will elaborate with some examples.

Part 3 will be a general post about declarative systems, not directly related to build automation. Part 4 should be about auto-generating the make files (which is part of the motivation for writing about declarative systems first).

Read the rest of this entry »

Build Automation Part 1

January 15th, 2010

A blog post about Ant vs. Maven concludes that “the best build tool is the one you write yourself” and the Programmer Competency Matrix has “can setup a script to build the system” as requirement for reaching the higher levels in the “build automation” row.

I have looked at a lot of build systems myself, and while I agree that the best build system is the one you create yourself I am also a big fan of make and believe that the best approach is to use generated Makefiles.

This post is a “getting started with make”. I plan to follow up with a part 2 about how to handle auto-generated self-updating Makefiles.

Read the rest of this entry »

Self-balancing Trees

August 22nd, 2009

In a previous blog post I describe a data structure which require the use of a self-balancing binary search tree.

Read the rest of this entry »

Cuckoo Hashing

August 18th, 2009

The Achilles’ heel of hashing is collision: When we want to insert a new value into the hash table and the slot is already filled, we use a fallback strategy to find another slot, for example linear probing.

The fallback strategy can affect lookup time since we need to do the same probing when a lookup results in an entry with wrong key, turning the nice O(1) time complexity into (worst case) O(n).

Read the rest of this entry »

Maintaining a Layout

August 13th, 2009

TextMate works with fixed-width fonts both because of the simplicity and because it is the immediate difference between a plain text editor and a word processor.

Though for version 2.0 I want it to do a richer layout, e.g. larger headings in markup languages, indented soft wrap, proper support for unicode, etc. So I had to bite the bullet and figure out how to allow this with reasonable performance, this article explains the problem and data structure I picked.

Read the rest of this entry »

Blog Spam Filtering Ideas

August 11th, 2009

I have previously detailed how I fight comment spam using a JavaScript challenge.

I host two blogs, a wiki, and a ticket system, all targets for spam, so I have since generalized the system by using mod_rewrite to redirect all POSTs without a cookie to a page which uses JavaScript to set this cookie and resubmit the request (which is then no longer catched by mod_rewrite due to the cookie being set). This means “blocking” spam doesn’t require a plug-in written specifically for the particular web application.

Despite this JS challenge some spam still gets through, and that’s what this post is about.

Read the rest of this entry »

Get OS Version From Scripts

August 1st, 2009

It is sometimes useful to have a script check the OS version, for example the way to get the user’s full name was previously done using niutil but Apple removed that command in Leopard (it can now be done using dscl).

Read the rest of this entry »

Optimizing Path Normalization

August 1st, 2009

One of my path functions is normalize. It removes (redundant) slashes and references to directory meta entries (current and parent directory).

A lot of other path functions use or rely on normalize, for example my version of dirname() is simply: return normalize(path + "/..");.

I was recently tasked with rewriting normalize to be more efficient and it proved to be a bit of a challenge, so I’ll share what I came up with.

Read the rest of this entry »

Worker Thread Protocol

July 30th, 2009

When two components are used together, let’s call them A and B, it is a good approach to figure out who is using whom, and if A is using B then B should not know about A and vice versa.

This rule of thumb lowers complexity and makes both refactoring and re-use of code easier.

One scenario where it might be appealing to ignore this rule is when outsourcing computation to a worker thread, but here it is actually more important to stick with it.

Read the rest of this entry »

Simplifying Boolean Expressions

July 27th, 2009

I recently had a boolean expression of the following form:

a || (x && b) || (x && y && c) || (x && y && z && d)

It looked redundant with 10 instances of only 7 different variables.

Read the rest of this entry »

UTI Problems

March 9th, 2009

I was excited to use the “new” Universal Type Identifiers but excitement turned to confusion and a bit of disappointment. I will share my findings in this article.

Read the rest of this entry »

Automatic Storage

September 19th, 2008

One of the things I like about C++ is the ability to have the compiler create code for me that does actual work.

What do I mean? I am thinking about implicit conversions (wrapping) of data types and constructing/destructing data types when they go in/out of scope.

I will focus on the latter in this blog post, show how it can be used with Objective-C and how it can track leaks in C++ code.

Read the rest of this entry »

Objective-C++ Tips

May 22nd, 2008

C++ Objects as Instance Data

Say you create a custom view with arbitrary many tracking rectangles (i.e. dynamically added).

Each time you add a rectangle you get back an identifier for this rectangle which can’t be stored in an NSArray as-is since it is of the primitive type NSTrackingRectTag (an integer).

If you use Objective-C++ then you can use a std::vector<NSTrackingRectTag> to avoid having to box/unbox your identifiers but if you have tried to put non-POD in the interface declaration of your Objective-C class you have probably seen that gcc does not like that.

Well, starting with 10.4 (so actually, some time ago) Apple added a switch to gcc which allows C++ objects as part of the instance data, and it will call both constructor and destructor for your C++ objects when allocating/deallocating the Objective-C object.

The flag you need to set is -fobjc-call-cxx-cdtors.

C++ Objects as Method Arguments

Occasionally it is convenient to pass a C++ object to an Objective-C method. For example I have an NSString initializer that takes a std::string as argument.

This works as long as you pass the object as a reference (i.e. pass a pointer), but you can use the “reference of” operator in the method signature rather than at the call-site. By using a const reference it will work for temporary/implicit objects.

So with the following method:

+ (NSString*)stringWithCxxString:(std::string const&)cxxString
{
   return [[[NSString alloc] initWithBytes:cxxString.data()
                                    length:cxxString.size()
                                  encoding:NSUTF8StringEncoding] autorelease];
}

We can have code like this:

std::string dir  = get_some_dir();
std::string file = get_some_file();

NSString* str    = [NSString stringWithCxxString:dir + file];

Message Catalogs on Darwin

April 20th, 2006

I wanted to localize a shell command to give danish output and decided to look into the message catalog functions described in/by XPG4.

Read the rest of this entry »

Clipboard access from shell (utf-8)

October 11th, 2005

Two very nice shell commands that Apple has given us are the pbcopy and pbpaste commands. These allow stdin to go to the clipboard and the clipboard to be written to stdout.

Unfortunately the commands seem to use a combination of MacRoman and question marks for non-ASCII characters, which often makes them unusable for me, since I work with non-ASCII characters.

So today I decided to write a replacement for the two commands (yes, I did also file an enhancement report). You can download them here.

There's just one source, it compiles to a command which works as pbcopy, when called under that name, otherwise pbpaste.

What I've done is place the command in ~/bin and added a symbolic link from pbpaste to pbcopy, like this:

  ln -s pbcopy ~/bin/pbpaste

And in addition ensured that my PATH contains ~/bin before anything else, i.e. by placing the following in my ~/.bash_profile (well, actually ~/.zshrc):

  export PATH="$HOME/bin:/opt/local/bin:$PATH:/Developer/Tools"

The source is included in the archive, and it's very simple. No usage instructions etc., and it links with the Application Kit, since NSPasteboard is under that and not Foundation Kit.