Message Catalogs on Darwin

April 20th, 2006

I wanted to localize a shell command to give Danish output and decided to look into the message catalog functions described in XPG4.

Opening and Using a Catalog

The basics are simple. Here is a short example:

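#include <cstdio>
#include <nl_types.h>

int main (int argc, char const* argv[])
{
   nl_catd cat = catopen("", 0); // yields (nl_catd)-1 on failure
   char* str = catgets(cat, 0, 0, "This is a test!"); // default string if lookup fails
   printf("%s\n", str);
   if(cat != (nl_catd)-1)
      catclose(cat);
   return 0;
}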

First we open the message catalog by name, then we ask for string zero in set zero of that catalog (each catalog can contain multiple sets, and each set multiple strings/messages). The last argument to catgets is the default string, which is returned if the lookup fails. Finally we print the string and close the catalog.
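To make the set/message addressing and the default string a bit more concrete, here is a small sketch. The helper names `message_or_default` and `lookup_once` are made up for illustration and are not part of nl_types.h; the only point is that catgets hands back the default string whenever the catalog, set, or message cannot be found.

```cpp
#include <nl_types.h>
#include <string>

// Hypothetical helper (not part of nl_types.h): look up message `msg` in
// set `set`, falling back to `fallback` when the lookup fails.
std::string message_or_default (nl_catd cat, int set, int msg, char const* fallback)
{
   // catgets returns the fallback pointer itself if the catalog is invalid
   // or the requested set/message does not exist.
   return std::string(catgets(cat, set, msg, fallback));
}

// Hypothetical helper: open a catalog by name, do one lookup, close it again.
std::string lookup_once (char const* catalog, int set, int msg, char const* fallback)
{
   nl_catd cat = catopen(catalog, 0);   // (nl_catd)-1 on failure
   std::string res = message_or_default(cat, set, msg, fallback);
   if(cat != (nl_catd)-1)
      catclose(cat);
   return res;
}
```

Calling `lookup_once` with a catalog name that cannot be found simply gives back the default string, which is why an unlocalized run still prints something sensible.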

Generating the Catalog File

The file we open is the compiled catalog, which needs to be generated using the gencat command. The source is plain text, and for our test it looks like this:

$set 0
0 Dette er en test!

First we indicate that we are defining strings for set zero, and then we provide a string with index zero. This format also supports comments, but let’s ignore those for now.
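If the strings already live in a program, the source format is simple enough to write out mechanically. The following is only a sketch: `render_set` is a made-up helper, it numbers messages from zero as in the example above, and it assumes that no message contains a newline and that comments and other directives are not needed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: render one set of a gencat source file, numbering
// the messages from zero in the order given.
std::string render_set (int set, std::vector<std::string> const& messages)
{
   std::string out = "$set " + std::to_string(set) + "\n";
   for(std::size_t i = 0; i < messages.size(); ++i)
      out += std::to_string(i) + " " + messages[i] + "\n";
   return out;
}
```

For a single Danish message in set zero this produces exactly the two lines shown above.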

To generate our file from the above (which we save as test.txt) we would run this line:

gencat test.txt

Finding What Catalog to Load

Okay, that was all very simple. Now comes the tricky part: which catalog to actually open?

The catopen function can take an absolute path, but that’s no good, as the idea is that the user can change the language without recompiling our program. Instead, if we specify just the name, the system will use the value of the NLSPATH environment variable to figure out where the file is located.

This variable can contain multiple locations (separated with a colon) and it can contain placeholders, such as a placeholder for the current language.

So what is the default value of NLSPATH on Mac OS X? Well, it is unset, and the catopen function has a default value (for when it is unset) which is useless on Mac, since the locations it points to do not exist with a default install of Tiger.

It seems Mac OS X keeps the catalogs under /usr/share/locale/…/LC_MESSAGES. Here the three dots refer to the actual language, e.g. en_US, da_DK, etc. In NLSPATH we can use %L as a placeholder for the current language, so it would seem that this line is required in our shell startup (e.g. .profile):

export NLSPATH=/usr/share/locale/%L/LC_MESSAGES/%N

The trailing %N is a placeholder for the actual message catalog name, i.e. the name passed to catopen in our code above.
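To see what the system does with such a value, here is a rough sketch of the expansion. It is not the actual libc code: `expand_template` and `candidate_paths` are made-up names, only %L, %N, and %% are handled, and empty entries between colons get no special treatment.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expand one NLSPATH template. Only %L (language),
// %N (catalog name), and %% (a literal percent sign) are handled.
std::string expand_template (std::string const& tmpl, std::string const& lang, std::string const& name)
{
   std::string out;
   for(std::size_t i = 0; i < tmpl.size(); ++i)
   {
      if(tmpl[i] == '%' && i + 1 < tmpl.size())
      {
         char c = tmpl[++i];
         if(c == 'L')      out += lang;
         else if(c == 'N') out += name;
         else if(c == '%') out += '%';
         else              { out += '%'; out += c; } // unknown placeholder: keep as-is
      }
      else out += tmpl[i];
   }
   return out;
}

// Split NLSPATH on colons and expand each entry, giving the candidate
// files in the order they would be tried.
std::vector<std::string> candidate_paths (std::string const& nlspath, std::string const& lang, std::string const& name)
{
   std::vector<std::string> res;
   std::size_t from = 0;
   while(from <= nlspath.size())
   {
      std::size_t to = nlspath.find(':', from);
      if(to == std::string::npos)
         to = nlspath.size();
      res.push_back(expand_template(nlspath.substr(from, to - from), lang, name));
      from = to + 1;
   }
   return res;
}
```

With the NLSPATH value above, a language of da_DK, and a catalog called `example` (a hypothetical name), the single candidate is /usr/share/locale/da_DK/LC_MESSAGES/example.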

So now that we have established this path, we would need to copy our catalog file to /usr/share/locale/da_DK/LC_MESSAGES.

Changing Language

If we run our command it still gives the English message, because we have not changed the language yet. So how do we do that? Ideally, I think, one should be able to set the LC_MESSAGES environment variable to da_DK, but on Darwin the only variable used when resolving NLSPATH is LANG, so this is the variable we need to set to da_DK.
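In other words, the value substituted for %L comes straight from LANG. A minimal sketch of that resolution step might look like the following; the function name is made up, and the fallback to "C" when LANG is unset or empty is an assumption modeled on BSD-derived implementations of catopen, not something verified here.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: the language used when expanding %L in NLSPATH,
// assuming only LANG is consulted (as observed on Darwin above).
std::string language_for_nlspath ()
{
   char const* lang = std::getenv("LANG");
   // Assumption: an unset or empty LANG is treated as the "C" locale.
   return (lang && *lang) ? lang : "C";
}
```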

And that’s it! So here are all the steps:

# first compile our test command
gcc -o test

# generate the message catalog
gencat test.txt

# install it (requires sudo)
sudo cp /usr/share/locale/da_DK/LC_MESSAGES

# now export the NLSPATH variable
export NLSPATH=/usr/share/locale/%L/LC_MESSAGES/%N

After this, we can run the command with either Danish or default (English) output:

% LANG=da_DK ./test
Dette er en test!
% ./test
This is a test!

Closing Notes

I refer above to da_DK as the language. It really is the language, then an underscore, and then the country/region (territory). One can refer to the language (subpart) in NLSPATH using %l and to the territory using %t.
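As a sketch of how such a name breaks down, the following splits a locale name into the parts that %l and %t stand for, plus an optional codeset after a dot if one is present. The struct and function names are made up, and modifiers such as a trailing @something are not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical struct: the parts of a locale name such as "da_DK.UTF-8".
struct locale_parts
{
   std::string language;   // what %l would expand to
   std::string territory;  // what %t would expand to
   std::string codeset;    // the part after the dot, if any
};

// Hypothetical helper: split "language_TERRITORY.codeset" into its parts.
locale_parts split_locale (std::string const& name)
{
   locale_parts res;
   std::size_t dot = name.find('.');
   std::string base = name.substr(0, dot);   // drop ".codeset" if present
   if(dot != std::string::npos)
      res.codeset = name.substr(dot + 1);
   std::size_t us = base.find('_');
   res.language = base.substr(0, us);
   if(us != std::string::npos)
      res.territory = base.substr(us + 1);
   return res;
}
```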

It is also possible to provide an encoding (codeset), e.g.:

export LANG=da_DK.UTF-8

And this encoding can be referred to in the NLSPATH as %c. If you look in /usr/share/locale you will see that there actually are subdirectories for all of the following:

I don’t know what the intended usage of these subdirectories is. Maybe the idea is to set NLSPATH to something like this (I wrapped the line for display purposes):

export NLSPATH=

I should also add that, based on the manual for catopen, it sounds like providing NL_CAT_LOCALE as the flag is the right way, as it will then use the LC_MESSAGES locale category (of the current locale). I was, however, unsuccessful in changing the current locale away from C, so using that flag meant I never got the localized messages, no matter what value I gave LANG, LC_ALL, LC_MESSAGES, etc.

I do get the feeling that either I am missing something, or no-one ever actually made sure that this stuff works as it should for Darwin.

[by Allan Odgaard]
