SIGPIPE 13

Programming, automation, algorithms, macOS, and more.

Oniguruma C++ Wrapper

I’ve recently partially switched to the Oniguruma regular expression library.

Since I also use regular expressions in my own source code, I’ve created a simple C++ wrapper that makes the API friendlier for my tasks. I generally work with iterators, and there are four tasks I perform often:

  1. Create a pattern object, which is done using:

      ptrn_t ptrn(first, last);
    

    Here first, last is the iterator sequence that contains the regular expression.

  2. Test whether a sequence matches a pattern, which is done using:

      if(find(first, it, last, ptrn))
          ...;
    

    Here it, last is the sequence being searched, and first is the start of the whole buffer; first is needed in case the pattern uses look-behind, starts with a word or line boundary, or similar.

  3. Move an iterator to the first occurrence of a pattern (or to the end of the sequence if there is no match):

      it = find(first, it, last, ptrn);
    
  4. Examine the captures of a match, if there was one:

      if(match_t const& m = find(first, it, last, ptrn))
      {
         for(int i = 1; i < m.size(); i++)
         {
            if(!m.empty(i))
               cout << string(m.begin(i), m.end(i)) << endl;
         }
      }
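
The four calls above can be made concrete with a minimal sketch of how such a wrapper might be put together. This is not the downloadable wrapper: it swaps Oniguruma for the standard std::regex (ECMAScript grammar) so it builds with nothing but the standard library, fixes the iterator type to char const*, and its internals (shared_ptr ownership, the match_prev_avail flag standing in for the role of first) are assumptions made for illustration.

```cpp
// Minimal sketch of the ptrn_t / match_t / find() API described above.
// ASSUMPTIONS: std::regex (ECMAScript grammar) stands in for Oniguruma,
// iterators are fixed to char const*, and all internals are illustrative.
#include <cassert>
#include <cstddef>
#include <memory>
#include <regex>
#include <utility>

typedef char const* iter_t;

// A compiled pattern; copies share the same compiled regex.
struct ptrn_t
{
  ptrn_t (iter_t first, iter_t last) : re(std::make_shared<std::regex>(first, last)) { }
  std::shared_ptr<std::regex> re;
};

// Result of find(): tests true on a match, converts to an iterator (start of
// the match, or last if there was none) and gives access to the captures.
class match_t
{
public:
  explicit match_t (iter_t end_of_input, std::shared_ptr<std::cmatch> match = nullptr)
    : last(end_of_input), m(std::move(match)) { }

  explicit operator bool () const { return m != nullptr; }
  operator iter_t () const        { return m ? (*m)[0].first : last; }

  // Number of groups, counting group 0 (the whole match).
  std::size_t size () const { return m ? m->size() : 0; }

  // Accessors for capture i; only valid after a successful match.
  bool empty (std::size_t i) const   { assert(m && i < m->size()); return !(*m)[i].matched; }
  iter_t begin (std::size_t i) const { assert(m && i < m->size()); return (*m)[i].first; }
  iter_t end (std::size_t i) const   { assert(m && i < m->size()); return (*m)[i].second; }

private:
  iter_t last;
  std::shared_ptr<std::cmatch> m;    // reference counted, so copies share one result
};

// Search [it, last). first is the start of the whole buffer: when it != first,
// match_prev_avail lets assertions such as \b look at the character before it.
match_t find (iter_t first, iter_t it, iter_t last, ptrn_t const& ptrn)
{
  auto flags = it != first ? std::regex_constants::match_prev_avail
                           : std::regex_constants::match_default;
  auto m = std::make_shared<std::cmatch>();
  if(std::regex_search(it, last, *m, *ptrn.re, flags))
    return match_t(last, m);
  return match_t(last);
}
```

With first, it and last declared as char const*, the snippets in the list should compile against these definitions as written (the capture loop additionally needs <iostream>, <string> and using namespace std). Having one match_t that converts both to bool and to an iterator is what lets a single find() serve all four tasks; because operator bool is explicit and an exact match, an if condition selects it over the pointer conversion.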
    

The wrapper (fewer than 100 lines) and an example can be downloaded from here. The nice things about the above API are that a) you don’t have to allocate or release resources yourself (match_t objects are reference counted, so making copies is safe) and b) every case uses the same STL-inspired find() function, so there’s little to remember (match_t is also modeled on the STL, with begin/end and size member functions).
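
One common way to get "no manual alloc/release, reference counted copies" on top of a C API is to hold the C resource in a std::shared_ptr with a custom deleter. The sketch below shows that technique only; region_new() and region_free() are hypothetical stand-ins that merely count live allocations. With Oniguruma itself the corresponding calls would presumably be onig_region_new() and onig_region_free(r, 1), the latter wrapped in a small lambda because of its extra argument.

```cpp
// Sketch: a C resource owned through std::shared_ptr with a custom deleter,
// so copies share one allocation and it is released exactly once.
// ASSUMPTION: region_new()/region_free() are hypothetical stand-ins for a C API
// such as Oniguruma's onig_region_new()/onig_region_free(); here they only
// count live allocations so the behaviour can be observed.
#include <cassert>
#include <memory>

struct region { int num_regs = 0; };   // hypothetical C-side struct

int live_regions = 0;                  // allocated but not yet freed

region* region_new ()
{
  ++live_regions;
  return new region;
}

void region_free (region* r)
{
  assert(live_regions > 0);            // never free more than was allocated
  --live_regions;
  delete r;
}

// Stripped-down match_t that only owns the region; callers never call
// region_free() themselves.
class match_t
{
public:
  match_t () : region_(region_new(), &region_free) { }
  long use_count () const { return region_.use_count(); }

private:
  std::shared_ptr<region> region_;     // deleter runs when the last copy is destroyed
};
```

Copying a match_t only bumps the shared count; the region is freed once, when the last copy goes out of scope.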

The supplied wrapper works on char sequences and expects them to be UTF-8 encoded. Unfortunately, the library can’t work with real STL iterators; it needs plain pointers into a contiguous buffer.
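
Because the input is assumed to be UTF-8, a caller holding bytes of uncertain origin might want to check them before searching. The helper below is hypothetical and not part of the wrapper: it walks a char const* range and rejects truncated sequences, stray continuation bytes, overlong encodings, UTF-16 surrogates and code points above U+10FFFF.

```cpp
// Sketch: checking that a char range is well-formed UTF-8 before handing it
// to a wrapper that assumes UTF-8. Hypothetical helper, not part of the wrapper.
#include <cassert>
#include <cstddef>

bool is_valid_utf8 (char const* first, char const* last)
{
  assert(first <= last); // precondition: [first, last) is a valid range
  static char32_t const min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 }; // smallest code point per length
  while(first != last)
  {
    unsigned char c = static_cast<unsigned char>(*first);
    std::size_t len;
    char32_t cp;
    if(c < 0x80)                { len = 1; cp = c; }
    else if((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else return false;          // continuation byte or invalid lead byte

    if(static_cast<std::size_t>(last - first) < len)
      return false;             // sequence runs past the end of the range
    for(std::size_t i = 1; i < len; ++i)
    {
      unsigned char cc = static_cast<unsigned char>(first[i]);
      if((cc & 0xC0) != 0x80)
        return false;           // expected a 10xxxxxx continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    if(cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;             // overlong, out of range, or surrogate
    first += len;
  }
  return true;
}
```

For a std::string s, the pointer range to check (and later to search) is s.data(), s.data() + s.size(), which fits the pointer-only interface described above.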

Btw: by adding a char* constructor to ptrn_t, it’s possible to write, e.g.:

  it = find(first, it, last, "(foo|bar)");

This advances it to the first occurrence of either “foo” or “bar” in the it, last sequence. Almost like using a high-level language.
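
A sketch of such a constructor, again assuming std::regex in place of Oniguruma and simplifying find() to return only the match position (or last) rather than a full match_t. The key detail is that the constructor is not marked explicit; otherwise the string literal would not convert to ptrn_t implicitly.

```cpp
// Sketch: a converting constructor from char const* so a string literal can
// be passed straight to find().
// ASSUMPTIONS: std::regex stands in for Oniguruma, and find() is simplified to
// return only the match position (or last) instead of a full match_t.
#include <cassert>
#include <cstring>
#include <regex>

typedef char const* iter_t;

struct ptrn_t
{
  ptrn_t (iter_t first, iter_t last) : re(first, last) { }
  // Deliberately not explicit: this is what lets "(foo|bar)" convert to ptrn_t.
  ptrn_t (char const* str) : ptrn_t(str, str + std::strlen(str)) { }
  std::regex re;
};

// Returns the start of the first match in [it, last), or last if none.
iter_t find (iter_t first, iter_t it, iter_t last, ptrn_t const& ptrn)
{
  assert(first <= it && it <= last);
  std::cmatch m;
  auto flags = it != first ? std::regex_constants::match_prev_avail
                           : std::regex_constants::match_default;
  return std::regex_search(it, last, m, ptrn.re, flags) ? m[0].first : last;
}
```

A side effect of the convenience is that, in this sketch, the literal is compiled into a fresh pattern on every call, so inside a loop it is cheaper to build the ptrn_t once and reuse it.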
