Oniguruma C++ Wrapper
I’ve recently partially switched to the Oniguruma regular expression library.
Since I also use regular expressions in my own source code, I’ve created a simple C++ wrapper which makes the API better suited to my tasks. I generally work with iterators, and there are four tasks I perform often.
-
Create a pattern object. This is done using:
ptrn_t ptrn(first, last);
Here first, last is the iterator sequence that contains the regular expression.
-
Test whether a sequence matches a pattern. This is done using:
if(find(first, it, last, ptrn)) ...;
Here it, last is the sequence to be matched, and first is the start of the buffer, which is needed in case the pattern uses look-behind, starts with a word or line boundary, or similar.
-
Move an iterator to the first occurrence of a pattern (or to end-of-sequence if there is no match):
it = find(first, it, last, ptrn);
-
Examine the captures of a match, if there was one:
if(match_t const& m = find(first, it, last, ptrn))
{
    for(int i = 1; i < m.size(); i++)
    {
        if(!m.empty(i))
            cout << string(m.begin(i), m.end(i)) << endl;
    }
}
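To make the shape of this API concrete, here is a minimal sketch of how one find() can serve all four tasks. It is not the downloadable wrapper: std::regex stands in for Oniguruma so that the code builds without the library, and everything inside ptrn_t and match_t (the conversion operators, the meaning of empty(), the use of match_prev_avail to honour the buffer start) is an assumption made for illustration. The captures() helper is hypothetical as well. Note that std::regex has no look-behind, so only the word-boundary half of the first argument's purpose is shown.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Assumed stand-in for the wrapper's ptrn_t: owns a compiled pattern.
struct ptrn_t
{
    ptrn_t(char const* first, char const* last) : re(first, last) { }
    std::regex re;
};

// Assumed stand-in for match_t: testable as a bool, convertible to an
// iterator, and queryable for captures with STL-like member functions.
class match_t
{
public:
    match_t(std::cmatch const& m, char const* last) : m_(m), last_(last) { }

    // True if the search found something.
    explicit operator bool() const { return !m_.empty(); }
    // Start of the match, or end-of-sequence if there was none.
    operator char const*() const { return m_.empty() ? last_ : m_[0].first; }

    std::size_t size() const { return m_.size(); }
    // In this sketch "empty" means the capture did not take part in the match.
    bool empty(std::size_t i) const { return !m_[i].matched; }
    char const* begin(std::size_t i) const { return m_[i].first; }
    char const* end(std::size_t i) const { return m_[i].second; }

private:
    std::cmatch m_;
    char const* last_;
};

// Search [it, last) for ptrn. If it is not the start of the buffer, tell the
// regex engine that the preceding character exists, so that a leading word
// boundary is evaluated against the real context.
match_t find(char const* first, char const* it, char const* last, ptrn_t const& ptrn)
{
    std::cmatch m;
    auto flags = std::regex_constants::match_default;
    if(it != first)
        flags |= std::regex_constants::match_prev_avail;
    std::regex_search(it, last, m, ptrn.re, flags);
    return match_t(m, last);
}

// Hypothetical helper: collect the captures of the first match, if any.
std::vector<std::string> captures(char const* first, char const* last, ptrn_t const& ptrn)
{
    std::vector<std::string> res;
    if(match_t const& m = find(first, first, last, ptrn))
    {
        for(std::size_t i = 1; i < m.size(); i++)
        {
            if(!m.empty(i))
                res.push_back(std::string(m.begin(i), m.end(i)));
        }
    }
    return res;
}
```

Because the result converts both to bool and to an iterator, the same call reads naturally in an if condition, on the right-hand side of an assignment to it, and as the initializer of a match_t reference.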
The wrapper (fewer than 100 lines) with an example can be downloaded from here. The nice thing about the above API is that a) you don’t have to allocate or release resources yourself (the match_t object is reference counted in case you make copies) and b) all the cases use the same STL-inspired find() function, so there’s little to remember (the match_t class is also inspired by STL, with its begin/end and size member functions).
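The reference counting mentioned above can be illustrated with a small sketch. This is not how the wrapper itself is written: std::shared_ptr is swapped in as the counting mechanism, and region_t and counted_match_t are hypothetical names for "a resource the regex library allocated" and "a match object that shares it between copies".

```cpp
#include <memory>

// Hypothetical resource standing in for the match data a C regex library
// allocates and expects the caller to free. It tracks how many instances
// are alive through a counter owned by the caller.
struct region_t
{
    explicit region_t(int& live) : live_(live) { ++live_; }
    ~region_t() { --live_; }
    region_t(region_t const&) = delete;
    region_t& operator=(region_t const&) = delete;

private:
    int& live_;
};

// Copies of counted_match_t share a single region_t, which is destroyed
// when the last copy goes out of scope.
class counted_match_t
{
public:
    explicit counted_match_t(int& live) : region_(std::make_shared<region_t>(live)) { }

    // Number of match objects currently sharing the region.
    long copies() const { return region_.use_count(); }

private:
    std::shared_ptr<region_t> region_;
};
```

Copying such a match object only bumps a counter, so passing results around by value stays cheap and nothing has to be released by hand.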
The supplied wrapper uses char sequences and expects them to be UTF-8 encoded. Unfortunately, the library can’t work with real STL iterators.
Btw: by adding a char* constructor to ptrn_t, it’s possible to write e.g.:
it = find(first, it, last, "(foo|bar)");
This advances it to the first occurrence of either “foo” or “bar” in the it, last sequence. Almost like using a high-level language.
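The reason a string literal works at the call site is that a non-explicit, single-argument constructor lets the compiler convert a char const* into a temporary ptrn_t. Here is a minimal sketch of that mechanism. As before, std::regex stands in for Oniguruma, and this find() is a simplified, assumed version that returns the iterator directly instead of a match_t.

```cpp
#include <regex>

// Assumed stand-in for the wrapper's ptrn_t.
struct ptrn_t
{
    ptrn_t(char const* first, char const* last) : re(first, last) { }
    // Deliberately not explicit: a string literal passed where a ptrn_t is
    // expected is converted through this constructor.
    ptrn_t(char const* str) : re(str) { }
    std::regex re;
};

// Simplified find(): returns the start of the first match in [it, last),
// or last if there is none. first is the start of the whole buffer.
char const* find(char const* first, char const* it, char const* last, ptrn_t const& ptrn)
{
    std::cmatch m;
    auto flags = std::regex_constants::match_default;
    if(it != first)
        flags |= std::regex_constants::match_prev_avail;
    return std::regex_search(it, last, m, ptrn.re, flags) ? m[0].first : last;
}
```

The convenience has a cost: each such call compiles the pattern again, so inside a loop it is probably better to construct the ptrn_t once and pass it in.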