In this blog post we will walk through some simple examples of regular expressions, also known as regexes, in C. We will use the popular libraries PCRE and PCRE2. If you don’t know what a regex is or have never used one, then you can close this tab right now! Learning regular expressions in C is probably the wrong way to go. Start with something easier like Python, Perl, or anything else, because doing them in C is difficult. If you already know regular expressions from other languages, learning them in C will strengthen your understanding of the concept. This example is meant to be simple, easy to understand, and useful.
First, Some Background
- Regexes are not part of ANSI C, so a library needs to be used.
- There are 2 main libraries: POSIX and PCRE.
- Be aware of the text you’re running regexes on. If you’re working with UTF text, we don’t cover that here. We’ll stick to “8-bit” code points such as ASCII.
POSIX Regular Expressions
If you see regex.h included in the C source (#include <regex.h>), then it’s POSIX regular expressions. POSIX regular expressions have lost the popularity battle and you won’t see them used much.
We won’t discuss POSIX regular expressions in this blog post from here on.
Perl Compatible Regular Expressions
PCRE and PCRE2
The PCRE library has 2 versions: pcre and pcre2. The older pcre was released over 20 years ago in 1997 and is at version 8.43 as of this post. Future releases will be for bugfixes only. New features will be released in pcre2, which was released in 2015 and is at version 10.34 as of this writing. In this blog post we have an example for both.
You can of course install the pcre library from source. However, let’s go the easy route and install through a package manager:
Yum - CentOS
Apt - Ubuntu
Pacman - ArchLinux
A Useful, Simple Example
Before we get into the code, let’s create a good example with capture groups. A regex that matches a first and last name is a useful one to have. Here one group can capture the first name and another group the last name.
Example Regex for a Person’s First and Last Name
Here we have two “capture groups”; a group is whatever sits between the parentheses (). By having groups we can capture and use different parts of what the regex matched. Inside each () we have a representation of a very simple name. It has [A-Z], which is a character class that matches one capital letter A through Z, followed by another character class, [a-z]+, which matches a through z one or more times. To be explicit, A through Z is A, B, C, D, E … all the way to Z. Again, this regex is very simple and I’m sorry if I offended you by it not matching your name. For example, John McCarthy would not match since the last name has 2 capital letters. The ^ and $ match the beginning and end of the subject respectively (or of a line, in multiline mode).
Trying our Example
The PCRE libraries come with helper tools, called pcretest and pcre2test respectively. I will use pcre2test, since both behave the same here: we’re not doing anything advanced with regexes.
Now we can see that for the subject Lloyd Rochester we got 3 matches. Huh, why not 2? The reason is that whenever the regex matches at all, the whole match counts as 1, and the other 2 are for the groups. Sorry, but John McCarthy didn’t match.
How we’ll use our Simple Example
Let’s make 2 examples that are used like so:
We will create 2 programs, one for pcre and one for pcre2 (pcre_ex2), which use their corresponding libraries. We will pass 2 arguments to these programs. The first argument will be our regular expression and the second argument will be our subject. The “subject” is the thing we will match against.
Let’s dive into an example using the legacy PCRE library.
Simple PCRE Sample Program
Below is a simple example using pcre. It is linked with -lpcre.
Simple PCRE2 Sample Program
Below is a simple example using pcre2. It is linked with -lpcre2-8.
In this example there is a loop that goes through the subject using the ovector. I wanted to use the helper functions pcre2_substring_copy_bynumber and pcre2_substring_get_bynumber; however, I could not get them to work. For pcre2_substring_copy_bynumber it would return both Lloyd Rochester and Lloyd but then would give me a PCRE2_ERROR_NOMEMORY on the third. For pcre2_substring_get_bynumber I mainly got segmentation faults. I’m still not sure why these helper functions couldn’t be used.