Solarian Programmer

My programming ramblings

C++11 regex tutorial part 2

Posted on October 20, 2011 by Sol

The code for this tutorial is on GitHub: https://github.com/sol-prog/regex_tutorial.

In the first part of this tutorial we used regex_match to validate user input. The C++11 regex_match algorithm returns true only if the entire target string matches the regular expression; e.g. [[:digit:]]+ matches "123" and "456" exactly, but not "123e-05" or "456.22".
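A minimal sketch of this whole-string behavior. The helper name is_digits is hypothetical and is not part of the tutorial's code:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if every character of s is a digit,
// because regex_match must consume the entire string, not just part of it.
bool is_digits(const std::string& s)
{
    static const std::regex digits("[[:digit:]]+");
    return std::regex_match(s, digits);
}
```

With this helper, is_digits("123") and is_digits("456") are true, while is_digits("123e-05") and is_digits("456.22") are false.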

If you need a partial match, use regex_search, which returns true if it finds a match anywhere in the target string (that is, a match on a substring of the original string). The result of regex_search can also be retrieved in an smatch object, which holds the matched text, its position and any captured groups.
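A short sketch of regex_search with an smatch object, assuming we only want the first run of digits in a string. The function find_digits and its output parameters are my own illustration, not part of the tutorial's code:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: finds the first run of digits anywhere in s.
// Returns true on success and stores the matched text and its offset.
bool find_digits(const std::string& s, std::string& found, std::size_t& pos)
{
    static const std::regex digits("[[:digit:]]+");
    std::smatch m;                                   // filled in by regex_search
    if (!std::regex_search(s, m, digits))
        return false;                                // no substring matches
    found = m.str(0);                                // text of the whole match
    pos = static_cast<std::size_t>(m.position(0));   // offset of the match in s
    return true;
}
```

For "123e-05", where regex_match with [[:digit:]]+ fails, this search succeeds with found == "123" and pos == 0. The same smatch object also provides m.prefix() and m.suffix(), the text before and after the match.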

A practical application of regex_search is removing the leading white space from every line of an input file. Suppose that you have a badly formatted text file with a lot of empty space at the start of each line, plus some empty lines:

The solution was obviously to compile the last versions of llvm,


      clang and libc++ from sources. Because I use my computer to hack

   some Objective-C code I didn't want to mess my system installing

             new versions of llvm and clang so I've tried to install these on
			a test Linux box. After a few hours of struggle I was able to

  compile C++11 code that uses regular expressions and raw strings
         literals on Linux! This was interesting enough to deserve a


      separate post on my blog and hopefully will help others to get
             started with clang++ and libc++ on their Linux boxes.

We can use regular expressions to clean up this mess, removing all the extra leading spaces and the empty lines. Suppose that the above text is saved in a file named "spaces.txt"; the following small program cleans it and writes the result to "clear_spaces.txt":

#include <iostream>
#include <string>
#include <fstream>
#include <regex>

using namespace std;

int main()
{
    ifstream inp;
    ofstream out;
    //skip the leading white space and capture the rest of the line,
    //which must start with a non-space character
    regex leading_spaces("[[:space:]]*([^[:space:]].*)");
    string line;
    inp.open("spaces.txt");
    out.open("clear_spaces.txt");
    if(inp.is_open())
    {
        while(getline(inp,line))                          //stop exactly at the end of the file
        {
            smatch result;
            if(regex_search(line,result,leading_spaces))  //empty or blank lines do not match
            {
                out<<result[1].str()<<endl;               //write the cleaned line to disk
            }
        }
    }
    inp.close();
    out.close();
    return(0);
}

The above code can be improved by replacing the hard-coded file names with names read from the command line through argc and argv. It also needs some error checking: what happens, for example, if the input file is not present? My purpose here is to illustrate the use of regex_search with a simple example; feel free to improve the code from GitHub.
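One possible shape for these improvements, as a sketch: the function name clean_file and its bool return value are my own assumptions, not the code on GitHub. It reads with while(getline(...)), which stops exactly at the end of the file, and its pattern requires the captured text to start with a non-space character, so lines made only of white space are dropped too.

```cpp
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: copies every non-blank line of in_name to out_name
// with its leading white space removed. Returns false, after printing a
// message, if either file cannot be opened.
bool clean_file(const std::string& in_name, const std::string& out_name)
{
    std::ifstream inp(in_name);
    if (!inp)
    {
        std::cerr << "Cannot open input file " << in_name << '\n';
        return false;
    }
    std::ofstream out(out_name);
    if (!out)
    {
        std::cerr << "Cannot open output file " << out_name << '\n';
        return false;
    }
    // Skip leading white space, then capture from the first non-space
    // character to the end of the line; blank lines never match.
    const std::regex leading_spaces("[[:space:]]*([^[:space:]].*)");
    std::string line;
    std::smatch result;
    while (std::getline(inp, line))
    {
        if (std::regex_search(line, result, leading_spaces))
            out << result[1].str() << '\n';
    }
    return true;   // both streams are flushed and closed by their destructors
}
```

A main function would then check that argc == 3 and call clean_file(argv[1], argv[2]), returning a non-zero status if the call fails.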

You could compile the above code with:

clang++ -std=c++11 -stdlib=libc++ regex_04.cpp -o regex_04.out

regex_04.cpp can also be compiled with Visual Studio 2010; unfortunately, at the time of this writing, gcc-4.6.1 doesn't have a complete implementation of the regex functionality.

If you run regex_04.out, the result is clear_spaces.txt, a file with no leading white space on any line and no empty lines:

The solution was obviously to compile the last versions of llvm,
clang and libc++ from sources. Because I use my computer to hack
some Objective-C code I didn't want to mess my system installing
new versions of llvm and clang so I've tried to install these on
a test Linux box. After a few hours of struggle I was able to
compile C++11 code that uses regular expressions and raw strings
literals on Linux! This was interesting enough to deserve a
separate post on my blog and hopefully will help others to get
started with clang++ and libc++ on their Linux boxes.

Another useful algorithm from the C++11 regex library is regex_replace, which replaces every occurrence of a given pattern with a format string. Suppose we have a line of text made of words and numbers in no particular order. By replacing every number, or every word, with an empty string, regex_replace lets us separate the numbers from the words:

#include <iostream>
#include <string>
#include <regex>

using namespace std;

int main()
{
    //This should match any real number: integers, decimals and
    //numbers in scientific notation such as -1.5e-3
    regex number("((\\+|-)?[[:digit:]]+)(\\.(([[:digit:]]+)?))?((e|E)((\\+|-)?)[[:digit:]]+)?");
    //This should match any word (note that it also matches the e or E
    //of a number written in scientific notation)
    regex word("[[:alpha:]]+");
    string input,words_only,numbers_only;
    //Replace every match with an empty string, i.e. delete it
    const string format="";
    getline(cin,input);
    //Split the input string in numbers and words
    words_only=regex_replace(input,number,format,regex_constants::format_default);  //remove the numbers
    numbers_only=regex_replace(input,word,format,regex_constants::format_default);  //remove the words
    //Print the results
    cout<<numbers_only<<endl;
    cout<<words_only<<endl;
    return(0);
}

Let's see the above code in action:

[Image: Regex code]
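The replacement step can also be wrapped in small functions and tested on its own. This is a sketch of my own, not the tutorial's code: remove_all repeats the empty-format replacement used in the program above, while the hypothetical truncate_decimals shows a non-empty format string, where $1 (ECMAScript format rules, the default) stands for the first capture group of each match.

```cpp
#include <regex>
#include <string>

// Deletes every match of pattern in text: the same call the program above
// makes, with an empty format string.
std::string remove_all(const std::string& text, const std::regex& pattern)
{
    return std::regex_replace(text, pattern, "");
}

// Hypothetical example with a non-empty format string: "$1" is replaced by
// the first capture group, so each decimal number keeps only its integer part.
std::string truncate_decimals(const std::string& text)
{
    static const std::regex decimal("([[:digit:]]+)\\.[[:digit:]]+");
    return std::regex_replace(text, decimal, "$1");
}
```

For example, remove_all("x1y22z", std::regex("[[:alpha:]]+")) returns "122", and truncate_decimals("3.14 and 2.718") returns "3 and 2".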

Similar expressions can be constructed for testing any kind of user input. If you want to learn more about regular expressions, the most authoritative source in the field is the book Mastering Regular Expressions by Jeffrey E.F. Friedl.

If you are interested in learning more about the new C++11 syntax, I recommend Professional C++, 2nd edition, by M. Gregoire, N. A. Solter and S. J. Kleper or, if you are a C++ beginner, C++ Primer (5th Edition) by S. B. Lippman, J. Lajoie and B. E. Moo.

