C++11 multithreading tutorial - part 2
Posted on February 27, 2012 by Sol

The code for this tutorial is on GitHub: https://github.com/sol-prog/threads.

In my last tutorial about using threads in C++11, we saw that the new C++11 threads syntax is remarkably clean compared with the POSIX pthreads syntax. Using a few simple concepts, we were able to build a fairly complex image processing example while avoiding the subject of thread synchronization. In this second part of the introduction to multithreaded programming in C++11, we are going to see how we can synchronize a group of threads running in parallel.

We’ll start with a quick reminder of how we can create a group of threads in C++11. In the last tutorial we saw that we can store a group of threads in a classical C-style array. It is also entirely possible to store our threads in a std::vector, which is more in the spirit of C++11 and avoids the pitfalls of dynamic memory allocation with new and delete:

```cpp
#include <iostream>
#include <thread>
#include <vector>

//This function will be called from a thread
void func(int tid) {
    std::cout << "Launched by thread " << tid << std::endl;
}

int main() {
    std::vector<std::thread> th;

    int nr_threads = 10;

    //Launch a group of threads
    for (int i = 0; i < nr_threads; ++i) {
        th.push_back(std::thread(func,i));
    }

    //Join the threads with the main thread
    for(auto &t : th){
        t.join();
    }

    return 0;
}
```

To compile the above program on Mac OS X Lion with clang++ or with gcc-4.7 (gcc-4.7 was compiled from source), use one of:

```
clang++ -Wall -std=c++0x -stdlib=libc++ file_name.cpp

g++-4.7 -Wall -std=c++11 file_name.cpp
```

On a modern Linux system with gcc-4.6.x we can compile the code with:

```
g++ -std=c++0x -pthread file_name.cpp
```

Some real-life problems are embarrassingly parallel by nature and can be managed well with the simple syntax presented in the first part of this tutorial. Adding two arrays, multiplying an array by a scalar, and generating the Mandelbrot set are classical examples of embarrassingly parallel problems.
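As an illustration of the first example, here is a minimal sketch of adding two arrays in parallel. The names add_range and parallel_add are hypothetical and not part of the tutorial's repository. Each thread writes to its own slice of the output, so no synchronization is needed apart from the final join:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Adds a[i] + b[i] into out[i] for every index i in [first, last).
void add_range(const std::vector<int> &a, const std::vector<int> &b,
               std::vector<int> &out, std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: splits the index range into nr_threads contiguous
// chunks and lets each thread fill its own chunk of the output vector.
std::vector<int> parallel_add(const std::vector<int> &a,
                              const std::vector<int> &b,
                              unsigned nr_threads) {
    assert(a.size() == b.size() && nr_threads > 0);

    std::vector<int> out(a.size());
    std::vector<std::thread> threads;

    // Chunk size rounded up, so that all elements are covered
    const std::size_t chunk = (a.size() + nr_threads - 1) / nr_threads;

    for (unsigned t = 0; t < nr_threads; ++t) {
        const std::size_t first = std::min(a.size(), t * chunk);
        const std::size_t last = std::min(a.size(), first + chunk);
        threads.emplace_back(add_range, std::cref(a), std::cref(b),
                             std::ref(out), first, last);
    }

    //Join the threads with the calling thread
    for (auto &th : threads) {
        th.join();
    }
    return out;
}
```

Calling parallel_add(a, b, 4) on two vectors of equal length returns their element-wise sum. Because the chunks do not overlap, no two threads ever touch the same element.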

Other problems, by their nature, require some level of synchronization between threads. Take for example the dot product of two vectors: take two vectors of equal length, multiply them element by element, and accumulate the result of each multiplication in a scalar variable. A naive parallelization of this problem is presented in the next code snippet:

```cpp
#include <iostream>
#include <thread>
#include <vector>
#include <functional> // std::ref
...

void dot_product(const std::vector<int> &v1, const std::vector<int> &v2, int &result, int L, int R){
    for(int i = L; i < R; ++i){
        result += v1[i] * v2[i];
    }
}

int main(){
    int nr_elements = 100000;
    int nr_threads = 2;
    int result = 0;
    std::vector<std::thread> threads;

    //Fill two vectors with some constant values for a quick verification
    // v1={1,1,1,1,...,1}
    // v2={2,2,2,2,...,2}
    // The result of the dot_product should be 200000 for this particular case
    std::vector<int> v1(nr_elements,1), v2(nr_elements,2);

    //Split nr_elements into nr_threads parts
    std::vector<int> limits = bounds(nr_threads, nr_elements);

    //Launch nr_threads threads:
    for (int i = 0; i < nr_threads; ++i) {
        threads.push_back(std::thread(dot_product, std::ref(v1), std::ref(v2), std::ref(result), limits[i], limits[i+1]));
    }

    //Join the threads with the main thread
    for(auto &t : threads){
        t.join();
    }

    //Print the result
    std::cout << result << std::endl;

    return 0;
}
```

The result of the above code should obviously be 200000. However, running the code a few times gives results that are wrong and that change from run to run:

```
sol$ g++-4.7 -Wall -std=c++11 cpp11_threads_01.cpp
sol$ ./a.out
138832
sol$ ./a.out
138598
sol$ ./a.out
138032
sol$ ./a.out
140690
sol$
```

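What happened? Look carefully at line 9 of the C++ code, the statement result += v1[i] * v2[i], which adds the product of v1[i] and v2[i] to the variable result. Line 9 is a typical example of a race condition: the code runs in two parallel, asynchronous threads, and each of them reads result, adds to it, and writes it back without any coordination. When two such updates overlap, one thread overwrites the value the other has just stored, so some of the additions are lost and the final sum comes out too small.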
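To see how an addition can go missing, the following sketch replays one unlucky interleaving of two threads step by step, in a single thread, so that the outcome is deterministic. The function name replay_lost_update and the variables reg_a and reg_b are made up for this illustration. They stand for the private copies each thread works on between reading and writing the shared variable:

```cpp
#include <cassert>

// Replays, sequentially, one possible interleaving of two threads that both
// execute "result += 2" on a shared variable without synchronization.
// Each "+=" is really three steps: load, add, store.
int replay_lost_update() {
    int result = 0;      // the shared variable

    int reg_a = result;  // thread A loads 0
    int reg_b = result;  // thread B loads 0, before A has stored anything
    reg_a += 2;          // thread A computes 2
    reg_b += 2;          // thread B computes 2
    result = reg_a;      // thread A stores 2
    result = reg_b;      // thread B stores 2 and overwrites A's update

    assert(result != 4); // the correct sum, 4, is not reached
    return result;
}
```

Two additions of 2 were issued, yet the function returns 2 instead of 4. In the real program this kind of overlap happens at unpredictable moments, which is why every run prints a different number.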

We can avoid this problem by specifying that this variable should be accessed by only one thread at a time. For this we can use a mutex, a special-purpose variable that acts like a barrier: while one thread holds it, any other thread that tries to lock it must wait, which serializes access to the code that modifies the result variable:

```cpp
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>

static std::mutex barrier;

...

void dot_product(const std::vector<int> &v1, const std::vector<int> &v2, int &result, int L, int R){
    int partial_sum = 0;
    for(int i = L; i < R; ++i){
        partial_sum += v1[i] * v2[i];
    }
    std::lock_guard<std::mutex> block_threads_until_finish_this_job(barrier);
    result += partial_sum;
}
...
```

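Line 6 creates a global mutex variable named barrier. Line 15 locks it through a std::lock_guard, which holds the mutex until the end of the function, so only one thread at a time can add its contribution to result. Notice that this time each thread first accumulates its products in a local variable, partial_sum, so the mutex is locked only once per thread instead of once per element. The rest of the code is unchanged.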
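For readers who want to try the mutex version without the elided parts, here is a minimal self-contained sketch. The bounds function below is only a guess at what the omitted helper does, inferred from how it is called (it returns nr_threads + 1 limits). The wrapper parallel_dot_product is likewise a hypothetical name that stands in for the body of main:

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

static std::mutex barrier;

// Assumed stand-in for the elided bounds() helper: returns parts + 1 limits
// such that [limits[i], limits[i+1]) is the index range handled by thread i.
std::vector<int> bounds(int parts, int mem) {
    assert(parts > 0 && mem >= 0);
    std::vector<int> limits;
    int delta = mem / parts;
    int remainder = mem % parts;
    int begin = 0;
    limits.push_back(begin);
    for (int i = 0; i < parts; ++i) {
        // The first `remainder` ranges get one extra element
        begin += delta + (i < remainder ? 1 : 0);
        limits.push_back(begin);
    }
    return limits;
}

void dot_product(const std::vector<int> &v1, const std::vector<int> &v2,
                 int &result, int L, int R) {
    int partial_sum = 0;
    for (int i = L; i < R; ++i) {
        partial_sum += v1[i] * v2[i];
    }
    // Only one thread at a time may update the shared result
    std::lock_guard<std::mutex> lock(barrier);
    result += partial_sum;
}

// Hypothetical wrapper around the launch/join logic of main()
int parallel_dot_product(const std::vector<int> &v1,
                         const std::vector<int> &v2, int nr_threads) {
    assert(v1.size() == v2.size());
    int result = 0;
    std::vector<int> limits = bounds(nr_threads, static_cast<int>(v1.size()));
    std::vector<std::thread> threads;

    for (int i = 0; i < nr_threads; ++i) {
        threads.emplace_back(dot_product, std::cref(v1), std::cref(v2),
                             std::ref(result), limits[i], limits[i + 1]);
    }
    for (auto &t : threads) {
        t.join();
    }
    return result;
}
```

With v1 filled with 100000 ones and v2 with 100000 twos, parallel_dot_product(v1, v2, 2) returns 200000 on every run, because the additions to result can no longer overlap.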

For this particular case we can actually find a simpler and more elegant solution: we can use an atomic type, a special kind of variable that allows safe concurrent reading and writing, with the synchronization done under the hood. As a side note, an atomic type supports only the atomic operations defined in the atomic header:

```cpp
#include <iostream>
#include <thread>
#include <vector>
#include <atomic>

void dot_product(const std::vector<int> &v1, const std::vector<int> &v2, std::atomic<int> &result, int L, int R){
    int partial_sum = 0;
    for(int i = L; i < R; ++i){
        partial_sum += v1[i] * v2[i];
    }
    result += partial_sum;
}

int main(){
    int nr_elements = 100000;
    int nr_threads = 2;
    std::atomic<int> result(0);
    std::vector<std::thread> threads;

    ...

    return 0;
}
```
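The restriction to atomic operations matters in practice. std::atomic<int> provides += (used above), but it has no *= operator. An operation that is missing can usually be built from compare_exchange_weak, as in the following sketch. The functions atomic_multiply and double_in_parallel are hypothetical names chosen for this illustration:

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Atomically replaces value with value * factor.
void atomic_multiply(std::atomic<int> &value, int factor) {
    int expected = value.load();
    // compare_exchange_weak stores expected * factor only if value still
    // equals expected. Otherwise it refreshes expected with the current
    // value and returns false, so the loop retries with up-to-date data.
    while (!value.compare_exchange_weak(expected, expected * factor)) {
    }
}

// Launches nr_threads threads, each of which doubles a shared value once.
int double_in_parallel(int nr_threads) {
    assert(nr_threads >= 0 && nr_threads < 31); // keep the result inside int
    std::atomic<int> value(1);
    std::vector<std::thread> threads;

    for (int i = 0; i < nr_threads; ++i) {
        threads.emplace_back(atomic_multiply, std::ref(value), 2);
    }
    for (auto &t : threads) {
        t.join();
    }
    return value.load();
}
```

Calling double_in_parallel(10) returns 1024 regardless of how the threads are scheduled, since every doubling is applied exactly once.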

At the time of writing, atomic types and atomic operations are not available in Apple’s current clang++. However, you can use atomic types if you are willing to compile the latest clang++ from source, or you can use the latest gcc-4.7, also compiled from source.

If you are interested in learning more about the new C++11 syntax, I would recommend reading Professional C++ (2nd edition) by M. Gregoire, N. A. Solter, and S. J. Kleper or, if you are a C++ beginner, C++ Primer (5th edition) by S. B. Lippman, J. Lajoie, and B. E. Moo.