Any use of parallel functionality requires additional compiler
  and runtime support, in particular support for OpenMP. Adding this support is
  not difficult: just compile your application with the compiler
  flag -fopenmp. This will link in libgomp, the GNU Offloading and
  Multi Processing Runtime Library, whose presence is mandatory.
In addition, hardware that supports atomic operations and a compiler
  capable of producing atomic operations are mandatory: GCC defaults to no
  support for atomic operations on some common hardware
  architectures. Activating atomic operations may require explicit
  compiler flags on some targets (like sparc and x86), such
  as -march=i686, -march=native or -mcpu=v9. See
  the GCC manual for more information.
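As a quick check that the prerequisite flag took effect, the sketch below tests the _OPENMP macro, which OpenMP-conforming compilers (GCC with -fopenmp among them) define to a yyyymm date. The helper names openmp_enabled and openmp_version_date are hypothetical and are not part of libstdc++ or libgomp.

```cpp
// Reports whether this translation unit was compiled with OpenMP enabled.
// The helper names below are illustrative only.

// Hypothetical helper: true if OpenMP support was enabled at compile time.
inline bool openmp_enabled()
{
#ifdef _OPENMP
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: the value of _OPENMP (a yyyymm date), or 0 when the
// macro is not defined, i.e. when -fopenmp was not passed.
inline long openmp_version_date()
{
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0L;
#endif
}
```

Calling openmp_enabled() from a program built without -fopenmp should return false, which is a hint that the parallel algorithms will not be usable in that build.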
To use the libstdc++ parallel mode, compile your application with
  the prerequisite flags as detailed above, and in addition
  add -D_GLIBCXX_PARALLEL. This will convert all
  uses of the standard (sequential) algorithms to the appropriate parallel
  equivalents. Please note that this doesn't necessarily mean that
  everything will end up being executed in a parallel manner, but
  rather that the heuristics and settings coded into the parallel
  versions will be used to determine if all, some, or no algorithms
  will be executed using parallel variants.
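The point that the source code itself does not change can be illustrated with a minimal sketch. The call below is spelled std::sort in both modes; only the compile flags decide which implementation is selected. The helper names sorted_copy and parallel_mode_enabled are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns a sorted copy of its argument.
// When this translation unit is compiled with -fopenmp -D_GLIBCXX_PARALLEL,
// libstdc++ routes std::sort to its parallel-mode implementation, which may
// still choose to run sequentially according to its heuristics.
std::vector<int> sorted_copy(std::vector<int> values)
{
  std::sort(values.begin(), values.end());
  return values;
}

// Hypothetical helper: true if this translation unit was compiled with
// parallel mode enabled.
bool parallel_mode_enabled()
{
#ifdef _GLIBCXX_PARALLEL
  return true;
#else
  return false;
#endif
}
```

The result of sorted_copy is the same in either mode; parallel_mode_enabled only reports which mode the translation unit was built in.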
Note that the _GLIBCXX_PARALLEL define may change the
  sizes and behavior of standard library templates such as
  std::search, and therefore one can only link code
  compiled with parallel mode and code compiled without parallel mode
  if no instantiation of a container is passed between the two
  translation units. Parallel mode functionality has distinct linkage,
  and cannot be confused with normal mode symbols.
When it is not feasible to recompile your entire application, or only specific algorithms need to be parallel-aware, individual parallel algorithms can be made available explicitly. These parallel algorithms are functionally equivalent to the standard drop-in algorithms used in parallel mode, but they are available in a separate namespace as GNU extensions and may be used in programs compiled in either release mode or parallel mode.
An example of using a parallel version
of std::sort, but no other parallel algorithms, is:
#include <vector>
#include <parallel/algorithm>
int main()
{
  std::vector<int> v(100);
  // ...
  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}
Then compile this code with the prerequisite compiler flags
(-fopenmp and any necessary architecture-specific
flags for atomic operations).
The following table provides the names and headers of all the parallel algorithms that can be used in a similar manner:
Table 18.1. Parallel Algorithms
| Algorithm | Header | Parallel algorithm | Parallel header | 
|---|---|---|---|
| std::accumulate | numeric | __gnu_parallel::accumulate | parallel/numeric | 
| std::adjacent_difference | numeric | __gnu_parallel::adjacent_difference | parallel/numeric | 
| std::inner_product | numeric | __gnu_parallel::inner_product | parallel/numeric | 
| std::partial_sum | numeric | __gnu_parallel::partial_sum | parallel/numeric | 
| std::adjacent_find | algorithm | __gnu_parallel::adjacent_find | parallel/algorithm | 
| std::count | algorithm | __gnu_parallel::count | parallel/algorithm | 
| std::count_if | algorithm | __gnu_parallel::count_if | parallel/algorithm | 
| std::equal | algorithm | __gnu_parallel::equal | parallel/algorithm | 
| std::find | algorithm | __gnu_parallel::find | parallel/algorithm | 
| std::find_if | algorithm | __gnu_parallel::find_if | parallel/algorithm | 
| std::find_first_of | algorithm | __gnu_parallel::find_first_of | parallel/algorithm | 
| std::for_each | algorithm | __gnu_parallel::for_each | parallel/algorithm | 
| std::generate | algorithm | __gnu_parallel::generate | parallel/algorithm | 
| std::generate_n | algorithm | __gnu_parallel::generate_n | parallel/algorithm | 
| std::lexicographical_compare | algorithm | __gnu_parallel::lexicographical_compare | parallel/algorithm | 
| std::mismatch | algorithm | __gnu_parallel::mismatch | parallel/algorithm | 
| std::search | algorithm | __gnu_parallel::search | parallel/algorithm | 
| std::search_n | algorithm | __gnu_parallel::search_n | parallel/algorithm | 
| std::transform | algorithm | __gnu_parallel::transform | parallel/algorithm | 
| std::replace | algorithm | __gnu_parallel::replace | parallel/algorithm | 
| std::replace_if | algorithm | __gnu_parallel::replace_if | parallel/algorithm | 
| std::max_element | algorithm | __gnu_parallel::max_element | parallel/algorithm | 
| std::merge | algorithm | __gnu_parallel::merge | parallel/algorithm | 
| std::min_element | algorithm | __gnu_parallel::min_element | parallel/algorithm | 
| std::nth_element | algorithm | __gnu_parallel::nth_element | parallel/algorithm | 
| std::partial_sort | algorithm | __gnu_parallel::partial_sort | parallel/algorithm | 
| std::partition | algorithm | __gnu_parallel::partition | parallel/algorithm | 
| std::random_shuffle | algorithm | __gnu_parallel::random_shuffle | parallel/algorithm | 
| std::set_union | algorithm | __gnu_parallel::set_union | parallel/algorithm | 
| std::set_intersection | algorithm | __gnu_parallel::set_intersection | parallel/algorithm | 
| std::set_symmetric_difference | algorithm | __gnu_parallel::set_symmetric_difference | parallel/algorithm | 
| std::set_difference | algorithm | __gnu_parallel::set_difference | parallel/algorithm | 
| std::sort | algorithm | __gnu_parallel::sort | parallel/algorithm | 
| std::stable_sort | algorithm | __gnu_parallel::stable_sort | parallel/algorithm | 
| std::unique_copy | algorithm | __gnu_parallel::unique_copy | parallel/algorithm |
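As a further illustration of the table, the sketch below calls __gnu_parallel::accumulate from parallel/numeric. So that it also builds when OpenMP is not enabled or the parallel headers are unavailable, it falls back to std::accumulate in that case; this fallback, the EXAMPLE_USE_GNU_PARALLEL macro, and the sum_values helper are assumptions made for the example and are not part of libstdc++.

```cpp
#include <numeric>
#include <vector>

// Use the explicit parallel algorithm only when OpenMP is enabled and the
// libstdc++ parallel header can be found. EXAMPLE_USE_GNU_PARALLEL is a
// hypothetical macro local to this example.
#if defined(_OPENMP) && defined(__has_include)
#  if __has_include(<parallel/numeric>)
#    include <parallel/numeric>
#    define EXAMPLE_USE_GNU_PARALLEL 1
#  endif
#endif

// Hypothetical helper: sums the elements of a vector.
long sum_values(const std::vector<long>& values)
{
#ifdef EXAMPLE_USE_GNU_PARALLEL
  // Explicitly request the parallel variant listed in the table.
  return __gnu_parallel::accumulate(values.begin(), values.end(), 0L);
#else
  // Sequential fallback when the prerequisites are not met.
  return std::accumulate(values.begin(), values.end(), 0L);
#endif
}
```

Both branches return the same sum for integer input. As with the sort example, the parallel branch needs -fopenmp and any architecture-specific flags for atomic operations.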