Implement a "split" synchronization barrier for C++ with OpenMP

Question 1

EDIT, TL;DR: Anyone who is considering using my code below in production and can afford to require C++20 should use std::barrier instead, as suggested by G. Sliepen in his excellent answer.

I’m working on some OpenMP-parallelized C++ code made of 3 parts, with the constraint that no thread may begin part 3 before all threads have finished part 1. It is perfectly acceptable, however, for some threads to run part 2 while others are still running part 1, or for some threads to run part 3 while others are still running part 2. (See my question on StackOverflow for details.)

A synchronization barrier anywhere between part 1 and part 3 would satisfy the constraint, but there’s no need for such "hard" synchronization. So I thought it would be nice to have a "split" barrier: no thread can pass the second half before all threads have passed the first half.

I managed to implement such a thing with the following code:

#include <condition_variable>
#include <mutex>

class split_barrier {
private:
    std::mutex m;
    std::condition_variable cv;
    int threads_in_section;
    int total_threads;
    bool may_enter;
    bool may_leave;
public:
    split_barrier():
        threads_in_section(0),
        may_enter(false),
        may_leave(false)
    {}
    void init(int threads) {
        std::lock_guard<std::mutex> lock(m);
        total_threads = threads;
        may_enter = true;
    }
    void enter() {
        std::unique_lock lock(m);
        cv.wait(lock, [this]{return may_enter;});
        if (++threads_in_section == total_threads) {
            may_enter = false;
            may_leave = true;
            lock.unlock();
            cv.notify_all();
        }
    }
    void leave() {
        std::unique_lock lock(m);
        cv.wait(lock, [this]{return may_leave;});
        if (--threads_in_section == 0) {
            may_leave = false;
            may_enter = true;
            lock.unlock();
            cv.notify_all();
        }
    }
};

Then my code looks like:

int main() {
    split_barrier barrier;
    #pragma omp parallel
    {
        #pragma omp single
        barrier.init(omp_get_num_threads());
        part1();
        barrier.enter();
        part2();
        barrier.leave();
        part3();
    }
}

EDIT: Since it does not invalidate the only and accepted answer, I hope I am allowed to add that my "real" use-case looks more like:

int main() {
    split_barrier barrier;
    #pragma omp parallel
    {
        #pragma omp single
        barrier.init(omp_get_num_threads());
        while (...) {
            part1();
            barrier.enter();
            part2();
            barrier.leave();
            part3();
            #pragma omp barrier
            part4();
        }
    }
}

I consider synchronization code to be very error-prone. Is my code thread-safe?

G. Sliepen · Accepted Answer · 2025-05-22 18:25:39Z

Thread safety

You correctly use a single std::mutex to guard may_enter, may_leave and threads_in_section. Although there are probably ways to make it more performant by using atomic variables somehow, your code takes a robust approach.

Note that it does depend on the caller using your split_barrier correctly; the following code will result in a deadlock:

split_barrier barrier; // init() never called: may_enter is still false
barrier.enter(); // waits for may_enter, so wait() blocks forever
foo(); // never reached
barrier.leave(); // never reached
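
The atomic-variable idea mentioned above could look roughly like the following. This is only a sketch under simplifying assumptions: the barrier is single-use, leave() busy-waits with std::this_thread::yield() instead of sleeping, and the names atomic_split_barrier and demo_atomic_split_barrier are made up for illustration. It uses std::thread rather than OpenMP so that it stays self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical single-use split barrier built on one atomic counter.
class atomic_split_barrier {
    std::atomic<int> arrived{0};
    const int total;
public:
    explicit atomic_split_barrier(int threads) : total(threads) {}

    // First half: record the arrival and return immediately.
    void enter() { arrived.fetch_add(1, std::memory_order_release); }

    // Second half: spin (yielding) until every thread has called enter().
    void leave() const {
        while (arrived.load(std::memory_order_acquire) < total)
            std::this_thread::yield();
    }
};

// Returns true if no thread got past leave() before all threads finished "part 1".
bool demo_atomic_split_barrier(int threads) {
    assert(threads > 0);
    atomic_split_barrier barrier(threads);
    std::atomic<int> part1_done{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&] {
            part1_done.fetch_add(1);           // stands in for part1()
            barrier.enter();
            // part2() would run here, overlapping with other threads' part1()
            barrier.leave();
            if (part1_done.load() != threads)  // precondition of part3()
                ok = false;
        });
    }
    for (auto& t : pool) t.join();
    return ok;
}
```

Calling demo_atomic_split_barrier(4) runs four threads through the barrier once. Busy-waiting tends to pay off only when the wait is expected to be short; the mutex and condition variable version is usually the safer default.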

Use `std::barrier`

Since C++20, the standard library already has a barrier primitive: std::barrier. It has separate member functions arrive() and wait(), which play the roles of your enter() and leave(). Using this, your example main() would look like:

#include <barrier>
#include <optional>
#include <utility>
#include <omp.h>

int main() {
    std::optional<std::barrier<>> barrier;
    #pragma omp parallel
    {
        #pragma omp single
        barrier.emplace(omp_get_num_threads());
        part1();
        auto arrival_token = barrier->arrive();
        part2();
        barrier->wait(std::move(arrival_token));
        part3();
    }
}

The std::optional is a workaround for the fact that std::barrier only takes the number of threads in its constructor, while omp_get_num_threads() only reports the team size inside the parallel region.

Unnecessary waiting in `enter()`

Both enter() and leave() call cv.wait(), which means there are actually two barriers. The second one shows up when the barrier is reused, for example:

part1();
barrier.enter();
part2();
barrier.leave(); // Waits for all threads to finish part1()
part3();
barrier.enter(); // Waits for all threads to finish part2()
part4();
barrier.leave(); // Waits for all threads to finish part3()
part5();

But I would expect the second call to barrier.enter() to not block anything.
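
One way to get a reusable split barrier whose enter() never blocks is to count phases, much like the arrival token of std::barrier. The following is a minimal sketch, assuming the thread count is known at construction; phased_split_barrier and demo_phased_split_barrier are hypothetical names, and the demo uses std::thread rather than OpenMP to stay self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable split barrier: enter() returns a phase token without
// blocking, leave(token) blocks until that phase is complete.
class phased_split_barrier {
    std::mutex m;
    std::condition_variable cv;
    const int total_threads;
    int arrived = 0;          // arrivals in the current phase
    unsigned long phase = 0;  // number of completed phases
public:
    explicit phased_split_barrier(int threads) : total_threads(threads) {}

    unsigned long enter() {
        std::lock_guard<std::mutex> lock(m);
        const unsigned long my_phase = phase;
        if (++arrived == total_threads) {  // last arrival completes the phase
            arrived = 0;
            ++phase;
            cv.notify_all();
        }
        return my_phase;
    }

    void leave(unsigned long my_phase) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return phase != my_phase; });
    }
};

// Returns true if, in every round, no thread left before all finished "part 1".
bool demo_phased_split_barrier(int threads, int rounds) {
    assert(threads > 0 && rounds > 0);
    phased_split_barrier barrier(threads);
    std::atomic<int> part1_done{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                part1_done.fetch_add(1);                 // stands in for part1()
                const unsigned long token = barrier.enter();
                // part2() would run here
                barrier.leave(token);
                if (part1_done.load() < threads * (r + 1))
                    ok = false;                          // part3() started too early
            }
        });
    }
    for (auto& t : pool) t.join();
    return ok;
}
```

A thread can only reach phase n+1 after phase n has completed, so a token is never more than one phase behind and a slow thread cannot miss its wake-up condition.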

Thanks a lot for pointing me to the std::barrier C++20 standard class, although the doc does not mention the emplace() method... That’s exactly what I was looking for on StackOverflow. If you have an account there, please consider writing an answer so I can accept it!

As for the deadlock if the object is not correctly initialized before it is used, I was aware of that, but I considered that such misuse should be treated as (or, better, documented as) undefined behaviour.

About the unnecessary waiting, I was also aware of that. In your five-part example, I would have used 2 distinct split_barrier objects. In my use-case, I have only 4 parts with a hard barrier between parts 3 and 4, and the whole section is enclosed in a while loop. Hence, I needed my barrier object to be reusable, but I did not find a way to allow some threads to enter the barrier in iteration n+1 while others have not left it in iteration n. I also did not care, since the hard barrier ensures it cannot happen anyway.

The emplace() method comes from std::optional. As for the undefined behaviour, I think it's fine this way (std::barrier has exactly the same issue), I just wanted to point it out.

Oh, stupid me, I missed the std::optional when I read your example code!
