EDIT, TL;DR: If you are considering the code below for production and can afford to require C++20, use std::barrier instead, as suggested by G. Sliepen in his excellent answer.
I’m working on some OpenMP-parallelized C++ code that consists of three parts, with the constraint that no thread may begin part 3 before all threads have finished part 1. It is perfectly acceptable, however, for some threads to run part 2 while others are still running part 1, or for some threads to run part 3 while others are still running part 2. (See my question on StackOverflow for details.)
A synchronization barrier anywhere between part 1 and part 3 would satisfy the constraint but there’s no need for such "hard" synchronization. So I thought it would be nice to have a "split" barrier: no thread can pass the second half before all threads have passed the first half.
I managed to implement such a thing with the following code:
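#include <condition_variable>
#include <mutex>

class split_barrier {
private:
    std::mutex m;                  // guards every member below
    std::condition_variable cv;
    int threads_in_section;
    int total_threads;
    bool may_enter;
    bool may_leave;

public:
    split_barrier():
        threads_in_section(0),
        total_threads(0),          // real value is set by init()
        may_enter(false),
        may_leave(false)
    {}

    // Must be called once, before any thread calls enter().
    void init(int threads) {
        std::lock_guard<std::mutex> lock(m);
        total_threads = threads;
        may_enter = true;
    }

    // First half: the last thread to arrive opens the second half.
    void enter() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this]{return may_enter;});
        if (++threads_in_section == total_threads) {
            may_enter = false;
            may_leave = true;
            lock.unlock();
            cv.notify_all();
        }
    }

    // Second half: blocks until all threads have passed enter();
    // the last thread to leave re-opens the first half for reuse.
    void leave() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this]{return may_leave;});
        if (--threads_in_section == 0) {
            may_leave = false;
            may_enter = true;
            lock.unlock();
            cv.notify_all();
        }
    }
};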
Then my code looks like:
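int main() {
    split_barrier barrier;
    #pragma omp parallel
    {
        // "single" ends with an implicit barrier, so init() has completed
        // before any thread reaches enter().
        #pragma omp single
        barrier.init(omp_get_num_threads());
        part1();
        barrier.enter();
        part2();
        barrier.leave();
        part3();
    }
}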
EDIT: Since it does not invalidate the only and accepted answer, I hope I am allowed to add that my "real" use-case looks more like:
int main() {
    split_barrier barrier;
    #pragma omp parallel
    {
        #pragma omp single
        barrier.init(omp_get_num_threads());
        while (...) {
            part1();
            barrier.enter();
            part2();
            barrier.leave();
            part3();
            #pragma omp barrier
            part4();
        }
    }
}
I consider synchronization code to be very error-prone. Is my code thread-safe?
1 Answer
Thread safety
A single std::mutex correctly guards may_enter, may_leave and threads_in_section. Although there are probably ways to make it more performant by using atomic variables, your code takes a robust approach.
Note that it does depend on the caller using your split_barrier correctly; the following code will result in a deadlock:
split_barrier barrier; // total_threads == 0
barrier.enter(); // ++threads_in_section != total_threads
foo();
barrier.leave(); // may_leave is still false, so will wait() forever
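One way to make this kind of misuse fail loudly instead of hanging is to check the precondition inside enter() and leave(). The following is only a sketch under assumptions: the names checked_split_barrier, initialized() and run_once() are hypothetical and not part of the question's code, and the checks rely on assert, so they disappear when NDEBUG is defined. The locking logic is otherwise the same as in the question.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical variant of the question's split_barrier: identical logic,
// plus assertions so that use before init() aborts in debug builds
// instead of waiting forever.
class checked_split_barrier {
    std::mutex m;
    std::condition_variable cv;
    int threads_in_section = 0;
    int total_threads = 0;  // 0 means "init() has not been called yet"
    bool may_enter = false;
    bool may_leave = false;

public:
    void init(int threads) {
        assert(threads > 0);
        std::lock_guard<std::mutex> lock(m);
        total_threads = threads;
        may_enter = true;
    }

    bool initialized() {
        std::lock_guard<std::mutex> lock(m);
        return total_threads > 0;
    }

    void enter() {
        std::unique_lock<std::mutex> lock(m);
        assert(total_threads > 0 && "enter() called before init()");
        cv.wait(lock, [this] { return may_enter; });
        if (++threads_in_section == total_threads) {
            may_enter = false;
            may_leave = true;
            lock.unlock();
            cv.notify_all();
        }
    }

    void leave() {
        std::unique_lock<std::mutex> lock(m);
        assert(total_threads > 0 && "leave() called before init()");
        cv.wait(lock, [this] { return may_leave; });
        if (--threads_in_section == 0) {
            may_leave = false;
            may_enter = true;
            lock.unlock();
            cv.notify_all();
        }
    }
};

// Usage sketch with plain std::thread: n threads pass through the barrier once.
void run_once(int n) {
    checked_split_barrier barrier;
    barrier.init(n);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&barrier] {
            barrier.enter();
            barrier.leave();
        });
    }
    for (auto& t : threads) t.join();
}
```

With this variant, the four-line misuse above would trip the assertion in enter() rather than block in leave().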
Use std::barrier
C++20 already has a barrier primitive: std::barrier. It has separate member functions arrive() and wait(). Using these, your example main() would look like:
int main() {
    std::optional<std::barrier<>> barrier;
    #pragma omp parallel
    {
        #pragma omp single
        barrier.emplace(omp_get_num_threads());
        part1();
        auto arrival_token = barrier->arrive();
        part2();
        barrier->wait(std::move(arrival_token));
        part3();
    }
}
The std::optional is a workaround for the fact that std::barrier takes the number of threads only in its constructor, while omp_get_num_threads() reports the team size only inside the parallel region.
Unnecessary waiting in enter()
Both enter()
and leave()
call wait. This means there are actually two barriers. This happens for example if the barrier is reused:
part1();
barrier.enter();
part2();
barrier.leave(); // Waits for all threads to finish part1()
part3();
barrier.enter(); // Waits for all threads to finish part2()
part4();
barrier.leave(); // Waits for all threads to finish part3()
part5();
But I would expect the second call to barrier.enter() not to block at all.
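One possible way to get a non-blocking first half with only a mutex and a condition variable is to count phases instead of toggling flags. This is a hedged sketch, not the question's class: phased_split_barrier and count_violations_phased are hypothetical names, and it assumes every thread calls leave() for a phase before calling enter() for the next one.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical phase-counting split barrier: enter() never blocks and returns
// the phase it arrived in; leave() blocks until that phase is complete.
class phased_split_barrier {
    std::mutex m;
    std::condition_variable cv;
    const int total_threads;
    int arrived = 0;
    std::uint64_t phase = 0;

public:
    explicit phased_split_barrier(int threads) : total_threads(threads) {
        assert(threads > 0);
    }

    // First half: record the arrival; the last arrival completes the phase.
    std::uint64_t enter() {
        std::lock_guard<std::mutex> lock(m);
        const std::uint64_t my_phase = phase;
        if (++arrived == total_threads) {
            arrived = 0;
            ++phase;
            cv.notify_all();
        }
        return my_phase;
    }

    // Second half: wait until every thread has entered phase my_phase.
    void leave(std::uint64_t my_phase) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return phase != my_phase; });
    }
};

// Replays the five-part example with placeholder parts and counts how often
// part 3 started before all part 1 work, or part 5 before all part 3 work.
int count_violations_phased(int num_threads) {
    phased_split_barrier barrier(num_threads);
    std::atomic<int> done1{0}, done3{0}, violations{0};
    auto worker = [&] {
        done1.fetch_add(1);                    // part 1
        auto t1 = barrier.enter();
        barrier.leave(t1);                     // part 2 would run before this
        if (done1.load() != num_threads) violations.fetch_add(1);
        done3.fetch_add(1);                    // part 3
        auto t2 = barrier.enter();             // does not wait for part 2
        barrier.leave(t2);                     // part 4 would run before this
        if (done3.load() != num_threads) violations.fetch_add(1);
    };
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    return violations.load();
}
```

Since the thread count is a constructor argument here, using it from an OpenMP parallel region would need the same std::optional workaround as std::barrier.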
Thanks a lot for pointing me to the std::barrier C++20 standard class, although the doc does not mention the emplace() method... That’s exactly what I was looking for on StackOverflow. If you have an account there, please consider writing an answer so I can accept it! – user2233709, 2025-05-23 07:46
As for the deadlock if the object is not correctly initialized before it is used, I was aware of that, but I think such misuse should be considered (or, better, documented as) undefined behaviour. – user2233709, 2025-05-23 07:58
About the unnecessary waiting, I was also aware of that. In your 5-part example, I would have used 2 distinct split_barrier objects. In my use-case, I have only 4 parts with a hard barrier between parts 3 and 4, and the whole section is enclosed in a while loop. Hence, I needed my barrier object to be re-usable, but I did not find a way to allow some threads to enter the barrier in iteration n+1 while others have not left it in iteration n. I also did not care, since the hard barrier ensures it cannot happen anyway. – user2233709, 2025-05-23 07:59
The emplace() method comes from std::optional. As for the undefined behaviour, I think it's fine this way (std::barrier has exactly the same issue), I just wanted to point it out. – G. Sliepen, 2025-05-23 09:12
Oh, stupid me, I missed the std::optional when I read your example code! – user2233709, 2025-05-23 09:29