generic implementation of median

Question 1

Below is a generic implementation of the summary statistics function Median.

Discussion of efficient use of std::nth_element and std::max_element (for even number of elements only) is from here.

We want the median algorithm to work for ranges of primitive builtin types, as well as complex types. This may require passing in an ordering comparator and an "extraction" functor as lambdas.

Yet we want the algorithm to remain simple to use for simple types.

As a complex type we include a simple vector2 implementation. This is not the focus here.

C++20 has been used for concept constraints and "lambdas in non-evaluated contexts".

Comments, particularly on the API choices for the median(...) appreciated.
Is there a better way specify the default Extract Lambda type?

#include <cmath>
#include <cstdlib>
#include <exception>
#include <functional>
#include <iostream>
#include <ostream>
#include <stdexcept>
#include <utility>
template <typename Numeric>
requires std::is_floating_point_v<Numeric> || std::is_integral_v<Numeric>
class vector2 {
public:
 Numeric x{};
 Numeric y{};
 constexpr vector2(Numeric x_, Numeric y_) : x(x_), y(y_) {}
 constexpr vector2() = default;
 [[nodiscard]] Numeric mag() const { return std::hypot(x, y); }
 [[nodiscard]] vector2 norm() const { return *this / this->mag(); }
 [[nodiscard]] Numeric dot(const vector2& rhs) const { return x * rhs.x + y * rhs.y; }
 // clang-format off
 vector2& operator+=(const vector2& obj) { x += obj.x; y += obj.y; return *this; }
 vector2& operator-=(const vector2& obj) { x -= obj.x; y -= obj.y; return *this; }
 vector2& operator*=(const double& scale) { x *= scale; y *= scale; return *this; }
 vector2& operator/=(const double& scale) { x /= scale; y /= scale; return *this; }
 // clang-format on
 friend vector2 operator+(vector2 lhs, const vector2& rhs) { return lhs += rhs; }
 friend vector2 operator-(vector2 lhs, const vector2& rhs) { return lhs -= rhs; }
 friend vector2 operator*(vector2 lhs, const double& scale) { return lhs *= scale; }
 friend vector2 operator*(const double& scale, vector2 rhs) { return rhs *= scale; }
 friend vector2 operator/(vector2 lhs, const double& scale) { return lhs /= scale; }
 friend std::ostream& operator<<(std::ostream& os, const vector2& v) {
 return os << '[' << v.x << ", " << v.y << "] mag = " << v.mag();
 }
};
using vec2 = vector2<double>;
template <typename RandAccessIter, typename Comp = std::less<>,
 typename Extract = decltype([](const typename RandAccessIter::value_type& e) { return e; })>
auto median(RandAccessIter first, RandAccessIter last, Comp comp = std::less<>(),
 Extract extr = Extract()) {
 auto s = last - first;
 if (s == 0) throw std::domain_error("Can't find median of an empty range");
 auto n = s / 2;
 auto middle = first + n;
 std::nth_element(first, middle, last, comp);
#if !defined(NDEBUG)
 std::cerr << "After nth_element with n= " << n << ":\n";
 std::for_each(first, last, [](const auto& e) { std::cerr << e << "\n"; });
#endif
 if (s % 2 == 1) {
 return extr(*middle);
 }
 auto below_middle = std::max_element(first, middle, comp);
#if !defined(NDEBUG)
 std::cerr << "below_middle:" << *below_middle << "\n";
#endif
 return (extr(*middle) + extr(*below_middle)) / 2.0;
}
// wrap the generic median algorithm for particular type and container
double median(std::vector<vec2> modes) { // intentional copy to protect caller
 return median(
 modes.begin(), modes.end(), [](const auto& a, const auto& b) { return a.mag() < b.mag(); },
 [](const auto& v) { return v.mag(); });
}
int main() {
 try {
 std::cout << "mutable vector of builtin types\n";
 auto vector_of_doubles = std::vector<std::vector<double>>{
 {1.0, 2.0, 3.0, 4.0, 5.0},
 {1.0, 2.0, 3.0, 4.0},
 {1.0, 2.0, 3.0},
 {1.0, 2.0},
 {1.0},
 };
 for (auto& doubles: vector_of_doubles) {
 auto med = median(doubles.begin(), doubles.end()); // call generic algorithm
 std::cout << "median mag = " << med << "\n";
 }
 
 std::cout << "immutable vector of complex types\n";
 const auto vector_of_vecs = std::vector<std::vector<vec2>>{
 {{1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}},
 {{1, 2}, {3, 4}, {5, 6}, {7, 8}},
 {{1, 2}, {3, 4}, {5, 6}},
 {{1, 2}, {3, 4}},
 {{1, 2}},
 {},
 };
 for (const auto& vecs: vector_of_vecs) {
 auto med = median(vecs); //call via simplifying wrapper
 std::cout << "median mag = " << med << "\n";
 }
 } catch (const std::exception& e) {
 std::cerr << "Exception thrown: " << e.what() << " Terminating\n";
 return EXIT_FAILURE;
 }
 return EXIT_SUCCESS;
}

Question 2

It modifies the input

I think it is surprising that an operation that is intended to just give you some statistics back is going to modify the input. In the test code you also test it on const vectors, but there you use the helper function that makes a copy. Maybe make copying part of the real median() function? It might be tempting to use std::partial_sort_copy(), but as you benchmarked, that is always less efficient that just copying the whole input and then using std::nth_element() + std::max_element().

Making a copy here does not increase the algorithmic complexity in any way, but of course there is an overhead. If the caller really wants to avoid that, perhaps provide an in_place_median() or a sorting_median() function?

Provide a version with a ranges interface

Since you target C++20, consider providing an overload that takes a std::ranges::random_access_range as an argument.

Simplify template default parameters

Just like you used std::less<>, you can use std::identity to provide a default value for Extract. Furthermore, when providing default values for the function parameters, you can avoid repeating things by just using {}:

template <typename RandAccessIter, typename Comp = std::less<>,
 typename Extract = std::identity>
auto median(RandAccessIter first, RandAccessIter last,
 Comp comp = {}, Extract extr = {}) {
 ...
}

Use `std::midpoint()`

If you have an even number of elements, then the line of code that averages two values is problematic. Consider that the sum of two values might be outside the range of the value type. There's std::midpoint() which will calculate the average for you while also avoiding all the pitfalls that come with it. (See this presentation from Marshall Clow for the problems one encounters while trying to average two numbers.)

Use `std::invoke()` instead of calling `extr()` directly

To make the code even more generic and to allow member function pointers to be passed for extr, make sure you use std::invoke() to invoke the callable:

if (s % 2 == 1) {
 return std::invoke(extr, *middle);
}
...
return std::midpoint(std::invoke(extr, *middle), std::invoke(extr, *below_middle));

Consider adding a projection parameter

You'll notice that std::ranges::nth_element() has a template parameter Proj that is similar to your Extract, except that it is applied before comparing two elements. It would simplify your helper function std::vector<vec2>, and would allow you to write:

template <typename RandAccessIter, typename Comp = std::less<>,
 typename Proj = std::identity, typename Extract = std::identity>
auto median(RandAccessIter first, RandAccessIter last, Comp comp = {},
 Proj proj = {}, Extract extr = {}) {
 ...
 std::ranges::nth_element(first, middle, last, comp, proj);
 ...
 auto below = std::ranges::max_element(first, middle, comp, proj);
 ...
}
auto median(std::vector<vec2> modes) {
 return median(modes.begin(), modes.end(), {}, &vec::mag, &vec::mag);
}

Use concepts to restrict the allowed template parameter types

In C++20 we can use concepts to restrict template parameters, so type errors are detected early with helpful error messages. If you want to keep using std::nth_element(), then the easiest would be to use std::sortable for RandAccessIter, Comp and Proj, and use std::invocable for Extract:

template <typename RandAccessIter, typename Comp = std::less<>,
 typename Proj = std::identity, typename Extract = std::identity>
requires std::sortable<RandAccessIter, Comp, Proj> &&
 std::invocable<Extract, std::iter_reference_t<RandAccessIter>>
auto median(RandAccessIter first, RandAccessIter last, Comp comp = {},
 Proj proj = {}, Extract extr = {}) {
 ...
}

Question 3

Thanks. Some great points. Should this ` std::nth_element(first, middle, last, comp, proj);` have read ` std::ranges::nth_element(first, middle, last, comp, proj);`?

Question 4

Yes, I did mean that :)

Question 5

OK, after battling with my environment, I am now compiling with gcc-11 rather than clang-13 (the latter is behind the former - even when linking with libstdc++-11). I usually prefer using clang because I use clangd with emacs so it's nice when the lint warning match the compiler warnings. But I have recently learnt that compiling with clang on a gcc system is probably not wise anyway. In any case your approach now works.. and I have now dropped Extract in favour of just Proj. This is much better, but does require the very latest.

Question 6

I also agonised over the copy. It's juts that the standard STL interface is 2 iterators. So you think should still use iterators in the interface (with optional ranges adaptor) and I should use perhaps a "range constructor" in the body of auto median(... to make a copy?

Question 7

It's too bad std::midpoint() restricts things too much here in my opinion. You can average two vec2s, since all the necessary operators are there. Even if it might not make sense for a vec2, consider a type that implements rational numbers.

Question 8

Performance benchmarking on std::nth_element + std::max_element vs std::partial_sort_copy. Refer discussion under G.Sliepen's answer.

Benchmark code

Note that the nth_element version does make a copy, to make a fair comparison.

There are probably some small bugs in the partial_sort_copy code, for tiny ranges - the n-1 code will break. But given the results, I didn't bother improving it.

Note that the partial_sort_copy is also slightly less covenient to use (at least currently) with comp and proj. Again, I didn't bother to improve it due to the results.

#include "benchmark/benchmark.h"
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <exception>
#include <functional>
#include <iostream>
#include <numeric>
#include <ostream>
#include <random>
#include <ranges>
#include <stdexcept>
#include <utility>
#include <vector>
template <typename Numeric>
requires std::is_floating_point_v<Numeric> || std::is_integral_v<Numeric>
class vector2 {
public:
 Numeric x{};
 Numeric y{};
 constexpr vector2(Numeric x_, Numeric y_) : x(x_), y(y_) {}
 constexpr vector2() = default;
 [[nodiscard]] Numeric mag() const { return std::hypot(x, y); }
 [[nodiscard]] vector2 norm() const { return *this / this->mag(); }
 [[nodiscard]] Numeric dot(const vector2& rhs) const { return x * rhs.x + y * rhs.y; }
 // clang-format off
 vector2& operator+=(const vector2& obj) { x += obj.x; y += obj.y; return *this; }
 vector2& operator-=(const vector2& obj) { x -= obj.x; y -= obj.y; return *this; }
 vector2& operator*=(const double& scale) { x *= scale; y *= scale; return *this; }
 vector2& operator/=(const double& scale) { x /= scale; y /= scale; return *this; }
 // clang-format on
 friend vector2 operator+(vector2 lhs, const vector2& rhs) { return lhs += rhs; }
 friend vector2 operator-(vector2 lhs, const vector2& rhs) { return lhs -= rhs; }
 friend vector2 operator*(vector2 lhs, const double& scale) { return lhs *= scale; }
 friend vector2 operator*(const double& scale, vector2 rhs) { return rhs *= scale; }
 friend vector2 operator/(vector2 lhs, const double& scale) { return lhs /= scale; }
 friend std::ostream& operator<<(std::ostream& os, const vector2& v) {
 return os << '[' << v.x << ", " << v.y << "] mag = " << v.mag();
 }
};
using vec2 = vector2<double>;
template <typename RandAccessIter, typename Comp = std::less<>, typename Proj = std::identity>
auto median(RandAccessIter first, RandAccessIter last, Comp comp = {}, Proj proj = {}) {
 // make a copy
 std::vector<typename RandAccessIter::value_type> tmp(first, last);
 auto tmp_first = tmp.begin();
 auto tmp_last = tmp.end();
 auto s = tmp_last - tmp_first;
 if (s == 0) throw std::domain_error("Can't find median of an empty range.");
 auto n = s / 2;
 auto middle = tmp_first + n;
 std::ranges::nth_element(tmp_first, middle, tmp_last, comp, proj);
#if !defined(NDEBUG)
 std::cerr << "After nth_element with n = " << n << ":\n";
 std::for_each(first, last, [](const auto& e) { std::cerr << e << "\n"; });
#endif
 if (s % 2 == 1) {
 return std::invoke(proj, *middle);
 }
 auto below_middle = std::ranges::max_element(tmp_first, middle, comp, proj);
#if !defined(NDEBUG)
 std::cerr << "below_middle:" << *below_middle << "\n";
#endif
 return std::midpoint(std::invoke(proj, *middle), std::invoke(proj, *below_middle));
}
template <typename RandAccessIter, typename Comp = std::less<>, typename Proj = std::identity>
auto median2(RandAccessIter first, RandAccessIter last, Comp comp = {}, Proj proj = {}) {
 auto s = last - first;
 if (s == 0) throw std::domain_error("Can't find median of an empty range.");
 auto n = static_cast<std::size_t>(s / 2);
 std::vector<typename RandAccessIter::value_type> tmp(n + 1);
 std::ranges::partial_sort_copy(first, last, tmp.begin(), tmp.end(), comp);
 auto middle = &tmp[n];
#if !defined(NDEBUG)
 std::cerr << "middle:" << *middle << "\n";
#endif
 if (s % 2 == 1) {
 return std::invoke(proj, *middle);
 }
 auto below_middle = &tmp[n-1];
#if !defined(NDEBUG)
 std::cerr << "below_middle:" << *below_middle << "\n";
#endif
 return std::midpoint(std::invoke(proj, *middle), std::invoke(proj, *below_middle));
}
void median_nth_element(benchmark::State& state) {
 auto no_elements = static_cast<std::size_t>(state.range(0));
 std::mt19937_64 rgen(1); // NOLINT fixed seed
 std::uniform_int_distribution<std::size_t> dist(0, no_elements - 1);
 std::vector<std::size_t> ints(no_elements);
 std::generate(begin(ints), end(ints), [&dist, &rgen]() { return dist(rgen); });
 for (auto _: state) {
 auto med = median(ints.begin(), ints.end());
 benchmark::DoNotOptimize(med);
 }
}
void median_nth_element_odd(benchmark::State& state) {
 auto no_elements = static_cast<std::size_t>(state.range(0) + 1U);
 std::mt19937_64 rgen(1); // NOLINT fixed seed
 std::uniform_int_distribution<std::size_t> dist(0, no_elements - 1);
 std::vector<std::size_t> ints(no_elements);
 std::generate(begin(ints), end(ints), [&dist, &rgen]() { return dist(rgen); });
 for (auto _: state) {
 auto med = median(ints.begin(), ints.end());
 benchmark::DoNotOptimize(med);
 }
}
void median_nth_element_vec2(benchmark::State& state) {
 auto no_elements = static_cast<std::size_t>(state.range(0));
 std::mt19937_64 rgen(1); // NOLINT fixed seed
 std::uniform_int_distribution<std::size_t> dist(0, no_elements - 1);
 std::vector<vector2<std::size_t>> vecs(no_elements);
 std::generate(begin(vecs), end(vecs), [&dist, &rgen]() { return vector2<std::size_t>{dist(rgen), dist(rgen)}; });
 for (auto _: state) {
 auto med = median(vecs.begin(), vecs.end(), {}, &vector2<std::size_t>::mag);
 benchmark::DoNotOptimize(med);
 }
}
void median_partial_sort(benchmark::State& state) {
 auto no_elements = static_cast<std::size_t>(state.range(0));
 std::mt19937_64 rgen(1); // NOLINT fixed seed
 std::uniform_int_distribution<std::size_t> dist(0, no_elements - 1);
 std::vector<std::size_t> ints(no_elements);
 std::generate(begin(ints), end(ints), [&dist, &rgen]() { return dist(rgen); });
 for (auto _: state) {
 auto med = median2(ints.begin(), ints.end());
 benchmark::DoNotOptimize(med);
 }
}
void median_partial_sort_odd(benchmark::State& state) {
 auto no_elements = static_cast<std::size_t>(state.range(0) + 1U);
 std::mt19937_64 rgen(1); // NOLINT fixed seed
 std::uniform_int_distribution<std::size_t> dist(0, no_elements - 1);
 std::vector<std::size_t> ints(no_elements);
 std::generate(begin(ints), end(ints), [&dist, &rgen]() { return dist(rgen); });
 for (auto _: state) {
 auto med = median2(ints.begin(), ints.end());
 benchmark::DoNotOptimize(med);
 }
}
void median_partial_sort_vec2(benchmark::State& state) {
 auto no_elements = static_cast<std::size_t>(state.range(0));
 std::mt19937_64 rgen(1); // NOLINT fixed seed
 std::uniform_int_distribution<std::size_t> dist(0, no_elements - 1);
 std::vector<vector2<std::size_t>> vecs(no_elements);
 std::generate(begin(vecs), end(vecs), [&dist, &rgen]() { return vector2<std::size_t>{dist(rgen), dist(rgen)}; });
 for (auto _: state) {
 auto med = median2(
 vecs.begin(), vecs.end(), [](const auto& a, const auto& b) { return a.mag() < b.mag(); },
 &vector2<std::size_t>::mag);
 benchmark::DoNotOptimize(med);
 }
}
BENCHMARK(median_nth_element)->Range(8, 8U << 16U);
BENCHMARK(median_nth_element_odd)->Range(8, 8U << 16U);
BENCHMARK(median_nth_element_vec2)->Range(8, 8U << 16U);
BENCHMARK(median_partial_sort)->Range(8, 8U << 16U);
BENCHMARK(median_partial_sort_odd)->Range(8, 8U << 16U);
BENCHMARK(median_partial_sort_vec2)->Range(8, 8U << 16U);
BENCHMARK_MAIN();

Results

Run on (8 X 3800 MHz CPU s)
CPU Caches:
 L1 Data 32 KiB (x4)
 L1 Instruction 32 KiB (x4)
 L2 Unified 256 KiB (x4)
 L3 Unified 8192 KiB (x1)
Load Average: 0.51, 0.41, 0.44
--------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------
median_nth_element/8 49.4 ns 49.4 ns 11726991
median_nth_element/64 162 ns 162 ns 4229863
median_nth_element/512 944 ns 944 ns 749605
median_nth_element/4096 20981 ns 20975 ns 33242
median_nth_element/32768 247038 ns 246926 ns 2888
median_nth_element/262144 2161588 ns 2160627 ns 330
median_nth_element/524288 5109616 ns 5105935 ns 136
median_nth_element_odd/8 42.4 ns 42.3 ns 16568324
median_nth_element_odd/64 195 ns 195 ns 3597718
median_nth_element_odd/512 708 ns 708 ns 908172
median_nth_element_odd/4096 12401 ns 12397 ns 54944
median_nth_element_odd/32768 216881 ns 216798 ns 3323
median_nth_element_odd/262144 2164863 ns 2164004 ns 326
median_nth_element_odd/524288 4792950 ns 4790117 ns 131
median_nth_element_vec2/8 622 ns 621 ns 1128680
median_nth_element_vec2/64 5254 ns 5248 ns 131931
median_nth_element_vec2/512 57524 ns 57445 ns 12203
median_nth_element_vec2/4096 492967 ns 491857 ns 1423
median_nth_element_vec2/32768 5058039 ns 5056358 ns 125
median_nth_element_vec2/262144 34567322 ns 34545641 ns 21
median_nth_element_vec2/524288 56340388 ns 56318226 ns 12
median_partial_sort/8 66.1 ns 66.1 ns 10433344
median_partial_sort/64 596 ns 596 ns 1168221
median_partial_sort/512 6714 ns 6712 ns 102949
median_partial_sort/4096 229300 ns 229164 ns 3096
median_partial_sort/32768 2351233 ns 2350440 ns 300
median_partial_sort/262144 24356189 ns 24339351 ns 29
median_partial_sort/524288 51988540 ns 51970508 ns 13
median_partial_sort_odd/8 65.0 ns 65.0 ns 10840782
median_partial_sort_odd/64 610 ns 610 ns 1166832
median_partial_sort_odd/512 6851 ns 6848 ns 102734
median_partial_sort_odd/4096 227684 ns 227615 ns 3085
median_partial_sort_odd/32768 2347916 ns 2347150 ns 299
median_partial_sort_odd/262144 24591092 ns 24579550 ns 29
median_partial_sort_odd/524288 52963811 ns 52944066 ns 13
median_partial_sort_vec2/8 500 ns 499 ns 1391598
median_partial_sort_vec2/64 9143 ns 9141 ns 75919
median_partial_sort_vec2/512 159484 ns 159447 ns 4387
median_partial_sort_vec2/4096 1910922 ns 1910261 ns 374
median_partial_sort_vec2/32768 18576980 ns 18571146 ns 36
median_partial_sort_vec2/262144 181462960 ns 181402942 ns 4
median_partial_sort_vec2/524288 389011284 ns 388896464 ns 2

Discussion

I didn't spent too much time with this, as it seems thatnth_element + max_element blows partial_sort_copy out of the water for all collection sizes and types.

Even for tiny collections nth_element is faster and as they grow, it is quickly faster by 10x or more.

No contest?

Question 9

No contest indeed, that definitely settles that.

Question 10

Revised Version

Integrating all of the feedback from G.Sliepen plus a bit more.

Always makes a copy now
constrains the inputs with concepts
Simpler syntax for default input parameters
provides an overload which accepts a range
uses a projection parameter - no more extract
uses std::invoke to support member functions
uses std::midpoint for a safer average
Only needs const std::input_iterators because it makes a copy into std::vector. Is there a way to state in the requires clause that we only need const?

I also attempted to provide a trailing return type for median(..) (in order to make the signature more informative), but couldn't make that work.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <exception>
#include <functional>
#include <iostream>
#include <iterator>
#include <numeric>
#include <ostream>
#include <ranges>
#include <stdexcept>
#include <utility>
#include <vector>
template <typename Numeric>
requires std::floating_point<Numeric> || std::integral<Numeric>
class vector2 {
public:
 Numeric x{};
 Numeric y{};
 constexpr vector2(Numeric x_, Numeric y_) : x(x_), y(y_) {}
 constexpr vector2() = default;
 [[nodiscard]] Numeric mag() const { return std::hypot(x, y); }
 [[nodiscard]] vector2 norm() const { return *this / this->mag(); }
 [[nodiscard]] Numeric dot(const vector2& rhs) const { return x * rhs.x + y * rhs.y; }
 // clang-format off
 vector2& operator+=(const vector2& obj) { x += obj.x; y += obj.y; return *this; }
 vector2& operator-=(const vector2& obj) { x -= obj.x; y -= obj.y; return *this; }
 vector2& operator*=(const double& scale) { x *= scale; y *= scale; return *this; }
 vector2& operator/=(const double& scale) { x /= scale; y /= scale; return *this; }
 // clang-format on
 friend vector2 operator+(vector2 lhs, const vector2& rhs) { return lhs += rhs; }
 friend vector2 operator-(vector2 lhs, const vector2& rhs) { return lhs -= rhs; }
 friend vector2 operator*(vector2 lhs, const double& scale) { return lhs *= scale; }
 friend vector2 operator*(const double& scale, vector2 rhs) { return rhs *= scale; }
 friend vector2 operator/(vector2 lhs, const double& scale) { return lhs /= scale; }
 friend std::ostream& operator<<(std::ostream& os, const vector2& v) {
 return os << '[' << v.x << ", " << v.y << "] mag = " << v.mag();
 }
};
using vec2 = vector2<double>;
template <typename InputIter, typename Comp = std::ranges::less, typename Proj = std::identity>
requires std::input_iterator<InputIter> &&
 std::sortable<typename std::vector<std::iter_value_t<InputIter>>::iterator, Comp, Proj>
auto median(InputIter first, InputIter last, Comp comp = {}, Proj proj = {}) {
 std::vector<std::iter_value_t<InputIter>> tmp(first, last); // make a copy
 auto tmp_first = tmp.begin();
 auto tmp_last = tmp.end();
 auto s = tmp_last - tmp_first;
 if (s == 0) throw std::domain_error("Can't find median of an empty range.");
 auto n = s / 2;
 auto tmp_middle = tmp_first + n;
 std::ranges::nth_element(tmp_first, tmp_middle, tmp_last, comp, proj);
 if (s % 2 == 1) return std::invoke(proj, *tmp_middle);
 auto below_tmp_middle = std::ranges::max_element(tmp_first, tmp_middle, comp, proj);
 return std::midpoint(std::invoke(proj, *tmp_middle), std::invoke(proj, *below_tmp_middle));
}
template <typename InputRange, typename Comp = std::ranges::less, typename Proj = std::identity>
requires std::ranges::input_range<InputRange> &&
 std::sortable<typename std::vector<std::ranges::iterator_t<InputRange>>::iterator, Comp, Proj>
auto median(InputRange range, Comp comp = {}, Proj proj = {}) {
 return median(range.begin(), range.end(), comp, proj);
}
int main() {
 try {
 std::cout << "mutable vectors of builtin types\n";
 auto vectors_of_doubles = std::vector<std::vector<double>>{
 // clang-format off
 {5.0, 4.0, 3.0, 2.0, 1.0},
 {5.0, 4.0, 3.0, 2.0},
 {5.0, 4.0, 3.0},
 {5.0, 4.0},
 {5.0},
 // clang-format on
 };
 for (auto& doubles: vectors_of_doubles) {
 auto med = median(doubles.begin(), doubles.end());
 std::cout << "median = " << med << "\n";
 }
 std::cout << "immutable vectors of complex types\n";
 const auto vectors_of_vecs = std::vector<std::vector<vec2>>{
 {{9, 10}, {7, 8}, {5, 6}, {3, 4}, {1, 2}},
 {{9, 10}, {7, 8}, {5, 6}, {3, 4}},
 {{9, 10}, {7, 8}, {5, 6}},
 {{9, 10}, {7, 8}},
 {{9, 10}},
 {}, // throws exception
 };
 for (const auto& vecs: vectors_of_vecs) {
 auto med = median(vecs, {}, &vec2::mag); // using the range overload with custom Projection
 std::cout << "median mag = " << med << "\n";
 }
 } catch (const std::exception& e) {
 std::cerr << "Exception thrown: " << e.what() << " Terminating\n";
 return EXIT_FAILURE;
 }
 return EXIT_SUCCESS;
}
```

Question 11

You should post this as a new review question instead.

Question 12

@G.Sliepen Thanks. I have now had the chance to do that here: codereview.stackexchange.com/questions/274027/…

G. Sliepen G. Sliepen 69.1k3 gold badges74 silver badges180 bronze badges · Accepted Answer · 2022-02-06 22:49:48Z

It modifies the input

I think it is surprising that an operation that is intended to just give you some statistics back is going to modify the input. In the test code you also test it on const vectors, but there you use the helper function that makes a copy. Maybe make copying part of the real median() function? It might be tempting to use std::partial_sort_copy(), but as you benchmarked, that is always less efficient that just copying the whole input and then using std::nth_element() + std::max_element().

Making a copy here does not increase the algorithmic complexity in any way, but of course there is an overhead. If the caller really wants to avoid that, perhaps provide an in_place_median() or a sorting_median() function?

Provide a version with a ranges interface

Since you target C++20, consider providing an overload that takes a std::ranges::random_access_range as an argument.

Simplify template default parameters

Just like you used std::less<>, you can use std::identity to provide a default value for Extract. Furthermore, when providing default values for the function parameters, you can avoid repeating things by just using {}:

template <typename RandAccessIter, typename Comp = std::less<>,
 typename Extract = std::identity>
auto median(RandAccessIter first, RandAccessIter last,
 Comp comp = {}, Extract extr = {}) {
 ...
}

Use `std::midpoint()`

If you have an even number of elements, then the line of code that averages two values is problematic. Consider that the sum of two values might be outside the range of the value type. There's std::midpoint() which will calculate the average for you while also avoiding all the pitfalls that come with it. (See this presentation from Marshall Clow for the problems one encounters while trying to average two numbers.)

Use `std::invoke()` instead of calling `extr()` directly

To make the code even more generic and to allow member function pointers to be passed for extr, make sure you use std::invoke() to invoke the callable:

if (s % 2 == 1) {
 return std::invoke(extr, *middle);
}
...
return std::midpoint(std::invoke(extr, *middle), std::invoke(extr, *below_middle));

Consider adding a projection parameter

You'll notice that std::ranges::nth_element() has a template parameter Proj that is similar to your Extract, except that it is applied before comparing two elements. It would simplify your helper function std::vector<vec2>, and would allow you to write:

template <typename RandAccessIter, typename Comp = std::less<>,
 typename Proj = std::identity, typename Extract = std::identity>
auto median(RandAccessIter first, RandAccessIter last, Comp comp = {},
 Proj proj = {}, Extract extr = {}) {
 ...
 std::ranges::nth_element(first, middle, last, comp, proj);
 ...
 auto below = std::ranges::max_element(first, middle, comp, proj);
 ...
}
auto median(std::vector<vec2> modes) {
 return median(modes.begin(), modes.end(), {}, &vec::mag, &vec::mag);
}

Use concepts to restrict the allowed template parameter types

In C++20 we can use concepts to restrict template parameters, so type errors are detected early with helpful error messages. If you want to keep using std::nth_element(), then the easiest would be to use std::sortable for RandAccessIter, Comp and Proj, and use std::invocable for Extract:

template <typename RandAccessIter, typename Comp = std::less<>,
 typename Proj = std::identity, typename Extract = std::identity>
requires std::sortable<RandAccessIter, Comp, Proj> &&
 std::invocable<Extract, std::iter_reference_t<RandAccessIter>>
auto median(RandAccessIter first, RandAccessIter last, Comp comp = {},
 Proj proj = {}, Extract extr = {}) {
 ...
}

Thanks. Some great points. Should this ` std::nth_element(first, middle, last, comp, proj);` have read ` std::ranges::nth_element(first, middle, last, comp, proj);`?
OK, after battling with my environment, I am now compiling with gcc-11 rather than clang-13 (the latter is behind the former - even when linking with libstdc++-11). I usually prefer using clang because I use clangd with emacs so it's nice when the lint warning match the compiler warnings. But I have recently learnt that compiling with clang on a gcc system is probably not wise anyway. In any case your approach now works.. and I have now dropped Extract in favour of just Proj. This is much better, but does require the very latest.
I also agonised over the copy. It's juts that the standard STL interface is 2 iterators. So you think should still use iterators in the interface (with optional ranges adaptor) and I should use perhaps a "range constructor" in the body of auto median(... to make a copy?
It's too bad std::midpoint() restricts things too much here in my opinion. You can average two vec2s, since all the necessary operators are there. Even if it might not make sense for a vec2, consider a type that implements rational numbers.

Stack Exchange Network

generic implementation of median

3 Answers 3

It modifies the input

Provide a version with a ranges interface

Simplify template default parameters

Use `std::midpoint()`

Use `std::invoke()` instead of calling `extr()` directly

Consider adding a projection parameter

Use concepts to restrict the allowed template parameter types

Benchmark code

Results

Discussion

Revised Version

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

generic implementation of median

3 Answers 3

It modifies the input

Provide a version with a ranges interface

Simplify template default parameters

Use std::midpoint()

Use std::invoke() instead of calling extr() directly

Consider adding a projection parameter

Use concepts to restrict the allowed template parameter types

Benchmark code

Results

Discussion

Revised Version

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

Use `std::midpoint()`

Use `std::invoke()` instead of calling `extr()` directly