Compute "negative" version of vector of vectors

Question 1

Given a std::vector<std::vector<double>> (e.g., representing some points in a space with D dimensions), I want to compute a new std::vector<std::vector<double>> where each double is the negative of the original, as in the following MWE:

#define FMT_HEADER_ONLY
#include <fmt/ranges.h>
#include <vector>
#define D 3
std::vector<std::vector<double>> negative_points(const std::vector<std::vector<double>> &points) {
 std::vector<std::vector<double>> negative(points.size(), std::vector<double>(D));
 for (std::size_t i = 0; i < points.size(); ++i) {
 for (std::size_t j = 0; j < D; ++j) {
 negative[i][j] = -points[i][j];
 }
 }
 return negative;
}
int main(int argc, char *argv[]) {
 std::vector<std::vector<double>> points{
 {0, 0, 4},
 {0, 5, 3},
 {1, 7, 0},
 {2, 1, 4},
 {3, 4, 5},
 {4, 2, 3},
 {4, 4, 6},
 {4, 6, 7},
 {5, 0, 2},
 {6, 4, 1},
 {6, 5, 1},
 {6, 7, 0},
 {7, 4, 3}
 };
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", negative_points(points));
 return EXIT_SUCCESS;
}

Here I obtain the result with index-based loops, is there a better way to do this? Here's an attempt with std functions, which I nonetheless find more difficult to understand:

std::vector<std::vector<double>> negative_points_std(const std::vector<std::vector<double>> &points) {
 std::vector<std::vector<double>> negative(points.size());
 std::transform(points.begin(), points.end(), negative.begin(),
 [](const std::vector<double> &point) {
 std::vector<double> neg_point(point.size());
 std::transform(point.begin(), point.end(), neg_point.begin(),
 [](double x) { return -x; });
 return neg_point;
 });
 return negative;
}

Question 2

Make it more generic

This works specifically for two-dimensional nested std::vectors of doubles, and will only negate the elements. What if you have a three-dimensional container, and want to take the square root, and want the results to be in a nested std::vector of std::complex<double> so you can handle negative input values?

Ideally, you want something like std::transform() that automatically recurses nested containers, and that returns a copy instead of modifying in-place. This can be done, and user JimmyHu explored this in great detail. See for example his recursive_transform() implementation that works on nested std::vectors. In my answer there, I showed how to make it even more generic and work on any type of container.

With a recursive transform function like that, you would be able to write:

int main() {
 std::vector<std::vector<double>> points{
 {0, 0, 4},
 {0, 5, 3},
 {1, 7, 0},
 ...
 };
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", recursive_transform(points, std::negate{}));
 return EXIT_SUCCESS;
}

Maybe you want to use a library for vector types

If you specifically want to transform 3D coordinates, then instead of using a std::vector<double> to represent a point, you might want to use another type. For example, you could use GLM, and write:

int main() {
 std::vector<glm::dvec3> points{
 {0, 0, 4},
 {0, 5, 3},
 {1, 7, 0},
 ...
 };
 auto negated = points;
 std::transform(negated.begin(), negated.end(), std::negate{});
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", negated);
 return EXIT_SUCCESS;
}

This will be much more efficient; a glm::dvec3 is like a struct holding 3 doubles, so all the points will be stored compactly in memory, whereas with nested std::vectors, each inner std::vector will do its own memory allocation, with no guarantee that they will all be sequential in memory, not to mention the overhead of doing the allocation and keeping track of it for every point.

Question 3

They both mostly look fine. LGTM, ship!

You didn't post timing figures for the two versions.

The index-based version suffers from this weakness:

#define D 3

The big advantage of the second version

std::transform(point.begin(), point.end(), neg_point.begin(), ...

is it asks "how many dimensions does a point have?" at run time. Surely the first version could learn this trick.

C programmers expect that caller might allocate and be responsible for object lifetime. Following C++'s somewhat different conventions here makes perfect sense.

If there was the ability for caller to specify the pre-allocated destination vector, then we might examine whether passing in same vector twice results in correct overwriting behavior.

The memory access pattern is same as memcpy. Both versions are friendly to caches, branch prediction, loop unrolling, and vectorization.

Question 4

I'm not sure I get why specifying the number of dimensions at compile time is a "weakness"? Is it because the code is less "flexible"? Shouldn't this (potentially) enable more compile-time optimizations since the size is known?

Question 5

Right, flexibility, it prohibits a mixture of 2-D and 3-D vectors, and burdens the caller with getting it properly defined at compile time. But yeah, the optimization aspect could be very attractive. Which brings us back to publishing head-to-head benchmark timings, looking at generated code on godbolt.org , that sort of thing.

Question 6

"The memory access pattern is same as memcpy." Within one inner std::vector, sure, but not for the std::vector<std::vector<>> as a whole. If you only store three values per point, it will be quite bad.

Question 7

Ok, let's lean on that item. What I had in mind is the L3 layer will issue sequential cache-line requests to DRAM, same as memcpy, so DRAM stays happy and fast. And if Point is cache-line aligned there will be padding holes, fine, we lose a fraction of the bandwidth. But if you see another pattern, please describe it! I'm especially interested in description of "quite bad", and how to mitigate it. Also, I think we're coming back to the old business of OP may want to publish benchmark timings for different values of D, including odd values.

Question 8

Just consider that a std::vector<std::vector<double>> is a pointer to an array of pointers to arrays of doubles. Each inner std::vector has separately allocated its array of doubles. There is no guarantee those are sequential in memory. Each std::vector also stores a size and capacity, the memory allocator also needs to store some metadata for each allocation, so the size overhead for each inner std::vector of 3 doubles is quite significant, more than a factor 2! Only if you are lucky and everything is sequential in memory can things be prefetched.

Question 9

Include the Required Library Headers

You use both std::size_t and EXIT_SUCCESS in your program, so you should include <cstdlib>. These identifiers might also be declared as a side-effect of another header file, but this is not portable.

Use `constexpr`, not Macros

More precisely, use constexpr when you can, and macros when you have to. Preprocessor macros only do lexical substitution, which can have many subtle errors and be extremely hard to debug. A constexpr expression (or something less restrictive, like consteval or inline) gets you something type-safe and scoped that follows the syntax of the language.

Use Efficient Data Structures

You have a rectangular array here: every row is the same constant width. A vector of vector is an extremely inefficient data structure to represent this. In fact, it’s almost never the data structure you really want.

There are some other recommendations in other answers, but one very quick solution that duck-types perfectly to the code you’ve already written is:

static constexpr std::size_t D = 3;
using row=std::array< double, D >;
using point_matrix = std::vector<row>;

(If you genuinely do have a sparse matrix where being able to omit trailing empty columns matters, instead of fixed-width columns, the format you want is usually something like compressed sparse row.)

Let’s take a look at the generated assembly output for one of the inner loops, from Clang 15.0.0 with -std=c++20 -march=x86-64-v3 -O3 when we use a std::vector of fixed-size arrays:

.LBB0_25: # =>This Inner Loop Header: Depth=1
 vxorps ymm1, ymm0, ymmword ptr [rax + rdi]
 vxorps ymm2, ymm0, ymmword ptr [rax + rdi + 32]
 vxorps ymm3, ymm0, ymmword ptr [rax + rdi + 64]
 vmovups ymmword ptr [rdx + rdi + 64], ymm3
 vmovups ymmword ptr [rdx + rdi + 32], ymm2
 vmovups ymmword ptr [rdx + rdi], ymm1
 add rdi, 96
 cmp r8, rdi
 jne .LBB0_25

With dynamic vectors as rows:

Each row must allocate its own dynamic memory.
The negative object must initialize every row, move and delete them.
The negative_points function must dereference every row and loop over it individually, with a cache miss each time.
All lookups require double indirection.

With fixed-width arrays as rows:

Allocating a new object requires a single call to the new handler, and with copy elision, delete can be optimized away.
The loop can be a single linear pass through the contiguous storage, with every row but the first preloaded.
Accessing the storage requires a single dereference.

If the rows were larger, there would theoretically be one advantage to making the rows vectors: they would then be moveable and swappable. With rows this small, the overhead of copying one is no greater than of moving a vector.

Consider a More Flexible Approach

Since people here have suggested at least three different data structures, there’s a good chance you might want to change the data structure you use. I picked one that duck-typed to your existing code, but some of the other suggestions don’t.

In the abstract, what you want to do has a few different names in computer science, and one from functional programming is traverse. That is, you want to take some kind of structure (or, more abstractly, a functor from category theory) that wraps elements of a type, here double, pass it a function that takes that type as its input, and return a new structure with the same layout, but where all the data has been passed through the function.

Unfortunately, deducing the correct template parameters for arbitrarily-nested containers would be somewhere between very complicated and impossible, but you can at least generalize on the transformation function and overload the traversal. This lets you write:

inline point_matrix negative_points(const point_matrix &points) {
 return traverse( points, std::negate<double>{} );
}

Putting it All Together

Here’s a version, based on this code by G. Sliepen, that should work for nested containers. It replaces my original implementation.

#include <algorithm> // transform
#include <array>
#include <cassert>
#include <concepts> // std::invocable
#include <cstdlib>
#include <fmt/ranges.h>
#include <functional> // invoke, negate
#include <iterator> // begin, end
#include <vector>
static constexpr std::size_t D = 3;
using row_t = std::array< double, D >;
using point_matrix = std::vector<row_t>;
template<class T>
concept is_iterable = requires(T x)
{
 *std::begin(x);
 std::end(x);
};
template<class T>
concept is_sized = requires(T x)
{
 x.size();
};
template<class T>
concept is_reservable = requires( T x, std::size_t n )
{
 x.reserve(n);
};
template<class T>
concept is_resizeable = requires( T x, std::size_t n )
{
 x.resize(n);
};
template<class T, class U>
concept is_back_emplaceable = requires( T x, U y )
{
 x.emplace_back(y);
};
template<class T, class U>
concept is_set_emplaceable = requires( T x, U y )
{
 x.emplace_hint( std::begin(x), y );
};
template<class T, class U>
concept is_forward_list_emplaceable = requires( T x, U y )
{
 x.emplace_after( std::begin(x), y );
};
template<class T, class U>
concept is_basic_emplaceable = requires( T x, U y )
{
 x.emplace(y);
};
// Prototypes in case we ever want to nest in the opposite order.
template< template<typename...> typename Container,
 typename F,
 typename... Ts >
 requires is_iterable<Container<Ts...>>
inline auto traverse ( const Container<Ts...>& input, const F& f );
template< template<class T, std::size_t> class Container,
 typename F,
 typename T,
 std::size_t N >
 requires is_iterable<Container<T, N>>
constexpr auto traverse( const Container<T, N>& input, const F& f );
template< typename T,
 typename F,
 std::size_t N >
constexpr auto traverse( const T(&input)[N], const F& f );
/* Base case for traversal: a single input to the transformation function.
 */
template< typename T,
 std::invocable<T> F >
constexpr auto traverse( const T& input, const F& f )
{
 return std::invoke( f, input );
}
/* This overload was needed to support traversing std::array.
 */
template< template<class T, std::size_t> class Container,
 typename F,
 typename T,
 std::size_t N >
 requires is_iterable<Container<T, N>>
constexpr auto traverse( const Container<T, N>& input, const F& f )
{
 using TransformedValueType = decltype(traverse(*std::begin(input), f));
 Container<TransformedValueType, N> output;
 std::transform( std::begin(input),
 std::end(input),
 std::begin(output),
 [f](auto &x){ return traverse( x, f ); }
 );
 return output;
}
/* Since C++ does not allow functions to return built-in arrays, we transform
 * them into std::array objects instead.
 */
template< typename T,
 typename F,
 std::size_t N >
constexpr auto traverse( const T(&input)[N], const F& f )
{
 using TransformedValueType = decltype(traverse(*std::begin(input), f));
 std::array< TransformedValueType, N > output;
 std::transform( std::begin(input),
 std::end(input),
 std::begin(output),
 [f](auto &x){ return traverse( x, f ); }
 );
 return output;
}
/* This implementation looks for the following member functions on the
 * transformed container, in order: .emplace_back() (std::vector,
 * std::dequeue, std::list), .emplace_after() (std::forward_list),
 * emplace_hint (std::set, etc.), or .emplace(). These will fail on
 * containers such as std::map, unless another overload enables traversal on
 * key-value pairs. If none of those succeeds, it will attempt to resize the
 * output container and call std::transform().
 */
template< template<typename...> typename Container,
 typename F,
 typename... Ts >
 requires is_iterable<Container<Ts...>>
inline auto traverse ( const Container<Ts...>& input, const F& f )
{
 using TransformedValueType = decltype(traverse(*std::begin(input), f));
 using OutputC = Container<TransformedValueType>;
 OutputC output;
 // Should reserve storage if and only if the container supported.
 if constexpr ( is_sized<Container<Ts...>> &&
 is_reservable<Container<TransformedValueType>> ) {
 output.reserve(input.size());
 }
 if constexpr (is_back_emplaceable<OutputC, TransformedValueType>) {
 for( const auto& x : input ) {
 output.emplace_back(traverse( x, f ));
 }
 } else if constexpr (is_forward_list_emplaceable<OutputC, TransformedValueType>) {
 // This branch is NOT TESTED.
 auto current = std::begin(output);
 for( auto it = std::begin(input); it != std::end(input); ++it ) {
 current = output.emplace_after( current, traverse( *it, f ) );
 }
 } else if constexpr (is_set_emplaceable<OutputC, TransformedValueType>) {
 // This branch is NOT TESTED.
 auto current = std::begin(output);
 for( auto it = std::begin(input); it != std::end(input); ++it ) {
 current = output.emplace_hint( current, traverse( *it, f ) );
 }
 } else if constexpr (is_basic_emplaceable<OutputC, TransformedValueType>) {
 // This branch is NOT TESTED.
 for ( const auto& x : input ) {
 output.emplace(traverse( x, f ));
 }
 } else {
 if constexpr ( is_sized<Container<Ts...>> &&
 is_resizeable<OutputC>) {
 output.resize(input.size());
 }
 std::transform( std::begin(input),
 std::end(input),
 std::begin(output),
 [f](const auto& x){ return traverse( x, f ); }
 );
 }
 // One last sanity check.
 if constexpr( is_sized<Container<Ts...>> && is_sized<OutputC> ) {
 assert( output.size() == input.size() );
 }
 return output;
}
auto negative_points(const point_matrix &points) {
 return traverse( points, std::negate<double>{} );
}
int main() {
 const point_matrix points{
 {0, 0, 4},
 {0, 5, 3},
 {1, 7, 0},
 {2, 1, 4},
 {3, 4, 5},
 {4, 2, 3},
 {4, 4, 6},
 {4, 6, 7},
 {5, 0, 2},
 {6, 4, 1},
 {6, 5, 1},
 {6, 7, 0},
 {7, 4, 3}
 };
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", negative_points(points));
 return EXIT_SUCCESS;
}

In the code generated by Clang 15.0.0 with -std=c++20 -march=x86-64-v3 -O3, there is one call to memmove that might be optimized away, compared to just duck-typing your original code to the new types. But it’s still reasonably efficient, very general (and sadly overcomplicated).

Question 10

I don't know if you'd call JimmyHu's recursive_transform() overengineered, but it's less lines of code than your two traverse() overloads. You just unrolled the recursion :) Also, I've shown that it is certainly possible to deduce the types of nested containers, and it isn't even that hard (at least not once you know how template template parameters work).

Question 11

Very nice! I had not realized that the recursive_transform() implementation wouldn't work on std::arrays, or at least it does "work" but it then treats the whole array as a value instead of a container. Adding support for reserve() is great as well.

Question 12

Weird, let's hope the future will indeed be more optimal. I also checked and it should be possible to copy all but the first template parameter of the input container over to the new one, so if you have a container with custom allocators, those will be used as well for the result. But maybe this should become a new question on Code Review.

Question 13

@G.Sliepen Actually, it was because the original implementation used .push_back(), which is expensive on std::array (and fails on many outer STL containers). I just posted a new version that emplaces instead, optimizing much better here. It might—although I have not tested—also work for many other containers that did not before, including std::forward_list and std::set. It still won’t work for std::map.

Question 14

@G.Sliepen I think at this point I might have failed my goal of not making this "ridiculously over-engineered."

Question 15

As @J_H said you need to time both executions and you need to examine the generated assembly code if you are worrying about performance. You can do this in the same program.

The reason to do the bench marks and examine the resultant assembly code is that while both versions may generate the same code if compiled -O3 they probably don't generate the same code if they are compiled using other optimization levels or completely un-optimized. There are times where you can't use optimization or the best optimization and you still want the code that performs the best (embedded programming is one place where you may not be able to optimize the code).

Rather than visually examine the output, provide expected output in the test program and compare that output within the program. You can then use a test framework to indicate pass or fail for each of the methods.

Some programmers would consider the first method too C like, the second method is more idiomatic for C++ although I do agree with you about readability.

Question 16

Most of my observations apply to both versions of the code:

Use a proper "Point" type instead of std::vector<double>. And make reflecting a point be a named function.
Does it make sense to have a collection of points of differing dimensions? Judging by the first version, it doesn't, so we could have a function templated on the number of dimensions.
Consider passing the input by value, then modifying that value in-place.
main() isn't using the arguments, so use the no-args signature.
Reaching the end of main() is equivalent to returning EXIT_SUCCESS, so that can be omitted.

Specific to first version:

Instead of a preprocessor macro, use a proper typed constant for D:
```
 static constexpr std::size_t D = 3;
```

Specific to second version:

We need to #include <algorithm> for std::transform() to be declared.
Consider using the Ranges version of the algorithm for simpler code.
We can use std::negate (from <functional>) instead of writing the transformation [](double x) { return -x; }.

Suggested code:

#include <algorithm>
#include <functional>
#include <vector>
using Point3d = std::vector<double>;
Point3d reflected(Point3d p)
{
 std::ranges::transform(p, p.begin(), std::negate<>{});
 return p;
}
std::vector<Point3d> negative_points(std::vector<Point3d> points)
{
 std::ranges::transform(points, points.begin(), reflected);
 return points;
}

#define FMT_HEADER_ONLY
#include <fmt/ranges.h>
int main()
{
 std::vector<Point3d> points{
 {{0, 0, 4}},
 {{0, 5, 3}},
 {{1, 7, 0}},
 {{2, 1, 4}},
 {{3, 4, 5}},
 {{4, 2, 3}},
 {{4, 4, 6}},
 {{4, 6, -7}},
 {{-5, 0, -2}},
 {{-6, -4, -1}},
 {{6, 5, 1}},
 {{6, 7, 0}},
 {{7, 4, 3}},
 };
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", negative_points(points));
}

With this code, we can easily change to using a different Point representation, with no changes anywhere else:

#include <array>
using Point3d = std::array<double,3>;

Question 17

I don't understand the hardcoded number of dimensions (ala. the #define D 3 )

Consider just resizing each row of negative inline.

std::vector<std::vector<double>> negative_points(const std::vector<std::vector<double>> &points) {
 std::vector<std::vector<double>> negative(points.size());
 for (std::size_t i = 0; i < points.size(); ++i) {
 negative[i].resize(points[i].size());
 for (std::size_t j = 0; j < points[i].size(); ++j) {
 negative[i][j] = -points[i][j];
 }
 }
 return negative;
}

There will of course be folks that say, "just use std::transform". But I also agree with that this sort of terse code writing can be very difficult to read and maintain. The performance delta of using a pair of nested for-loops is probably negligible when compiler optimizations are applied.

score 8 · Answer 1 · 2023-02-05 22:20:36Z

Make it more generic

This works specifically for two-dimensional nested std::vectors of doubles, and will only negate the elements. What if you have a three-dimensional container, and want to take the square root, and want the results to be in a nested std::vector of std::complex<double> so you can handle negative input values?

Ideally, you want something like std::transform() that automatically recurses nested containers, and that returns a copy instead of modifying in-place. This can be done, and user JimmyHu explored this in great detail. See for example his recursive_transform() implementation that works on nested std::vectors. In my answer there, I showed how to make it even more generic and work on any type of container.

With a recursive transform function like that, you would be able to write:

int main() {
 std::vector<std::vector<double>> points{
 {0, 0, 4},
 {0, 5, 3},
 {1, 7, 0},
 ...
 };
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", recursive_transform(points, std::negate{}));
 return EXIT_SUCCESS;
}

Maybe you want to use a library for vector types

If you specifically want to transform 3D coordinates, then instead of using a std::vector<double> to represent a point, you might want to use another type. For example, you could use GLM, and write:

int main() {
 std::vector<glm::dvec3> points{
 {0, 0, 4},
 {0, 5, 3},
 {1, 7, 0},
 ...
 };
 auto negated = points;
 std::transform(negated.begin(), negated.end(), std::negate{});
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", negated);
 return EXIT_SUCCESS;
}

This will be much more efficient; a glm::dvec3 is like a struct holding 3 doubles, so all the points will be stored compactly in memory, whereas with nested std::vectors, each inner std::vector will do its own memory allocation, with no guarantee that they will all be sequential in memory, not to mention the overhead of doing the allocation and keeping track of it for every point.

J_H J_HJ_H 41.4k3 gold badges38 silver badges157 bronze badges · Answer 2 · 2023-02-05 13:46:19Z

4

\$\begingroup\$

They both mostly look fine. LGTM, ship!

You didn't post timing figures for the two versions.

The index-based version suffers from this weakness:

#define D 3

The big advantage of the second version

std::transform(point.begin(), point.end(), neg_point.begin(), ...

is it asks "how many dimensions does a point have?" at run time. Surely the first version could learn this trick.

C programmers expect that caller might allocate and be responsible for object lifetime. Following C++'s somewhat different conventions here makes perfect sense.

If there was the ability for caller to specify the pre-allocated destination vector, then we might examine whether passing in same vector twice results in correct overwriting behavior.

The memory access pattern is same as memcpy. Both versions are friendly to caches, branch prediction, loop unrolling, and vectorization.

Share

answered Feb 5, 2023 at 13:46

J_H's user avatar

J_H J_HJ_H

41.4k3 gold badges38 silver badges157 bronze badges

\$\endgroup\$

5

\$\begingroup\$ I'm not sure I get why specifying the number of dimensions at compile time is a "weakness"? Is it because the code is less "flexible"? Shouldn't this (potentially) enable more compile-time optimizations since the size is known? \$\endgroup\$

Filippo Bistaffa
– Filippo Bistaffa

2023年02月05日 13:49:12 +00:00
Commented Feb 5, 2023 at 13:49
1

\$\begingroup\$ Right, flexibility, it prohibits a mixture of 2-D and 3-D vectors, and burdens the caller with getting it properly defined at compile time. But yeah, the optimization aspect could be very attractive. Which brings us back to publishing head-to-head benchmark timings, looking at generated code on godbolt.org , that sort of thing. \$\endgroup\$

J_H
– J_H

2023年02月05日 13:59:26 +00:00
Commented Feb 5, 2023 at 13:59
\$\begingroup\$ "The memory access pattern is same as memcpy." Within one inner std::vector, sure, but not for the std::vector<std::vector<>> as a whole. If you only store three values per point, it will be quite bad. \$\endgroup\$

G. Sliepen
– G. Sliepen

2023年02月05日 22:36:02 +00:00
Commented Feb 5, 2023 at 22:36
\$\begingroup\$ Ok, let's lean on that item. What I had in mind is the L3 layer will issue sequential cache-line requests to DRAM, same as memcpy, so DRAM stays happy and fast. And if Point is cache-line aligned there will be padding holes, fine, we lose a fraction of the bandwidth. But if you see another pattern, please describe it! I'm especially interested in description of "quite bad", and how to mitigate it. Also, I think we're coming back to the old business of OP may want to publish benchmark timings for different values of D, including odd values. \$\endgroup\$

J_H
– J_H

2023年02月05日 22:53:42 +00:00
Commented Feb 5, 2023 at 22:53
\$\begingroup\$ Just consider that a std::vector<std::vector<double>> is a pointer to an array of pointers to arrays of doubles. Each inner std::vector has separately allocated its array of doubles. There is no guarantee those are sequential in memory. Each std::vector also stores a size and capacity, the memory allocator also needs to store some metadata for each allocation, so the size overhead for each inner std::vector of 3 doubles is quite significant, more than a factor 2! Only if you are lucky and everything is sequential in memory can things be prefetched. \$\endgroup\$

G. Sliepen
– G. Sliepen

2023年02月06日 08:22:10 +00:00
Commented Feb 6, 2023 at 8:22

Add a comment |

Davislor DavislorDavislor 9,10519 silver badges39 bronze badges · Answer 3 · 2023-02-07 18:58:36Z

Include the Required Library Headers

You use both std::size_t and EXIT_SUCCESS in your program, so you should include <cstdlib>. These identifiers might also be declared as a side-effect of another header file, but this is not portable.

Use `constexpr`, not Macros

More precisely, use constexpr when you can, and macros when you have to. Preprocessor macros only do lexical substitution, which can have many subtle errors and be extremely hard to debug. A constexpr expression (or something less restrictive, like consteval or inline) gets you something type-safe and scoped that follows the syntax of the language.

Use Efficient Data Structures

You have a rectangular array here: every row is the same constant width. A vector of vector is an extremely inefficient data structure to represent this. In fact, it’s almost never the data structure you really want.

There are some other recommendations in other answers, but one very quick solution that duck-types perfectly to the code you’ve already written is:

static constexpr std::size_t D = 3;
using row=std::array< double, D >;
using point_matrix = std::vector<row>;

(If you genuinely do have a sparse matrix where being able to omit trailing empty columns matters, instead of fixed-width columns, the format you want is usually something like compressed sparse row.)

Let’s take a look at the generated assembly output for one of the inner loops, from Clang 15.0.0 with -std=c++20 -march=x86-64-v3 -O3 when we use a std::vector of fixed-size arrays:

.LBB0_25: # =>This Inner Loop Header: Depth=1
 vxorps ymm1, ymm0, ymmword ptr [rax + rdi]
 vxorps ymm2, ymm0, ymmword ptr [rax + rdi + 32]
 vxorps ymm3, ymm0, ymmword ptr [rax + rdi + 64]
 vmovups ymmword ptr [rdx + rdi + 64], ymm3
 vmovups ymmword ptr [rdx + rdi + 32], ymm2
 vmovups ymmword ptr [rdx + rdi], ymm1
 add rdi, 96
 cmp r8, rdi
 jne .LBB0_25

With dynamic vectors as rows:

Each row must allocate its own dynamic memory.
The negative object must initialize every row, move and delete them.
The negative_points function must dereference every row and loop over it individually, with a cache miss each time.
All lookups require double indirection.

With fixed-width arrays as rows:

Allocating a new object requires a single call to the new handler, and with copy elision, delete can be optimized away.
The loop can be a single linear pass through the contiguous storage, with every row but the first preloaded.
Accessing the storage requires a single dereference.

If the rows were larger, there would theoretically be one advantage to making the rows vectors: they would then be moveable and swappable. With rows this small, the overhead of copying one is no greater than of moving a vector.

Consider a More Flexible Approach

Since people here have suggested at least three different data structures, there’s a good chance you might want to change the data structure you use. I picked one that duck-typed to your existing code, but some of the other suggestions don’t.

In the abstract, what you want to do has a few different names in computer science, and one from functional programming is traverse. That is, you want to take some kind of structure (or, more abstractly, a functor from category theory) that wraps elements of a type, here double, pass it a function that takes that type as its input, and return a new structure with the same layout, but where all the data has been passed through the function.

Unfortunately, deducing the correct template parameters for arbitrarily-nested containers would be somewhere between very complicated and impossible, but you can at least generalize on the transformation function and overload the traversal. This lets you write:

inline point_matrix negative_points(const point_matrix &points) {
 return traverse( points, std::negate<double>{} );
}

Putting it All Together

Here’s a version, based on this code by G. Sliepen, that should work for nested containers. It replaces my original implementation.

#include <algorithm> // transform
#include <array>
#include <cassert>
#include <concepts> // std::invocable
#include <cstdlib>
#include <fmt/ranges.h>
#include <functional> // invoke, negate
#include <iterator> // begin, end
#include <vector>
static constexpr std::size_t D = 3;
using row_t = std::array< double, D >;
using point_matrix = std::vector<row_t>;
template<class T>
concept is_iterable = requires(T x)
{
 *std::begin(x);
 std::end(x);
};
template<class T>
concept is_sized = requires(T x)
{
 x.size();
};
template<class T>
concept is_reservable = requires( T x, std::size_t n )
{
 x.reserve(n);
};
template<class T>
concept is_resizeable = requires( T x, std::size_t n )
{
 x.resize(n);
};
template<class T, class U>
concept is_back_emplaceable = requires( T x, U y )
{
 x.emplace_back(y);
};
template<class T, class U>
concept is_set_emplaceable = requires( T x, U y )
{
 x.emplace_hint( std::begin(x), y );
};
template<class T, class U>
concept is_forward_list_emplaceable = requires( T x, U y )
{
 x.emplace_after( std::begin(x), y );
};
template<class T, class U>
concept is_basic_emplaceable = requires( T x, U y )
{
 x.emplace(y);
};
// Prototypes in case we ever want to nest in the opposite order.
template< template<typename...> typename Container,
 typename F,
 typename... Ts >
 requires is_iterable<Container<Ts...>>
inline auto traverse ( const Container<Ts...>& input, const F& f );
template< template<class T, std::size_t> class Container,
 typename F,
 typename T,
 std::size_t N >
 requires is_iterable<Container<T, N>>
constexpr auto traverse( const Container<T, N>& input, const F& f );
template< typename T,
 typename F,
 std::size_t N >
constexpr auto traverse( const T(&input)[N], const F& f );
/* Base case for traversal: a single input to the transformation function.
 */
template< typename T,
 std::invocable<T> F >
constexpr auto traverse( const T& input, const F& f )
{
 return std::invoke( f, input );
}
/* This overload was needed to support traversing std::array.
 */
template< template<class T, std::size_t> class Container,
 typename F,
 typename T,
 std::size_t N >
 requires is_iterable<Container<T, N>>
constexpr auto traverse( const Container<T, N>& input, const F& f )
{
 using TransformedValueType = decltype(traverse(*std::begin(input), f));
 Container<TransformedValueType, N> output;
 std::transform( std::begin(input),
 std::end(input),
 std::begin(output),
 [f](auto &x){ return traverse( x, f ); }
 );
 return output;
}
/* Since C++ does not allow functions to return built-in arrays, we transform
 * them into std::array objects instead.
 */
template< typename T,
 typename F,
 std::size_t N >
constexpr auto traverse( const T(&input)[N], const F& f )
{
 using TransformedValueType = decltype(traverse(*std::begin(input), f));
 std::array< TransformedValueType, N > output;
 std::transform( std::begin(input),
 std::end(input),
 std::begin(output),
 [f](auto &x){ return traverse( x, f ); }
 );
 return output;
}
/* This implementation looks for the following member functions on the
 * transformed container, in order: .emplace_back() (std::vector,
 * std::dequeue, std::list), .emplace_after() (std::forward_list),
 * emplace_hint (std::set, etc.), or .emplace(). These will fail on
 * containers such as std::map, unless another overload enables traversal on
 * key-value pairs. If none of those succeeds, it will attempt to resize the
 * output container and call std::transform().
 */
template< template<typename...> typename Container,
 typename F,
 typename... Ts >
 requires is_iterable<Container<Ts...>>
inline auto traverse ( const Container<Ts...>& input, const F& f )
{
 using TransformedValueType = decltype(traverse(*std::begin(input), f));
 using OutputC = Container<TransformedValueType>;
 OutputC output;
 // Should reserve storage if and only if the container supported.
 if constexpr ( is_sized<Container<Ts...>> &&
 is_reservable<Container<TransformedValueType>> ) {
 output.reserve(input.size());
 }
 if constexpr (is_back_emplaceable<OutputC, TransformedValueType>) {
 for( const auto& x : input ) {
 output.emplace_back(traverse( x, f ));
 }
 } else if constexpr (is_forward_list_emplaceable<OutputC, TransformedValueType>) {
 // This branch is NOT TESTED.
 auto current = std::begin(output);
 for( auto it = std::begin(input); it != std::end(input); ++it ) {
 current = output.emplace_after( current, traverse( *it, f ) );
 }
 } else if constexpr (is_set_emplaceable<OutputC, TransformedValueType>) {
 // This branch is NOT TESTED.
 auto current = std::begin(output);
 for( auto it = std::begin(input); it != std::end(input); ++it ) {
 current = output.emplace_hint( current, traverse( *it, f ) );
 }
 } else if constexpr (is_basic_emplaceable<OutputC, TransformedValueType>) {
 // This branch is NOT TESTED.
 for ( const auto& x : input ) {
 output.emplace(traverse( x, f ));
 }
 } else {
 if constexpr ( is_sized<Container<Ts...>> &&
 is_resizeable<OutputC>) {
 output.resize(input.size());
 }
 std::transform( std::begin(input),
 std::end(input),
 std::begin(output),
 [f](const auto& x){ return traverse( x, f ); }
 );
 }
 // One last sanity check.
 if constexpr( is_sized<Container<Ts...>> && is_sized<OutputC> ) {
 assert( output.size() == input.size() );
 }
 return output;
}
auto negative_points(const point_matrix &points) {
 return traverse( points, std::negate<double>{} );
}
int main() {
 const point_matrix points{
 {0, 0, 4},
 {0, 5, 3},
 {1, 7, 0},
 {2, 1, 4},
 {3, 4, 5},
 {4, 2, 3},
 {4, 4, 6},
 {4, 6, 7},
 {5, 0, 2},
 {6, 4, 1},
 {6, 5, 1},
 {6, 7, 0},
 {7, 4, 3}
 };
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", negative_points(points));
 return EXIT_SUCCESS;
}

In the code generated by Clang 15.0.0 with -std=c++20 -march=x86-64-v3 -O3, there is one call to memmove that might be optimized away, compared to just duck-typing your original code to the new types. But it’s still reasonably efficient, very general (and sadly overcomplicated).

I don't know if you'd call JimmyHu's recursive_transform() overengineered, but it's less lines of code than your two traverse() overloads. You just unrolled the recursion :) Also, I've shown that it is certainly possible to deduce the types of nested containers, and it isn't even that hard (at least not once you know how template template parameters work).
Very nice! I had not realized that the recursive_transform() implementation wouldn't work on std::arrays, or at least it does "work" but it then treats the whole array as a value instead of a container. Adding support for reserve() is great as well.
Weird, let's hope the future will indeed be more optimal. I also checked and it should be possible to copy all but the first template parameter of the input container over to the new one, so if you have a container with custom allocators, those will be used as well for the result. But maybe this should become a new question on Code Review.
@G.Sliepen Actually, it was because the original implementation used .push_back(), which is expensive on std::array (and fails on many outer STL containers). I just posted a new version that emplaces instead, optimizing much better here. It might—although I have not tested—also work for many other containers that did not before, including std::forward_list and std::set. It still won’t work for std::map.
@G.Sliepen I think at this point I might have failed my goal of not making this "ridiculously over-engineered."

score 2 · Answer 4 · 2023-02-05 15:16:37Z

As @J_H said you need to time both executions and you need to examine the generated assembly code if you are worrying about performance. You can do this in the same program.

The reason to do the bench marks and examine the resultant assembly code is that while both versions may generate the same code if compiled -O3 they probably don't generate the same code if they are compiled using other optimization levels or completely un-optimized. There are times where you can't use optimization or the best optimization and you still want the code that performs the best (embedded programming is one place where you may not be able to optimize the code).

Rather than visually examine the output, provide expected output in the test program and compare that output within the program. You can then use a test framework to indicate pass or fail for each of the methods.

Some programmers would consider the first method too C like, the second method is more idiomatic for C++ although I do agree with you about readability.

score 2 · Answer 5 · 2023-02-07 09:36:28Z

Most of my observations apply to both versions of the code:

Use a proper "Point" type instead of std::vector<double>. And make reflecting a point be a named function.
Does it make sense to have a collection of points of differing dimensions? Judging by the first version, it doesn't, so we could have a function templated on the number of dimensions.
Consider passing the input by value, then modifying that value in-place.
main() isn't using the arguments, so use the no-args signature.
Reaching the end of main() is equivalent to returning EXIT_SUCCESS, so that can be omitted.

Specific to first version:

Instead of a preprocessor macro, use a proper typed constant for D:
```
 static constexpr std::size_t D = 3;
```

Specific to second version:

We need to #include <algorithm> for std::transform() to be declared.
Consider using the Ranges version of the algorithm for simpler code.
We can use std::negate (from <functional>) instead of writing the transformation [](double x) { return -x; }.

Suggested code:

#include <algorithm>
#include <functional>
#include <vector>
using Point3d = std::vector<double>;
Point3d reflected(Point3d p)
{
 std::ranges::transform(p, p.begin(), std::negate<>{});
 return p;
}
std::vector<Point3d> negative_points(std::vector<Point3d> points)
{
 std::ranges::transform(points, points.begin(), reflected);
 return points;
}

#define FMT_HEADER_ONLY
#include <fmt/ranges.h>
int main()
{
 std::vector<Point3d> points{
 {{0, 0, 4}},
 {{0, 5, 3}},
 {{1, 7, 0}},
 {{2, 1, 4}},
 {{3, 4, 5}},
 {{4, 2, 3}},
 {{4, 4, 6}},
 {{4, 6, -7}},
 {{-5, 0, -2}},
 {{-6, -4, -1}},
 {{6, 5, 1}},
 {{6, 7, 0}},
 {{7, 4, 3}},
 };
 fmt::print("Original: {}\n", points);
 fmt::print("Negative: {}\n", negative_points(points));
}

With this code, we can easily change to using a different Point representation, with no changes anywhere else:

#include <array>
using Point3d = std::array<double,3>;

selbie selbieselbie 1955 bronze badges · Answer 6 · 2023-02-05 21:51:38Z

I don't understand the hardcoded number of dimensions (ala. the #define D 3 )

Consider just resizing each row of negative inline.

std::vector<std::vector<double>> negative_points(const std::vector<std::vector<double>> &points) {
 std::vector<std::vector<double>> negative(points.size());
 for (std::size_t i = 0; i < points.size(); ++i) {
 negative[i].resize(points[i].size());
 for (std::size_t j = 0; j < points[i].size(); ++j) {
 negative[i][j] = -points[i][j];
 }
 }
 return negative;
}

There will of course be folks that say, "just use std::transform". But I also agree with that this sort of terse code writing can be very difficult to read and maintain. The performance delta of using a pair of nested for-loops is probably negligible when compiler optimizations are applied.

Stack Exchange Network

Compute "negative" version of vector of vectors

6 Answers 6

Make it more generic

Maybe you want to use a library for vector types

Include the Required Library Headers

Use `constexpr`, not Macros

Use Efficient Data Structures

Consider a More Flexible Approach

Putting it All Together

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Compute "negative" version of vector of vectors

6 Answers 6

Make it more generic

Maybe you want to use a library for vector types

Include the Required Library Headers

Use constexpr, not Macros

Use Efficient Data Structures

Consider a More Flexible Approach

Putting it All Together

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions

Use `constexpr`, not Macros