Given a std::vector<std::vector<double>>
(e.g., representing some points in a space with D
dimensions), I want to compute a new std::vector<std::vector<double>>
where each double
is the negative of the original, as in the following MWE:
#define FMT_HEADER_ONLY
#include <fmt/ranges.h>
#include <vector>
#define D 3
std::vector<std::vector<double>> negative_points(const std::vector<std::vector<double>> &points) {
std::vector<std::vector<double>> negative(points.size(), std::vector<double>(D));
for (std::size_t i = 0; i < points.size(); ++i) {
for (std::size_t j = 0; j < D; ++j) {
negative[i][j] = -points[i][j];
}
}
return negative;
}
int main(int argc, char *argv[]) {
std::vector<std::vector<double>> points{
{0, 0, 4},
{0, 5, 3},
{1, 7, 0},
{2, 1, 4},
{3, 4, 5},
{4, 2, 3},
{4, 4, 6},
{4, 6, 7},
{5, 0, 2},
{6, 4, 1},
{6, 5, 1},
{6, 7, 0},
{7, 4, 3}
};
fmt::print("Original: {}\n", points);
fmt::print("Negative: {}\n", negative_points(points));
return EXIT_SUCCESS;
}
Here I obtain the result with index-based loops, is there a better way to do this? Here's an attempt with std
functions, which I nonetheless find more difficult to understand:
std::vector<std::vector<double>> negative_points_std(const std::vector<std::vector<double>> &points) {
std::vector<std::vector<double>> negative(points.size());
std::transform(points.begin(), points.end(), negative.begin(),
[](const std::vector<double> &point) {
std::vector<double> neg_point(point.size());
std::transform(point.begin(), point.end(), neg_point.begin(),
[](double x) { return -x; });
return neg_point;
});
return negative;
}
6 Answers 6
Make it more generic
This works specifically for two-dimensional nested std::vector
s of double
s, and will only negate the elements. What if you have a three-dimensional container, and want to take the square root, and want the results to be in a nested std::vector
of std::complex<double>
so you can handle negative input values?
Ideally, you want something like std::transform()
that automatically recurses nested containers, and that returns a copy instead of modifying in-place. This can be done, and user JimmyHu explored this in great detail. See for example his recursive_transform()
implementation that works on nested std::vector
s. In my answer there, I showed how to make it even more generic and work on any type of container.
With a recursive transform function like that, you would be able to write:
int main() {
std::vector<std::vector<double>> points{
{0, 0, 4},
{0, 5, 3},
{1, 7, 0},
...
};
fmt::print("Original: {}\n", points);
fmt::print("Negative: {}\n", recursive_transform(points, std::negate{}));
return EXIT_SUCCESS;
}
Maybe you want to use a library for vector types
If you specifically want to transform 3D coordinates, then instead of using a std::vector<double>
to represent a point, you might want to use another type. For example, you could use GLM, and write:
int main() {
std::vector<glm::dvec3> points{
{0, 0, 4},
{0, 5, 3},
{1, 7, 0},
...
};
auto negated = points;
std::transform(negated.begin(), negated.end(), std::negate{});
fmt::print("Original: {}\n", points);
fmt::print("Negative: {}\n", negated);
return EXIT_SUCCESS;
}
This will be much more efficient; a glm::dvec3
is like a struct
holding 3 double
s, so all the points will be stored compactly in memory, whereas with nested std::vector
s, each inner std::vector
will do its own memory allocation, with no guarantee that they will all be sequential in memory, not to mention the overhead of doing the allocation and keeping track of it for every point.
They both mostly look fine. LGTM, ship!
You didn't post timing figures for the two versions.
The index-based version suffers from this weakness:
#define D 3
The big advantage of the second version
std::transform(point.begin(), point.end(), neg_point.begin(), ...
is it asks "how many dimensions does a point have?" at run time. Surely the first version could learn this trick.
C programmers expect that caller might allocate and be responsible for object lifetime. Following C++'s somewhat different conventions here makes perfect sense.
If there was the ability for caller to specify the pre-allocated destination vector, then we might examine whether passing in same vector twice results in correct overwriting behavior.
The memory access pattern is same as memcpy. Both versions are friendly to caches, branch prediction, loop unrolling, and vectorization.
-
\$\begingroup\$ I'm not sure I get why specifying the number of dimensions at compile time is a "weakness"? Is it because the code is less "flexible"? Shouldn't this (potentially) enable more compile-time optimizations since the size is known? \$\endgroup\$Filippo Bistaffa– Filippo Bistaffa2023年02月05日 13:49:12 +00:00Commented Feb 5, 2023 at 13:49
-
1\$\begingroup\$ Right, flexibility, it prohibits a mixture of 2-D and 3-D vectors, and burdens the caller with getting it properly defined at compile time. But yeah, the optimization aspect could be very attractive. Which brings us back to publishing head-to-head benchmark timings, looking at generated code on godbolt.org , that sort of thing. \$\endgroup\$J_H– J_H2023年02月05日 13:59:26 +00:00Commented Feb 5, 2023 at 13:59
-
\$\begingroup\$ "The memory access pattern is same as memcpy." Within one inner
std::vector
, sure, but not for thestd::vector<std::vector<>>
as a whole. If you only store three values per point, it will be quite bad. \$\endgroup\$G. Sliepen– G. Sliepen2023年02月05日 22:36:02 +00:00Commented Feb 5, 2023 at 22:36 -
\$\begingroup\$ Ok, let's lean on that item. What I had in mind is the L3 layer will issue sequential cache-line requests to DRAM, same as memcpy, so DRAM stays happy and fast. And if Point is cache-line aligned there will be padding holes, fine, we lose a fraction of the bandwidth. But if you see another pattern, please describe it! I'm especially interested in description of "quite bad", and how to mitigate it. Also, I think we're coming back to the old business of OP may want to publish benchmark timings for different values of D, including odd values. \$\endgroup\$J_H– J_H2023年02月05日 22:53:42 +00:00Commented Feb 5, 2023 at 22:53
-
\$\begingroup\$ Just consider that a
std::vector<std::vector<double>>
is a pointer to an array of pointers to arrays of doubles. Each innerstd::vector
has separately allocated its array of doubles. There is no guarantee those are sequential in memory. Eachstd::vector
also stores a size and capacity, the memory allocator also needs to store some metadata for each allocation, so the size overhead for each innerstd::vector
of 3double
s is quite significant, more than a factor 2! Only if you are lucky and everything is sequential in memory can things be prefetched. \$\endgroup\$G. Sliepen– G. Sliepen2023年02月06日 08:22:10 +00:00Commented Feb 6, 2023 at 8:22
Include the Required Library Headers
You use both std::size_t
and EXIT_SUCCESS
in your program, so you should include <cstdlib>
. These identifiers might also be declared as a side-effect of another header file, but this is not portable.
Use constexpr
, not Macros
More precisely, use constexpr
when you can, and macros when you have to. Preprocessor macros only do lexical substitution, which can have many subtle errors and be extremely hard to debug. A constexpr
expression (or something less restrictive, like consteval
or inline
) gets you something type-safe and scoped that follows the syntax of the language.
Use Efficient Data Structures
You have a rectangular array here: every row is the same constant width. A vector
of vector
is an extremely inefficient data structure to represent this. In fact, it’s almost never the data structure you really want.
There are some other recommendations in other answers, but one very quick solution that duck-types perfectly to the code you’ve already written is:
static constexpr std::size_t D = 3;
using row=std::array< double, D >;
using point_matrix = std::vector<row>;
(If you genuinely do have a sparse matrix where being able to omit trailing empty columns matters, instead of fixed-width columns, the format you want is usually something like compressed sparse row.)
Let’s take a look at the generated assembly output for one of the inner loops, from Clang 15.0.0 with -std=c++20 -march=x86-64-v3 -O3
when we use a std::vector
of fixed-size arrays:
.LBB0_25: # =>This Inner Loop Header: Depth=1
vxorps ymm1, ymm0, ymmword ptr [rax + rdi]
vxorps ymm2, ymm0, ymmword ptr [rax + rdi + 32]
vxorps ymm3, ymm0, ymmword ptr [rax + rdi + 64]
vmovups ymmword ptr [rdx + rdi + 64], ymm3
vmovups ymmword ptr [rdx + rdi + 32], ymm2
vmovups ymmword ptr [rdx + rdi], ymm1
add rdi, 96
cmp r8, rdi
jne .LBB0_25
With dynamic vectors as rows:
- Each row must allocate its own dynamic memory.
- The
negative
object must initialize every row, move and delete them. - The
negative_points
function must dereference every row and loop over it individually, with a cache miss each time. - All lookups require double indirection.
With fixed-width arrays as rows:
- Allocating a new object requires a single call to the
new
handler, and with copy elision,delete
can be optimized away. - The loop can be a single linear pass through the contiguous storage, with every row but the first preloaded.
- Accessing the storage requires a single dereference.
If the rows were larger, there would theoretically be one advantage to making the rows vectors
: they would then be moveable and swappable. With rows this small, the overhead of copying one is no greater than of moving a vector
.
Consider a More Flexible Approach
Since people here have suggested at least three different data structures, there’s a good chance you might want to change the data structure you use. I picked one that duck-typed to your existing code, but some of the other suggestions don’t.
In the abstract, what you want to do has a few different names in computer science, and one from functional programming is traverse
. That is, you want to take some kind of structure (or, more abstractly, a functor from category theory) that wraps elements of a type, here double
, pass it a function that takes that type as its input, and return a new structure with the same layout, but where all the data has been passed through the function.
Unfortunately, deducing the correct template parameters for arbitrarily-nested containers would be somewhere between very complicated and impossible, but you can at least generalize on the transformation function and overload the traversal. This lets you write:
inline point_matrix negative_points(const point_matrix &points) {
return traverse( points, std::negate<double>{} );
}
Putting it All Together
Here’s a version, based on this code by G. Sliepen, that should work for nested containers. It replaces my original implementation.
#include <algorithm> // transform
#include <array>
#include <cassert>
#include <concepts> // std::invocable
#include <cstdlib>
#include <fmt/ranges.h>
#include <functional> // invoke, negate
#include <iterator> // begin, end
#include <vector>
static constexpr std::size_t D = 3;
using row_t = std::array< double, D >;
using point_matrix = std::vector<row_t>;
template<class T>
concept is_iterable = requires(T x)
{
*std::begin(x);
std::end(x);
};
template<class T>
concept is_sized = requires(T x)
{
x.size();
};
template<class T>
concept is_reservable = requires( T x, std::size_t n )
{
x.reserve(n);
};
template<class T>
concept is_resizeable = requires( T x, std::size_t n )
{
x.resize(n);
};
template<class T, class U>
concept is_back_emplaceable = requires( T x, U y )
{
x.emplace_back(y);
};
template<class T, class U>
concept is_set_emplaceable = requires( T x, U y )
{
x.emplace_hint( std::begin(x), y );
};
template<class T, class U>
concept is_forward_list_emplaceable = requires( T x, U y )
{
x.emplace_after( std::begin(x), y );
};
template<class T, class U>
concept is_basic_emplaceable = requires( T x, U y )
{
x.emplace(y);
};
// Prototypes in case we ever want to nest in the opposite order.
template< template<typename...> typename Container,
typename F,
typename... Ts >
requires is_iterable<Container<Ts...>>
inline auto traverse ( const Container<Ts...>& input, const F& f );
template< template<class T, std::size_t> class Container,
typename F,
typename T,
std::size_t N >
requires is_iterable<Container<T, N>>
constexpr auto traverse( const Container<T, N>& input, const F& f );
template< typename T,
typename F,
std::size_t N >
constexpr auto traverse( const T(&input)[N], const F& f );
/* Base case for traversal: a single input to the transformation function.
*/
template< typename T,
std::invocable<T> F >
constexpr auto traverse( const T& input, const F& f )
{
return std::invoke( f, input );
}
/* This overload was needed to support traversing std::array.
*/
template< template<class T, std::size_t> class Container,
typename F,
typename T,
std::size_t N >
requires is_iterable<Container<T, N>>
constexpr auto traverse( const Container<T, N>& input, const F& f )
{
using TransformedValueType = decltype(traverse(*std::begin(input), f));
Container<TransformedValueType, N> output;
std::transform( std::begin(input),
std::end(input),
std::begin(output),
[f](auto &x){ return traverse( x, f ); }
);
return output;
}
/* Since C++ does not allow functions to return built-in arrays, we transform
* them into std::array objects instead.
*/
template< typename T,
typename F,
std::size_t N >
constexpr auto traverse( const T(&input)[N], const F& f )
{
using TransformedValueType = decltype(traverse(*std::begin(input), f));
std::array< TransformedValueType, N > output;
std::transform( std::begin(input),
std::end(input),
std::begin(output),
[f](auto &x){ return traverse( x, f ); }
);
return output;
}
/* This implementation looks for the following member functions on the
* transformed container, in order: .emplace_back() (std::vector,
* std::dequeue, std::list), .emplace_after() (std::forward_list),
* emplace_hint (std::set, etc.), or .emplace(). These will fail on
* containers such as std::map, unless another overload enables traversal on
* key-value pairs. If none of those succeeds, it will attempt to resize the
* output container and call std::transform().
*/
template< template<typename...> typename Container,
typename F,
typename... Ts >
requires is_iterable<Container<Ts...>>
inline auto traverse ( const Container<Ts...>& input, const F& f )
{
using TransformedValueType = decltype(traverse(*std::begin(input), f));
using OutputC = Container<TransformedValueType>;
OutputC output;
// Should reserve storage if and only if the container supported.
if constexpr ( is_sized<Container<Ts...>> &&
is_reservable<Container<TransformedValueType>> ) {
output.reserve(input.size());
}
if constexpr (is_back_emplaceable<OutputC, TransformedValueType>) {
for( const auto& x : input ) {
output.emplace_back(traverse( x, f ));
}
} else if constexpr (is_forward_list_emplaceable<OutputC, TransformedValueType>) {
// This branch is NOT TESTED.
auto current = std::begin(output);
for( auto it = std::begin(input); it != std::end(input); ++it ) {
current = output.emplace_after( current, traverse( *it, f ) );
}
} else if constexpr (is_set_emplaceable<OutputC, TransformedValueType>) {
// This branch is NOT TESTED.
auto current = std::begin(output);
for( auto it = std::begin(input); it != std::end(input); ++it ) {
current = output.emplace_hint( current, traverse( *it, f ) );
}
} else if constexpr (is_basic_emplaceable<OutputC, TransformedValueType>) {
// This branch is NOT TESTED.
for ( const auto& x : input ) {
output.emplace(traverse( x, f ));
}
} else {
if constexpr ( is_sized<Container<Ts...>> &&
is_resizeable<OutputC>) {
output.resize(input.size());
}
std::transform( std::begin(input),
std::end(input),
std::begin(output),
[f](const auto& x){ return traverse( x, f ); }
);
}
// One last sanity check.
if constexpr( is_sized<Container<Ts...>> && is_sized<OutputC> ) {
assert( output.size() == input.size() );
}
return output;
}
auto negative_points(const point_matrix &points) {
return traverse( points, std::negate<double>{} );
}
int main() {
const point_matrix points{
{0, 0, 4},
{0, 5, 3},
{1, 7, 0},
{2, 1, 4},
{3, 4, 5},
{4, 2, 3},
{4, 4, 6},
{4, 6, 7},
{5, 0, 2},
{6, 4, 1},
{6, 5, 1},
{6, 7, 0},
{7, 4, 3}
};
fmt::print("Original: {}\n", points);
fmt::print("Negative: {}\n", negative_points(points));
return EXIT_SUCCESS;
}
In the code generated by Clang 15.0.0 with -std=c++20 -march=x86-64-v3 -O3
, there is one call to memmove
that might be optimized away, compared to just duck-typing your original code to the new types. But it’s still reasonably efficient, very general (and sadly overcomplicated).
-
1\$\begingroup\$ I don't know if you'd call JimmyHu's
recursive_transform()
overengineered, but it's less lines of code than your twotraverse()
overloads. You just unrolled the recursion :) Also, I've shown that it is certainly possible to deduce the types of nested containers, and it isn't even that hard (at least not once you know how template template parameters work). \$\endgroup\$G. Sliepen– G. Sliepen2023年02月08日 13:28:08 +00:00Commented Feb 8, 2023 at 13:28 -
1\$\begingroup\$ Very nice! I had not realized that the
recursive_transform()
implementation wouldn't work onstd::array
s, or at least it does "work" but it then treats the whole array as a value instead of a container. Adding support forreserve()
is great as well. \$\endgroup\$G. Sliepen– G. Sliepen2023年02月08日 14:24:52 +00:00Commented Feb 8, 2023 at 14:24 -
1\$\begingroup\$ Weird, let's hope the future will indeed be more optimal. I also checked and it should be possible to copy all but the first template parameter of the input container over to the new one, so if you have a container with custom allocators, those will be used as well for the result. But maybe this should become a new question on Code Review. \$\endgroup\$G. Sliepen– G. Sliepen2023年02月08日 15:10:03 +00:00Commented Feb 8, 2023 at 15:10
-
1\$\begingroup\$ @G.Sliepen Actually, it was because the original implementation used
.push_back()
, which is expensive onstd::array
(and fails on many outer STL containers). I just posted a new version that emplaces instead, optimizing much better here. It might—although I have not tested—also work for many other containers that did not before, includingstd::forward_list
andstd::set
. It still won’t work forstd::map
. \$\endgroup\$Davislor– Davislor2023年02月08日 16:20:35 +00:00Commented Feb 8, 2023 at 16:20 -
1\$\begingroup\$ @G.Sliepen I think at this point I might have failed my goal of not making this "ridiculously over-engineered." \$\endgroup\$Davislor– Davislor2023年02月08日 16:28:57 +00:00Commented Feb 8, 2023 at 16:28
As @J_H said you need to time both executions and you need to examine the generated assembly code if you are worrying about performance. You can do this in the same program.
The reason to do the bench marks and examine the resultant assembly code is that while both versions may generate the same code if compiled -O3 they probably don't generate the same code if they are compiled using other optimization levels or completely un-optimized. There are times where you can't use optimization or the best optimization and you still want the code that performs the best (embedded programming is one place where you may not be able to optimize the code).
Rather than visually examine the output, provide expected output in the test program and compare that output within the program. You can then use a test framework to indicate pass or fail for each of the methods.
Some programmers would consider the first method too C
like, the second method is more idiomatic for C++ although I do agree with you about readability.
Most of my observations apply to both versions of the code:
Use a proper "Point" type instead of
std::vector<double>
. And make reflecting a point be a named function.Does it make sense to have a collection of points of differing dimensions? Judging by the first version, it doesn't, so we could have a function templated on the number of dimensions.
Consider passing the input by value, then modifying that value in-place.
main()
isn't using the arguments, so use the no-args signature.Reaching the end of
main()
is equivalent to returningEXIT_SUCCESS
, so that can be omitted.
Specific to first version:
Instead of a preprocessor macro, use a proper typed constant for
D
:static constexpr std::size_t D = 3;
Specific to second version:
- We need to
#include <algorithm>
forstd::transform()
to be declared. - Consider using the Ranges version of the algorithm for simpler code.
- We can use
std::negate
(from<functional>
) instead of writing the transformation[](double x) { return -x; }
.
Suggested code:
#include <algorithm>
#include <functional>
#include <vector>
using Point3d = std::vector<double>;
Point3d reflected(Point3d p)
{
std::ranges::transform(p, p.begin(), std::negate<>{});
return p;
}
std::vector<Point3d> negative_points(std::vector<Point3d> points)
{
std::ranges::transform(points, points.begin(), reflected);
return points;
}
#define FMT_HEADER_ONLY
#include <fmt/ranges.h>
int main()
{
std::vector<Point3d> points{
{{0, 0, 4}},
{{0, 5, 3}},
{{1, 7, 0}},
{{2, 1, 4}},
{{3, 4, 5}},
{{4, 2, 3}},
{{4, 4, 6}},
{{4, 6, -7}},
{{-5, 0, -2}},
{{-6, -4, -1}},
{{6, 5, 1}},
{{6, 7, 0}},
{{7, 4, 3}},
};
fmt::print("Original: {}\n", points);
fmt::print("Negative: {}\n", negative_points(points));
}
With this code, we can easily change to using a different Point representation, with no changes anywhere else:
#include <array>
using Point3d = std::array<double,3>;
I don't understand the hardcoded number of dimensions (ala. the #define D 3
)
Consider just resizing each row of negative inline.
std::vector<std::vector<double>> negative_points(const std::vector<std::vector<double>> &points) {
std::vector<std::vector<double>> negative(points.size());
for (std::size_t i = 0; i < points.size(); ++i) {
negative[i].resize(points[i].size());
for (std::size_t j = 0; j < points[i].size(); ++j) {
negative[i][j] = -points[i][j];
}
}
return negative;
}
There will of course be folks that say, "just use std::transform". But I also agree with that this sort of terse code writing can be very difficult to read and maintain. The performance delta of using a pair of nested for-loops is probably negligible when compiler optimizations are applied.