4

I was thinking about the utility of non-standard __restrict keyword in C and C++ and how its effect can be emulated by carefully declare (disjoint) value objects .

Restrict is usually explained through indirect memory access that might or might not overlap memory addresses:

int foo(Pointer1 a, Pointer2 b) { // adding non-standard `restrict` keyword might hint that dreaded_function_call is never called
 *a = 5;
 *b = 6;
 if(*a == *b) dreaded_function_call(); // in isolation this function may or may not be called
}

If the compiler can prove that a and b do not overlap then the dreaded_function_call() is never called or referred in the compilation.

This is exactly what I achieve in this example, with GCC at least, dreaded_function_call doesn't even appear in the generated machine code.

#include<vector>
template<class It> void modify_v(It);
void dreaded_function_call();
template<class It1, class It2>
void foo(It1 a, It2 b) {
 *a = 5;
 *b = 6;
 if(*a == *b) dreaded_function_call();
}
int main() {
 std::vector<int> v1(10); modify_v(v1.begin());
 std::vector<int> v2(10); modify_v(v2.begin());
 foo(v1.begin() + 5, v2.begin() + 5);
}

However, if I slightly change the code to generate the vector themselves from separate function calls, I loose this optimization. I can observe this since the generated code will branch and still consider the possibly of calling dreaded_function_call.

std::vector<int> make_v();
...
int main() {
 std::vector<int> v1 = make_v();
 std::vector<int> v2 = make_v();
 foo(v1.begin() + 5, v2.begin() + 5);
}

What is going on here? make_v returns by copy, so v1 and v2 should be disjoint just like in the case above. Yet the compiler doesn't do the same optimization.

Is this just missed opportunity for optimization, or this optimization would be just outright invalid?

Here is the compiled code that illustrates both cases with GCC: https://godbolt.org/z/WKKzs1edc

(Clang gives the same behavior.)


EDIT: As one of the comments points out, the culprit might be the fact that the compiler cannot see how the vectors are allocated since make_v is hidden from view, even if make_v returns a copy. (Also copy elision could be interfering here?)

In this case, the following code could be more clear with respect to whether the copy is visible or not,

std::vector<int> make_v();
...
int main() {
 std::vector<int> v0 = make_v();
 std::vector<int> v1 = v0;
 std::vector<int> v2 = v0;
 foo(v1.begin() + 5, v2.begin() + 5);
}

Indeed, in this case GCC doesn't still optimize (doesn't effectively restricts pointer) but Clang does. As illustrated here, https://godbolt.org/z/4xhq1975x

So there is a combination of factors here, the last factor being the compiler itself. But in my opinion, it seems that return-by-copy and the analysis of the copy constructor should be enough.

If this is a consequence of return by move or RVO, then I would say that this is a hidden cost of these features. Just my opinion.

asked Oct 20, 2024 at 20:45
4
  • 'make_v returns by copy' -- that depends on the implementation of make_v, which the compiler can't see here. The optimization you're expecting is possible only if the compiler can analyze all constructors of vector and prove that each constructor allocates unique memory block... Commented Oct 20, 2024 at 21:01
  • @IgorG, the signature of make_v says it returns by copy, what other option does it have. I understand that the compiler cannot track how the vector inside the function is allocated but then it is returned by copy. Seems that only the copy constructor needs to be analyzed. Having said that, it looks like the problem is the copy constructor, because such copy forfeits the optimization as well: godbolt.org/z/qrv3v15Gs Commented Oct 20, 2024 at 21:03
  • Starting with C++17 there's mandatory copy elision. The value can be moved or constructed directly in v1 and v2. Replace std::vector with another class that has its copy and move constructors deleted, and it will still work. In C++17, that is. Commented Oct 20, 2024 at 21:11
  • @IgorG, yeah, that is my suspicion, if that is the case, it would be a pity if RVO is hindering this optimization. But I don't think this is the case, because even when the compiler can see the copy it refuses to make the optimization. Commented Oct 20, 2024 at 21:13

1 Answer 1

1

Each std::vector<int> holds a pointer that points to the actual memory region holding the elements.

We know from the library specification in the standard, that it is impossible for two std::vector<int> objects to refer to the same element objects or the same storage.

However, from the actual core language point-of-view, there is nothing preventing you from writing a make_v function that always returns std::vector<int> objects that have their internal pointer initialized to the same value. In particular, even if there is no public member function to achieve this, it is always possible to use tricks to access private data members directly. Class invariants aren't actually usable by the optimizer, even if it was clever enough to understand them.

Of course, any way that you could attempt to do this would have undefined behavior under the library rules, but the compilers (with some smaller exceptions) do not know or take into account the library specification and instead work with the standard library code just like any other code.

In your first example the compiler can see through inlining to the actual operator new calls used to allocate memory for the elements. And two calls to operator new must always produce pointers to disjoint memory without intervening operator delete call to the first pointer value. That's knowledge that is built-in to the compiler's optimizer.

No pointer or reference that could be used to access the std::vector<int> objects is given to modify_v and no such pointers escape the function, so the compiler also knows that modify_v can't modify where the vector objects' internal pointers point, nor can modify_v call operator delete on the pointer.

answered Oct 20, 2024 at 21:06
Sign up to request clarification or add additional context in comments.

10 Comments

Yes, I though about this too, but how about this, godbolt.org/z/bcz3G6YGe , in this case it can see through the actual operator new that makes the copy, and still it doesn't do the optimization.
@alfC That seems to be a missed optimization opportunity somewhere in GCC. Clang does optimize the call away: godbolt.org/z/d5v6xq7xM
@ser17732522, excellent, so the conclusion is that 1) the compiler seems to need to see the copy explicitly, returning by value might not be enough, and 2) GCC still can miss it. As illustrated here: godbolt.org/z/4xhq1975x
@alfC Yes, without a copy (not move) from the return values, there is no reasonable way for a compiler to know that the vectors point to different sequences of elements. And sure, the more complex the example becomes, the less likely it will be that the compiler will be able to find the best optimization result.
@alfC You are correct, I can't get it to work with [[assume]] or [[unreachable]] in either GCC or Clang. I don't really understand why though. This shouldn't be too hard for the compilers to use during optimization. From a formal perspective: Of course you are not allowed to compare iterators into different containers. However something like &*a != &*b should be valid as test and would guarantee that dreaded_function_call can't be called without the program having UB. A simply if testing this also doesn't prevent the call from optimized away. Seems like a missed optimization.
|

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.