My colleague came across this situation where global __device__ constexpr variables are accessible from both the host and the device.
#include <array>
#include <cstdio>
__device__ constexpr std::array<int, 2> a = {15, 1};
__host__ __device__ void my_print(const char* c_str, const int& i) {
std::printf(c_str);
std::printf(" i=%d &i=%p\n", i, &i);
}
__global__ void display() {
my_print("device", a[0]);
}
int main() {
display<<<1, 1>>>();
cudaDeviceSynchronize();
my_print("host", a[0]);
return 0;
}
Here is the output from a run, which looks reasonable:
device i=15 &i=0x7e97e9200000
host i=15 &i=0x482008
Our situation is related to this question. However, unlike that case, we know the variable's value at compile time. Is the current situation well-defined?
Update: Thanks for the discussion in the comments. We understand that passing a constexpr device variable as const ref to a host should be treated as a bug by the CUDA compiler and it has been reported. We also understand that a global constexpr host variable is enough as a variable that should be accessed by both host and device. It is somewhat unfortunate that constexpr host variables cannot be passed as const ref to the device. Most likely nothing could go wrong.
1 Answer 1
Indeed, as @Johan says in a comment, this is a bug.
The bug has two aspects, actually:
A
__device__-side global variable exists on every device on which the code with that variable is used (or more exactly, in every context into which the variable's module is loaded; see §18.2 of the Programming Guide). In other words - when a device runs your code,ais well-defined; but on the host side - there may be manya's, so - whicha's address could you take, at all?Even supposing there were just one device (and one context), and the choice of
aon the host was somehow made reasonably (e.g. the primary context on the current device) - you should still not be getting different addresses: CUDA, since version 6.0, uses a single unified memory space for all addresses on GPUs and on the host. That means that the address of an entity (like a variable or function) is the same for device-side code and host-side code - regardless of whether it's visible from both host and device.
I tried try your code with a non-constexpr variable, and of a simpler type (an int x); but I changed it to print the value on all of my (two) CUDA-capable GPUs. The resulting output is:
device 0: int x at address 0x7f826ba00000 has value 123
device 1: int x at address 0x7f826b000000 has value 123
host: int x at address 0x560032e54568 has value 0
host: int y at address 0x560032e544b0 has value 456
so, two variables, on two devices; and yet we get a third address; and it looks very much like the address of another global host-side variable that I also defined.
Here's the program I used:
#include <cstdio>
__device__ int x = 123;
int y = 456;
__host__ __device__ void display_x(int const* dev_idx, int const& x) {
if (dev_idx) { std::printf("device %d: ", *dev_idx); }
else { std::printf("host: "); }
std::printf("int x at address %p has value %d\n", &x, x);
}
// Type your code here, or load an example.
__global__ void display_x_kernel(int dev_idx) {
display_x(&dev_idx, x);
}
int main() {
int num_devices = 0;
cudaGetDeviceCount(&num_devices);
for(int dev_idx = 0; dev_idx < num_devices; dev_idx++) {
cudaSetDevice(dev_idx);
display_x_kernel<<<1, 1>>>(dev_idx);
cudaDeviceSynchronize();
}
display_x(nullptr, x);
std::printf("host: int y at address %p has value %d\n", &y, y);
}
I'll also take this opportunity to suggest that you avoid __device__ global variables altogether - except for rare case I can't think of right now off the top of my head. (Non-constant) globals are to be avoided generally, and device globals are kind of the lame brothers of host-side-code globals - considering that your program will have to copy the values of these globals, at run-time, to the GPU, anyway. And if you want constexpr values which "disappear", due to only being used at compile-time - well, I would suggest making them local to a consteval function. Maybe it won't make a difference, but frankly - I don't want to have to prove to myself that NVCC or NVRTC or clang will actually get rid of them.
Thanks goes to @paleonix for setting me straight regarding the most salient aspect of this bug.
5 Comments
Explore related questions
See similar questions with these tags.
__device__. When I reverse the scenario I get a compilation failure, which certainly gives me pause. (If we change theiparameter from pass-by-reference to pass-by-value, we can avoid that compilation failure - again - this is documented.)iinmy_print(which you can't because it'sconstexprandconst), there's no need formy_printto haveconst int& i.int iwill be faster because it's a simple scalar (and work, apparently).__device__variables leaking into the__host__code looks like a bug to me, the host should not see device code and any use of device methods/vars should be a compilation error. Who would you make it undefined behavior? Is the C++ community not trying to get rid of undefined behavior?constexprand other ways to define compile time constants is thatconstexpr"variables" do have an address and that creating a reference is taking that address (which can point to memory that is not accessible from either the host or the device). Ideally the compiler will inline the function and directly use the value of the constexpr variable. But if it can't do that, something will go wrong.