-
Notifications
You must be signed in to change notification settings - Fork 813
[sycl_ext_oneapi_clock] implement NVPTX case#21280
[sycl_ext_oneapi_clock] implement NVPTX case #21280tdavidcl wants to merge 4 commits intointel:sycl from
Conversation
tdavidcl
commented
Feb 12, 2026
Also I just found out that there is this file in LLVM libc/src/__support/GPU/utils.h which does define
uint64_t processor_clock() { return __builtin_readcyclecounter(); }
which is used in all test apparently.
We could maybe use that for both Nvidia and AMD since that's what is called within the CI.
zjin-lcf
commented
Feb 12, 2026
Thank you. I found some post ROCm/ROCm#1288 that may be related to your comments.
Co-authored-by: Alexey Bader <alexey.bader@intel.com>
tdavidcl
commented
Feb 12, 2026
Thank you. I found some post ROCm/ROCm#1288 that may be related to your comments.
It seems that the native builtins are better whenever available. I can try to replace the amd & the else branch by __builtin_readcyclecounter then ?
zjin-lcf
commented
Feb 12, 2026
@tdavidcl Please give a try for the amd and the else branch. Thanks.
tdavidcl
commented
Feb 13, 2026
I've added it now it needs a bit of testing. I do not have access to a AMD GPU right now though. The best way of action would be probably a simple test to check that it compiles in all configurations + check that the return is both non zero and monotonically increase in subsequent calls. Where is the best spot to add such a test ?
KornevNikita
commented
Feb 16, 2026
I've added it now it needs a bit of testing. I do not have access to a AMD GPU right now though. The best way of action would be probably a simple test to check that it compiles in all configurations + check that the return is both non zero and monotonically increase in subsequent calls. Where is the best spot to add such a test ?
KornevNikita
commented
Feb 16, 2026
@tdavidcl thanks for working on this! Also, these functions require device to support aspects:
llvm/sycl/include/sycl/ext/oneapi/experimental/clock.hpp
Lines 47 to 49 in 1a77753
That means we also need something like this but for CUDA adapter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note - do not apply this as is, clang-format will fail because strings should be <= 80 symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably like:
if constexpr (Scope == clock_scope::work_group ||
Scope == clock_scope::sub_group) {
tdavidcl
commented
Feb 16, 2026
I've added it now it needs a bit of testing. I do not have access to a AMD GPU right now though. The best way of action would be probably a simple test to check that it compiles in all configurations + check that the return is both non zero and monotonically increase in subsequent calls. Where is the best spot to add such a test ?
Oh perfect it looks like no changes are required in the tests beside enabling the device aspect. Additionally, in the clock test there is this snippet
// UNSUPPORTED: target-native_cpu
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/20142
I have to check but i think that __builtin_readcyclecounter does support the host and maybe clock() could also be enabled for target-native_cpu.
Uh oh!
There was an error while loading. Please reload this page.
Hi after suggestion from @zjin-lcf here is a PR (context: KhronosGroup/SYCL-Docs#958).
It implements the NVPTX variant of clock() using the
%%clock64special register from PTX.https://docs.nvidia.com/cuda/archive/10.1/parallel-thread-execution/index.html?utm_source=chatgpt.com#special-registers-clock64
So it is safe to assume that the register is supported regardless of the PTX version used since intel llvm assume >5.0 if I recall correctly.
reference for usage internally to llvm (on this repo actually, nice :) )
llvm/clang/lib/Headers/__clang_cuda_device_functions.h
Line 1558 in 2705b67
(there is a typo in the PR which is already corrected by a commit, but i don't why it is not updating in the PR ...)