Experimental dpctl support for native_cpu device
#2051
The oneAPI DPC++ compiler now has experimental support for a "native CPU" device, which treats the host CPU as a "first-class citizen."
This discussion is meant both to explore the use of native_cpu devices and to provide convenient instructions for getting started with dpctl and native_cpu targets.
OS: Ubuntu 24.04 Noble
CPU: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Initial step
I first created a Conda environment containing the requirements for building dpctl (see the documentation).
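Roughly, something like the following; the environment name and package list here are my assumptions based on dpctl's build documentation, so adjust them for your version of dpctl:
# hypothetical environment; package list assumed from dpctl's build docs
conda create -n dpctl-dev python=3.11 cython numpy scikit-build cmake ninja
conda activate dpctl-dev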
Setting up the compiler
I cloned the GitHub repo for the oneAPI DPC++ compiler. With my local copy, I read through the documentation and (after one failed experiment) found that I could successfully build the compiler and go on to build dpctl, using the following steps.
From the repo root, I ran
python buildbot/configure.py --native-cpu --llvm-external-projects="lld"
having found that without --llvm-external-projects="lld", dpctl would fail to build, citing lld as the culprit.
After configuring, I ran
python buildbot/compile.py
This took quite a while, but it succeeded, placing the built compiler in /path/to/repo/llvm/build/install, with clang and clang++ in bin. I also verified that the UR adapter for native_cpu was present in lib.
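For reference, a quick way to sanity-check the install tree; the adapter library's exact file name is my assumption about Unified Runtime's naming, so the grep is deliberately loose:
# confirm the freshly built compiler is callable
/path/to/repo/llvm/build/install/bin/clang++ --version
# look for the native_cpu Unified Runtime adapter (file name assumed)
ls /path/to/repo/llvm/build/install/lib | grep -i native_cpu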
Building dpctl
With the compiler built, I then set up the environment much as one would for building dpctl with the nightly compiler, fetching the other dependencies:
tar xf sycl_linux.tar.gz -C dpcpp
mkdir oclcpuexp
wget https://github.com/intel/llvm/releases/download/2024-WW43/oclcpuexp-20241810.0.08_rel.tar.gz
tar xf oclcpuexp-20241810.0.08_rel.tar.gz -C ./oclcpuexp
wget https://github.com/oneapi-src/oneTBB/releases/download/v2021.12.0/oneapi-tbb-2021120-lin.tgz
tar xf oneapi-tbb-2021120-lin.tgz
cp oclcpuexp/x64/libOpenCL.so* lib/
I then set up LD_LIBRARY_PATH and PATH, similar to the nightly-build instructions:
cat << 'EOF' > set_allvars.sh
#!/usr/bin/bash
export COMPILER_ROOT_DIR=/path/to/compiler/llvm/build/install
export PATH=${COMPILER_ROOT_DIR}/bin:${PATH}
export LD_LIBRARY_PATH=${COMPILER_ROOT_DIR}/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=${COMPILER_ROOT_DIR}/oclcpuexp/x64:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=${COMPILER_ROOT_DIR}/oneapi-tbb-2021120/lib/intel64/gcc4.8:${LD_LIBRARY_PATH}
export OCL_ICD_VENDORS=
export OCL_ICD_FILENAMES=libintelocl.so
EOF
chmod +x set_allvars.sh
cat set_allvars.sh
I then sourced the script and ran sycl-ls to verify that the native_cpu device showed up:
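source ./set_allvars.sh
sycl-ls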
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Graphics [0x9a49] 12.0.0 [1.3.29735+27]
[native_cpu:cpu][native_cpu:0] SYCL_NATIVE_CPU, SYCL Native CPU 0.1 [0.0.0]
[opencl:gpu][opencl:0] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x9a49] OpenCL 3.0 NEO [24.39.31294]
And it worked!
Now I ran
python scripts/build_locally.py --verbose --compiler-root ${COMPILER_ROOT_DIR} --c-compiler ${COMPILER_ROOT_DIR}/bin/clang --cxx-compiler ${COMPILER_ROOT_DIR}/bin/clang++ --cmake-opts="-DDPCTL_SYCL_TARGETS=native_cpu"
This worked, up to a point: the _tensor_linalg sub-module failed to build, followed by the _tensor_sorting sub-module. I commented both of these out and re-ran the build.
This eventually succeeded, though warnings were emitted for a significant number of math functions, such as some trigonometric functions, log1pf, etc.
After this, it was possible to import dpctl, run dpctl.lsplatform(2), and see SYCL_NATIVE_CPU listed as a platform:
In [1]: import dpctl

In [2]: dpctl.lsplatform(2)
Platform 0 ::
    Name            Intel(R) oneAPI Unified Runtime over Level-Zero
    Version         1.3
    Vendor          Intel(R) Corporation
    Backend         ext_oneapi_level_zero
    Num Devices     1
    # 0
        Name            Intel(R) Graphics [0x9a49]
        Version         1.3.29735+27
        Filter string   level_zero:gpu:0
Platform 1 ::
    Name            SYCL_NATIVE_CPU
    Version         0.1
    Vendor          tbd
    Backend         ext_oneapi_native_cpu
    Num Devices     1
    # 0
        Name            SYCL Native CPU
        Version         0.0.0
        Filter string   unknown:cpu:0
Platform 2 ::
    Name            Intel(R) OpenCL Graphics
    Version         OpenCL 3.0
    Vendor          Intel(R) Corporation
    Backend         opencl
    Num Devices     1
    # 0
        Name            Intel(R) Graphics [0x9a49]
        Version         24.39.31294
        Filter string   opencl:gpu:0
Implementing native_cpu in dpctl
dpctl.get_devices() would ignore the native_cpu device because it hadn't been hooked up in dpctl's machinery, so I made adjustments to enable it:
In [1]: import dpctl.tensor as dpt, dpctl, numpy as np

In [2]: dpctl.get_devices()
Out[2]:
[<dpctl.SyclDevice [backend_type.level_zero, device_type.gpu, Intel(R) Graphics [0x9a49]] at 0x7fe6835e9b70>,
 <dpctl.SyclDevice [backend_type.native_cpu, device_type.cpu, SYCL Native CPU] at 0x7fe6835ea530>,
 <dpctl.SyclDevice [backend_type.opencl, device_type.gpu, Intel(R) Graphics [0x9a49]] at 0x7fe6835ea3b0>]

In [3]: x = dpt.arange(10**6, device="cpu")

In [4]: x
Out[4]: usm_ndarray([     0,      1,      2, ..., 999997, 999998, 999999])

In [5]: x.sycl_device
Out[5]: <dpctl.SyclDevice [backend_type.native_cpu, device_type.cpu, SYCL Native CPU] at 0x7fe682f43730>
And it's a success! Some kernels can even be run with it:
In [1]: import dpctl.tensor as dpt, dpctl, numpy as np
In [2]: x = dpt.arange(10**6, device="cpu")
In [3]: y = dpt.ones(10**6, dtype=x.dtype, device="cpu")
In [4]: r = x + y
In [5]: r
Out[5]: usm_ndarray([ 1, 2, 3, ..., 999998, 999999, 1000000])
In [6]: r.sycl_device
Out[6]: <dpctl.SyclDevice [backend_type.native_cpu, device_type.cpu, SYCL Native CPU] at 0x7f2c1fd38370>
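Given the math-function warnings emitted during the build, it also seemed worth poking one of the flagged functions directly. A minimal smoke test, assuming dpt.log1p is among the functions that lower to the warned-about log1pf for float32 inputs; whether it actually runs on native_cpu is exactly what the warnings call into question:
# probe a flagged math function on the native_cpu device (log1p assumed affected)
python -c "import dpctl.tensor as dpt; x = dpt.ones(5, dtype='f4', device='cpu'); print(dpt.log1p(x))"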
Public branch
To experiment with this, the branch experimental/support-native-cpu-device has been made available; it comments out the failing sub-modules and hooks native_cpu into dpctl's machinery.
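To try it locally, something along these lines should work; I'm assuming here that the branch is reachable from the main IntelPython/dpctl repository, so adjust the remote if it lives on a fork:
git clone https://github.com/IntelPython/dpctl.git
cd dpctl
git checkout experimental/support-native-cpu-device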