Experimental dpctl support for native_cpu device
#2051
The oneAPI DPC++ compiler now has experimental support for a "native CPU" device, which treats the host CPU as a "first-class citizen."
This discussion is meant both to explore the use of native_cpu devices and to provide convenient instructions for getting started with dpctl and native_cpu targets.
OS: Ubuntu 24.04 Noble
CPU: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Initial step
I first created a Conda environment containing the requirements for building dpctl (see the documentation).
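Roughly, something like the following; the environment name and package list here are my assumptions based on dpctl's build documentation, so adjust them for your version of dpctl:
# hypothetical environment; package list assumed from dpctl's build docs
conda create -n dpctl-dev python=3.11 cython numpy scikit-build cmake ninja
conda activate dpctl-dev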
Setting up the compiler
I cloned the GitHub repo for the oneAPI DPC++ compiler. With my local copy, I read through the documentation and (after one failed experiment) found that I could successfully build the compiler and go on to build dpctl, using the following steps.
From the repo root, I ran
python buildbot/configure.py --native-cpu --llvm-external-projects="lld"
having found that without --llvm-external-projects="lld", dpctl would fail to build, citing lld as the culprit.
After configuring, I ran
python buildbot/compile.py
This took quite a while, but it succeeded, placing the built compiler in /path/to/repo/llvm/build/install, with clang and clang++ in bin. I also verified that the UR adapter for native_cpu was present in lib.
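For reference, a quick way to sanity-check the install tree; the adapter library's exact file name is my assumption about Unified Runtime's naming, so the grep is deliberately loose:
# confirm the freshly built compiler is callable
/path/to/repo/llvm/build/install/bin/clang++ --version
# look for the native_cpu Unified Runtime adapter (file name assumed)
ls /path/to/repo/llvm/build/install/lib | grep -i native_cpu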
Building dpctl
With the compiler built, I then set up the environment much as one would for building dpctl with the nightly compiler, fetching the other dependencies:
tar xf sycl_linux.tar.gz -C dpcpp
mkdir oclcpuexp
wget https://github.com/intel/llvm/releases/download/2024-WW43/oclcpuexp-20241810.0.08_rel.tar.gz
tar xf oclcpuexp-20241810.0.08_rel.tar.gz -C ./oclcpuexp
wget https://github.com/oneapi-src/oneTBB/releases/download/v2021.12.0/oneapi-tbb-2021120-lin.tgz
tar xf oneapi-tbb-2021120-lin.tgz
cp oclcpuexp/x64/libOpenCL.so* lib/
I then set up LD_LIBRARY_PATH and PATH, similar to the nightly-build instructions:
cat << 'EOF' > set_allvars.sh
#!/usr/bin/bash
export COMPILER_ROOT_DIR=/path/to/compiler/llvm/build/install
export PATH=${COMPILER_ROOT_DIR}/bin:${PATH}
export LD_LIBRARY_PATH=${COMPILER_ROOT_DIR}/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=${COMPILER_ROOT_DIR}/oclcpuexp/x64:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=${COMPILER_ROOT_DIR}/oneapi-tbb-2021120/lib/intel64/gcc4.8:${LD_LIBRARY_PATH}
export OCL_ICD_VENDORS=
export OCL_ICD_FILENAMES=libintelocl.so
EOF
chmod +x set_allvars.sh
cat set_allvars.sh
I then sourced the script and ran sycl-ls to verify that the native_cpu device showed up:
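source ./set_allvars.sh
sycl-ls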
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Graphics [0x9a49] 12.0.0 [1.3.29735+27]
[native_cpu:cpu][native_cpu:0] SYCL_NATIVE_CPU, SYCL Native CPU 0.1 [0.0.0]
[opencl:gpu][opencl:0] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x9a49] OpenCL 3.0 NEO [24.39.31294]
And it worked!
Now I ran
python scripts/build_locally.py --verbose --compiler-root ${COMPILER_ROOT_DIR} --c-compiler ${COMPILER_ROOT_DIR}/bin/clang --cxx-compiler ${COMPILER_ROOT_DIR}/bin/clang++ --cmake-opts="-DDPCTL_SYCL_TARGETS=native_cpu"
This worked, up to a point: the _tensor_linalg sub-module failed to build, followed by the _tensor_sorting sub-module. I commented both of these out and re-ran the build.
This eventually succeeded, though warnings were emitted for a significant number of math functions, such as some trigonometric functions, log1pf, etc.
After this, it was possible to import dpctl, run dpctl.lsplatform(2), and see SYCL_NATIVE_CPU listed as a platform:
In [1]: import dpctl

In [2]: dpctl.lsplatform(2)
Platform 0 ::
    Name            Intel(R) oneAPI Unified Runtime over Level-Zero
    Version         1.3
    Vendor          Intel(R) Corporation
    Backend         ext_oneapi_level_zero
    Num Devices     1
    # 0
        Name            Intel(R) Graphics [0x9a49]
        Version         1.3.29735+27
        Filter string   level_zero:gpu:0
Platform 1 ::
    Name            SYCL_NATIVE_CPU
    Version         0.1
    Vendor          tbd
    Backend         ext_oneapi_native_cpu
    Num Devices     1
    # 0
        Name            SYCL Native CPU
        Version         0.0.0
        Filter string   unknown:cpu:0
Platform 2 ::
    Name            Intel(R) OpenCL Graphics
    Version         OpenCL 3.0
    Vendor          Intel(R) Corporation
    Backend         opencl
    Num Devices     1
    # 0
        Name            Intel(R) Graphics [0x9a49]
        Version         24.39.31294
        Filter string   opencl:gpu:0
Implementing native_cpu in dpctl
dpctl.get_devices() would ignore the native_cpu device because it hadn't been hooked up in dpctl's machinery, so I made adjustments to enable it:
In [1]: import dpctl.tensor as dpt, dpctl, numpy as np

In [2]: dpctl.get_devices()
Out[2]:
[<dpctl.SyclDevice [backend_type.level_zero, device_type.gpu, Intel(R) Graphics [0x9a49]] at 0x7fe6835e9b70>,
 <dpctl.SyclDevice [backend_type.native_cpu, device_type.cpu, SYCL Native CPU] at 0x7fe6835ea530>,
 <dpctl.SyclDevice [backend_type.opencl, device_type.gpu, Intel(R) Graphics [0x9a49]] at 0x7fe6835ea3b0>]

In [3]: x = dpt.arange(10**6, device="cpu")

In [4]: x
Out[4]: usm_ndarray([     0,      1,      2, ..., 999997, 999998, 999999])

In [5]: x.sycl_device
Out[5]: <dpctl.SyclDevice [backend_type.native_cpu, device_type.cpu, SYCL Native CPU] at 0x7fe682f43730>
And it's a success! Some kernels can even be run with it:
In [1]: import dpctl.tensor as dpt, dpctl, numpy as np
In [2]: x = dpt.arange(10**6, device="cpu")
In [3]: y = dpt.ones(10**6, dtype=x.dtype, device="cpu")
In [4]: r = x + y
In [5]: r
Out[5]: usm_ndarray([ 1, 2, 3, ..., 999998, 999999, 1000000])
In [6]: r.sycl_device
Out[6]: <dpctl.SyclDevice [backend_type.native_cpu, device_type.cpu, SYCL Native CPU] at 0x7f2c1fd38370>
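Given the math-function warnings emitted during the build, it also seemed worth poking one of the flagged functions directly. A minimal smoke test, assuming dpt.log1p is among the functions that lower to the warned-about log1pf for float32 inputs; whether it actually runs on native_cpu is exactly what the warnings call into question:
# probe a flagged math function on the native_cpu device (log1p assumed affected)
python -c "import dpctl.tensor as dpt; x = dpt.ones(5, dtype='f4', device='cpu'); print(dpt.log1p(x))"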
Public branch
To experiment with this, the branch experimental/support-native-cpu-device has been made available; it comments out the failing sub-modules and hooks native_cpu into dpctl's machinery.
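To try it locally, something along these lines should work; I'm assuming here that the branch is reachable from the main IntelPython/dpctl repository, so adjust the remote if it lives on a fork:
git clone https://github.com/IntelPython/dpctl.git
cd dpctl
git checkout experimental/support-native-cpu-device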