Multi-backend refactor: Alpha release ( AMD ROCm ONLY ) #1339
-
This space is intended to receive feedback from users that are willing to help us by alpha testing the current implementation of the AMD ROCm backend.
Issues to discuss could be
- bugs
- installation
- needed docs
- performance
- ease of use
- ... you name it ...
Thanks everyone for your kind support, and please remember to keep a constructive tone 🤗
-
Trying to install on Ubuntu 22.06 with ROCm 6.1.2, but I can't get past compilation; it fails with the following error:
-- Configuring bitsandbytes (Backend: hip)
CMake Error at /opt/rocm/lib/cmake/hip-lang/hip-lang-config.cmake:126 (message):
hip-lang Error:Permission denied - clangrt builtins lib could not be found.
Call Stack (most recent call first):
/usr/share/cmake-3.22/Modules/CMakeHIPInformation.cmake:146 (find_package)
CMakeLists.txt:175 (enable_language)
-
I have been using InvokeAI (version 5 now 😸). Installing the multi-backend-refactor version of bitsandbytes eliminated an annoying "stutter"/freezing issue that I had at the start of generation, making the whole process much more enjoyable.
Installation was straightforward enough, except that the install instructions seem to have moved from the link provided on the non_cuda_backends docs page (-> current install instruction link).
(On Arch Linux, ROCm v6.2.1)
-
Linux 6.10.10-arch1-1 #1 SMP PREEMPT_DYNAMIC 12 Sep 2024 17:21:02 +0000 x86_64 GNU/Linux
I am on Arch Linux with an RX 7600 XT. Everything else works (torch, onnxruntime, etc.) except bitsandbytes.
I first tried the 5.7 Python packages with the official Arch Linux ROCm 6.0.2 build; bitsandbytes failed.
Then I tried the 6.1 whl packages with the 6.2 AUR opencl-amd-dev package, and again could not make bitsandbytes work.
Finally, I followed the instructions on the AMD ROCm site and checked out multi-backend-refactor.
Why does it report the ROCm version as the CUDA version and the HSA target as the compute capability? That makes the code confusing. I also looked at the version checking in init.py: it struggled to find a tag to derive the version from for a while, then I checked out the latest commit and it somehow managed.
git branch --show-current
rocm_enabled
git rev-parse HEAD
c336a2644c6590e16a1d64cc695a06523bb9824e
git describe --tags --always --dirty --long
0.41.0-355-gc336a26
Could you help me out, please? I want to be able to use the 8-bit and 4-bit quantization functions.
Thank you
pip packages:
bitsandbytes 0.43.2.dev0
pytorch-triton-rocm 3.0.0
torch 2.4.1+rocm6.1
torchaudio 2.4.1+rocm6.1
torchvision 0.19.1+rocm6.1
Pacman packages:
opencl-amd 1:6.2.1-1
opencl-amd-dev 1:6.2.1-1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(11, 0), cuda_version_string='61', cuda_version_tuple=(6, 1))
PyTorch settings found: CUDA_VERSION=61, Highest Compute Capability: (11, 0).
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
The directory listed in your path is found to be non-existent: /home/gediz/.pyenv/pyenv.d
The directory listed in your path is found to be non-existent: /usr/etc/pyenv.d
The directory listed in your path is found to be non-existent: /usr/local/etc/pyenv.d
The directory listed in your path is found to be non-existent: /etc/pyenv.d
The directory listed in your path is found to be non-existent: /usr/lib/pyenv/hooks
The directory listed in your path is found to be non-existent: //debuginfod.archlinux.org
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
Error invalid device function at line 125 in file /home/gediz/bitsandbytes/csrc/ops.hip
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 3700X 8-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 3700X 8-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 0
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 17975528(0x11248e8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 17975528(0x11248e8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 17975528(0x11248e8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 7600 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 29824(0x7480)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2470
BDFID: 1280
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 262
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
echo $LD_LIBRARY_PATH
/opt/rocm/lib:
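On the question above of why the report shows `CUDA_VERSION=61` and `Highest Compute Capability: (11, 0)`: both values are consistent with the ROCm release being folded into a two-digit string (6.1 becomes '61') and the gfx target being read as a major/minor pair (gfx1100 becomes (11, 0)). The sketch below reproduces that mapping in plain Python. The helper names are hypothetical and this is not bitsandbytes code; the gfx-name layout (last two characters are minor and stepping in hex) is an assumption based on the usual AMDGPU naming convention.

```python
def version_string(version_tuple):
    """Fold a (major, minor) release into the compact form seen in the log, e.g. (6, 1) -> '61'."""
    major, minor = version_tuple
    return f"{major}{minor}"


def gfx_to_capability(gfx_name):
    """Split a gfx target name into (major, minor).

    Assumption: the last two characters are minor and stepping (hex digits),
    and everything between 'gfx' and those two characters is the major version.
    """
    if not gfx_name.startswith("gfx"):
        raise ValueError(f"not a gfx target name: {gfx_name!r}")
    digits = gfx_name[3:]
    if len(digits) < 3:
        raise ValueError(f"gfx target name too short: {gfx_name!r}")
    major = int(digits[:-2])
    minor = int(digits[-2], 16)
    return (major, minor)


print(version_string((6, 1)))        # ROCm 6.1 as a compact string
print(gfx_to_capability("gfx1100"))  # the target reported by rocminfo above
```

Read this way, the "CUDA versions lower than 11" warning would presumably come from comparing the ROCm major version (6) against a CUDA threshold, which would make it a labeling problem rather than a real version mismatch.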
-
I recently came across this, not sure if it is helpful: https://github.com/ROCm/bitsandbytes
-
The command from the documentation:
git clone --depth 1 -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
produces this error:
File "<string>", line 58, in <module>
File "<string>", line 40, in get_version_and_write_to_file
File "<string>", line 30, in get_latest_semver_tag
ValueError: No valid semantic version tags found
Using this command instead (without --depth 1):
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
I can run the build.
It's probably a problem with missing tags, since this branch uses a different name from the standard branch/tag naming.
I don't know much about semver; I just solved this by cloning the entire repo, after which it (semver) picked a tag to generate the build.
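A likely explanation, as far as I understand git's behavior: a `--depth 1 -b <branch>` clone fetches only the branch tip, and tags that do not point at a fetched commit are not downloaded, so the version script finds no tags at all. The sketch below is a hypothetical reimplementation of that kind of tag selection, not the project's actual `get_latest_semver_tag`; it only shows why an empty or non-semver tag list ends in the `ValueError` from the traceback.

```python
import re

# Accepts tags like "0.41.0" or "v0.41.0"; anything else is ignored.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_semver_tag(tags):
    """Return the highest semantic-version tag, comparing numerically."""
    versions = []
    for tag in tags:
        match = SEMVER.match(tag)
        if match:
            key = tuple(int(part) for part in match.groups())
            versions.append((key, tag))
    if not versions:
        # A shallow clone typically lands here because no tags were fetched.
        raise ValueError("No valid semantic version tags found")
    return max(versions)[1]


print(latest_semver_tag(["0.40.2", "0.41.0", "continuous-release_main"]))
```

In an existing shallow clone, `git fetch --unshallow --tags` should have the same effect as re-cloning without `--depth 1`.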
These are the commands I used for compilation:
python3.11 -m venv venv
source venv/bin/activate
# torch 2.4.1
pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.1
pip install -r requirements-dev.txt
cmake -DBNB_ROCM_ARCH="gfx1100" -DCOMPUTE_BACKEND=hip -S .
make -j16
pip install -e .
python -m bitsandbytes:
Could not find the bitsandbytes ROCm binary at PosixPath('/home/user/repos/external/bitsandbytes/bitsandbytes/libbitsandbytes_rocm61.so')
Could not load bitsandbytes native library: /home/user/repos/external/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "/home/user/repos/external/bitsandbytes/bitsandbytes/cextension.py", line 125, in <module>
lib = get_native_library()
^^^^^^^^^^^^^^^^^^^^
File "/home/user/repos/external/bitsandbytes/bitsandbytes/cextension.py", line 104, in get_native_library
dll = ct.cdll.LoadLibrary(str(binary_path))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/ctypes/__init__.py", line 454, in LoadLibrary
return self._dlltype(name)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/user/repos/external/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
ROCm Setup failed despite ROCm being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
g++ (GCC) 14.2.1 20240910
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='61', rocm_version_tuple=(6, 1)
PyTorch settings found: ROCM_VERSION=61
Library not found: /home/user/repos/external/bitsandbytes/bitsandbytes/libbitsandbytes_rocm61.so.
Maybe you need to compile it from source? If you compiled from source, check that ROCM_VERSION
in PyTorch Settings matches your ROCm install. If not, reinstall PyTorch for your ROCm version
and rebuild bitsandbytes.
The directory listed in your path is found to be non-existent: //debuginfod.archlinux.org
The directory listed in your path is found to be non-existent: /etc/gtk-2.0/gtkrc
The directory listed in your path is found to be non-existent: /etc/gtk/gtkrc
The directory listed in your path is found to be non-existent: /home/user/.gtkrc
The directory listed in your path is found to be non-existent: /Sessions/2
The directory listed in your path is found to be non-existent: /Windows/1
The directory listed in your path is found to be non-existent: local/x470-AORUS
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/1943,unix/x470-AORUS
The directory listed in your path is found to be non-existent: /org/freedesktop/DisplayManager/Seat0
The directory listed in your path is found to be non-existent: /org/freedesktop/DisplayManager/Session1
WARNING! ROCm runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.
For source installations, compile the binaries with `cmake -DCOMPUTE_BACKEND=hip -S .`.
See the documentation for more details if needed.
Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
File "/home/user/repos/external/bitsandbytes/bitsandbytes/diagnostics/main.py", line 73, in main
sanity_check()
File "/home/user/repos/external/bitsandbytes/bitsandbytes/diagnostics/main.py", line 42, in sanity_check
adam.step()
File "/run/media/user/SSD_SATA/projects/projectTest/venv/lib/python3.11/site-packages/torch/optim/optimizer.py", line 484, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/run/media/user/SSD_SATA/projects/projectTest/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/repos/external/bitsandbytes/bitsandbytes/optim/optimizer.py", line 287, in step
self.update_step(group, p, gindex, pindex)
File "/run/media/user/SSD_SATA/projects/projectTest/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/repos/external/bitsandbytes/bitsandbytes/optim/optimizer.py", line 500, in update_step
F.optimizer_update_32bit(
File "/home/user/repos/external/bitsandbytes/bitsandbytes/functional.py", line 1189, in optimizer_update_32bit
return backends[g.device.type].optimizer_update_32bit(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/repos/external/bitsandbytes/bitsandbytes/backends/cuda.py", line 870, in optimizer_update_32bit
optim_func = str2optimizer32bit[optimizer_name][0]
^^^^^^^^^^^^^^^^^^
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
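The log above looks for `libbitsandbytes_rocm61.so` inside the package directory and then falls back to `libbitsandbytes_cpu.so`. A quick way to see what the build actually produced is to compare the expected file name against the shared objects that are present. The sketch below assumes the naming pattern visible in the log (`libbitsandbytes_rocm<major><minor>.so`); the function names are hypothetical and not part of bitsandbytes.

```python
from pathlib import Path


def expected_rocm_library(package_dir, rocm_version):
    """Build the library path the log reports, e.g. (6, 1) -> libbitsandbytes_rocm61.so."""
    major, minor = rocm_version
    return Path(package_dir) / f"libbitsandbytes_rocm{major}{minor}.so"


def report_native_libraries(package_dir, rocm_version):
    """Summarize which bitsandbytes shared objects exist next to the Python sources."""
    expected = expected_rocm_library(package_dir, rocm_version)
    present = sorted(p.name for p in Path(package_dir).glob("libbitsandbytes_*.so"))
    return {
        "expected": expected.name,
        "found": expected.is_file(),
        "present": present,
    }


if __name__ == "__main__":
    # Hypothetical invocation from the repository root, for a ROCm 6.1 PyTorch build.
    print(report_native_libraries("bitsandbytes", (6, 1)))
```

If `present` lists a library built for a different ROCm version, that would point to the mismatch the log message describes; if it is empty, the `make` step probably did not place its output in the package directory.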
rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 5800X3D 8-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 5800X3D 8-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4550
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65767152(0x3eb86f0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 65767152(0x3eb86f0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65767152(0x3eb86f0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-9255a335dd4a5296
Marketing Name: AMD Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2371
BDFID: 12288
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 262
SDMA engine uCode:: 24
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
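The build command earlier in this comment passes `-DBNB_ROCM_ARCH="gfx1100"`, and that value has to match what the GPU agent reports in `rocminfo`. An "invalid device function" error like the one reported further up this thread commonly indicates that a library contains no code for the running GPU's target, so it is worth checking. The sketch below pulls the gfx target names out of saved `rocminfo` output; the function name is hypothetical and the regular expression assumes the `Name: gfx...` layout shown above.

```python
import re

# Matches agent lines such as "  Name: gfx1100" but not the ISA line
# "Name: amdgcn-amd-amdhsa--gfx1100" or CPU agent names.
GFX_NAME = re.compile(r"^\s*Name:\s*(gfx[0-9a-f]+)\s*$", re.MULTILINE)


def gfx_targets(rocminfo_text):
    """Return the distinct gfx targets in rocminfo output, in order of appearance."""
    seen = []
    for name in GFX_NAME.findall(rocminfo_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = """\
Name: AMD Ryzen 7 5800X3D 8-Core Processor
Name: gfx1100
Marketing Name: AMD Radeon RX 7900 XTX
Name: amdgcn-amd-amdhsa--gfx1100
"""
print(gfx_targets(sample))
```

To use it on a real system, save the output with `rocminfo > rocminfo.txt` and pass the file contents to `gfx_targets`.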
-
The only problem I've found so far is with ROCM_PATH, which doesn't seem to respect update-alternatives when more than one ROCm installation is present. I believe it should, since installing multiple ROCm versions side by side is supported by AMD.
In this case I'm using Ubuntu 22.04 to compile the source code, matching the configuration it was tested on (although I mentioned above that I use Arch Linux xD).
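To illustrate the behavior being asked for: a minimal sketch of resolving the ROCm root so that an explicit ROCM_PATH wins, and otherwise the default symlink is followed to whichever versioned install it currently selects. It assumes `/opt/rocm` is the symlink that update-alternatives manages on a multi-version install; the function name is hypothetical and this is not how the bitsandbytes build currently works.

```python
import os


def resolve_rocm_root(environ=None, default_link="/opt/rocm"):
    """Pick the ROCm root: ROCM_PATH if set, otherwise the target of the default symlink."""
    environ = os.environ if environ is None else environ
    candidate = environ.get("ROCM_PATH") or default_link
    # realpath follows symlinks, so /opt/rocm resolves to the versioned directory
    # it currently points at, e.g. /opt/rocm-6.1.0 (example path only).
    return os.path.realpath(candidate)


if __name__ == "__main__":
    print(resolve_rocm_root())
```

The resolved path could then be handed to the build explicitly, which avoids depending on whichever default the build scripts assume.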
-
You can try the opencl-amd package on Arch Linux.
On Arch Linux with my 7900 XTX, the combination that worked best for bitsandbytes was this package (remember to also install opencl-amd-dev) at version 6.1.3, torch for ROCm 6.1 (pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1), and the multi-backend-refactor branch of bitsandbytes, built following the compilation instructions.
-
My environment is Ubuntu 24.04 + ROCm 6.3.3 + PyTorch 2.6 + Python 3.10, with bitsandbytes compiled from source. pip list shows that transformers 4.50.3 and bitsandbytes 1.0.0 are installed. However, running python -m bitsandbytes still displays an error:
Could not load bitsandbytes native library: /home/liuq/bitsandbytes/bitsandbytes/libbitsandbytes_rocm63.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
Traceback (most recent call last):
File "/home/liuq/bitsandbytes/bitsandbytes/cextension.py", line 107, in <module>
lib = get_native_library()
File "/home/liuq/bitsandbytes/bitsandbytes/cextension.py", line 86, in get_native_library
dll = ct.cdll.LoadLibrary(str(binary_path))
File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/liuq/bitsandbytes/bitsandbytes/libbitsandbytes_rocm63.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
ROCm Setup failed despite ROCm being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='63', rocm_version_tuple=(6, 3)
PyTorch settings found: ROCM_VERSION=63
The directory listed in your path is found to be non-existent: local/liuq-MS-7C94
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/2419,unix/liuq-MS-7C94
The directory listed in your path is found to be non-existent: /etc/xdg/xdg-ubuntu
The directory listed in your path is found to be non-existent: /org/gnome/Terminal/screen/1159d79f_3bcc_4398_bfe2_8f1a1a0b8a44
The directory listed in your path is found to be non-existent: //debuginfod.ubuntu.com
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.
For source installations, compile the binaries with `cmake -DCOMPUTE_BACKEND=hip -S .`.
See the documentation for more details if needed.
Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
File "/home/liuq/bitsandbytes/bitsandbytes/diagnostics/main.py", line 73, in main
sanity_check()
File "/home/liuq/bitsandbytes/bitsandbytes/diagnostics/main.py", line 42, in sanity_check
adam.step()
File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/site-packages/torch/optim/optimizer.py", line 493, in wrapper
out = func(*args, **kwargs)
File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/liuq/bitsandbytes/bitsandbytes/optim/optimizer.py", line 291, in step
self.update_step(group, p, gindex, pindex)
File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/liuq/bitsandbytes/bitsandbytes/optim/optimizer.py", line 521, in update_step
F.optimizer_update_32bit(
File "/home/liuq/bitsandbytes/bitsandbytes/functional.py", line 1257, in optimizer_update_32bit
return backends[g.device.type].optimizer_update_32bit(
File "/home/liuq/bitsandbytes/bitsandbytes/backends/cuda.py", line 777, in optimizer_update_32bit
optim_func = str2optimizer32bit[optimizer_name][0]
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
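The undefined symbol in this log is a C++ mangled name, and reading it narrows down which piece of the library is missing. The sketch below is a deliberately partial decoder that only reads the length-prefixed identifiers at the start of an Itanium-style mangled name; the helper names are hypothetical, and a real demangler such as `c++filt` is the proper tool when it is available.

```python
def read_source_name(mangled, pos):
    """Read one length-prefixed identifier: decimal length, then that many characters."""
    start = pos
    while pos < len(mangled) and mangled[pos].isdigit():
        pos += 1
    if pos == start:
        raise ValueError(f"no length prefix at offset {start}")
    length = int(mangled[start:pos])
    return mangled[pos:pos + length], pos + length


def kernel_and_first_template_arg(mangled):
    """Return (function name, first template argument or None) for a simple mangled name."""
    if not mangled.startswith("_Z"):
        raise ValueError("not an Itanium-mangled name")
    name, pos = read_source_name(mangled, 2)
    first_arg = None
    # 'I' opens a template argument list; only a named type argument is decoded here.
    if pos + 1 < len(mangled) and mangled[pos] == "I" and mangled[pos + 1].isdigit():
        first_arg, pos = read_source_name(mangled, pos + 1)
    return name, first_arg


symbol = "_Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi"
print(kernel_and_first_template_arg(symbol))
```

For the symbol above this yields the host-side stub of `kOptimizer32bit1State` instantiated for `hip_bfloat16`, which suggests the library was linked without the bfloat16 instantiation of that 32-bit optimizer kernel rather than a general installation problem.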
-
I have the same issue as @qdliuq. I am using ROCm 6.4 and my GPU is a 9070 XT, so it might be related to it being a newer card?
Could not load bitsandbytes native library: /home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_rocm64.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
Traceback (most recent call last):
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/cextension.py", line 115, in <module>
lib = get_native_library()
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/cextension.py", line 86, in get_native_library
dll = ct.cdll.LoadLibrary(str(binary_path))
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/ctypes/__init__.py", line 471, in LoadLibrary
return self._dlltype(name)
~~~~~~~~~~~~~^^^^^^
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/ctypes/__init__.py", line 390, in __init__
self._handle = _dlopen(self._name, mode)
~~~~~~~^^^^^^^^^^^^^^^^^^
OSError: /home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_rocm64.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
ROCm Setup failed despite ROCm being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='64', rocm_version_tuple=(6, 4)
PyTorch settings found: ROCM_VERSION=64
The directory listed in your path is found to be non-existent: /opt/rocm-6.4.0/hip/lib
The directory listed in your path is found to be non-existent: ~/.ssh
The directory listed in your path is found to be non-existent: local/james-Ubuntu
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/4167,unix/james-Ubuntu
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.
For source installations, compile the binaries with 'cmake -DCOMPUTE_BACKEND=hip -S .'.
See the documentation for more details if needed.
Trying a simple check anyway, but this will likely fail...
adam
Traceback (most recent call last):
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/diagnostics/main.py", line 73, in main
sanity_check()
~~~~~~~~~~~~^^
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/diagnostics/main.py", line 42, in sanity_check
adam.step()
~~~~~~~~~^^
File "/home/james/AI/pytorch/torch/optim/optimizer.py", line 504, in wrapper
out = func(*args, **kwargs)
File "/home/james/AI/pytorch/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/optim/optimizer.py", line 292, in step
self.update_step(group, p, gindex, pindex)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/james/AI/pytorch/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/optim/optimizer.py", line 522, in update_step
F.optimizer_update_32bit(
~~~~~~~~~~~~~~~~~~~~~~~~^
self.optimizer_name,
^^^^^^^^^^^^^^^^^^^^
...<15 lines>...
skip_zeros=config["skip_zeros"],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/functional.py", line 1266, in optimizer_update_32bit
return backends[g.device.type].optimizer_update_32bit(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
optimizer_name=optimizer_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<15 lines>...
skip_zeros=skip_zeros,
^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/backends/cuda.py", line 781, in optimizer_update_32bit
optim_func = str2optimizer32bit[optimizer_name][0]
^^^^^^^^^^^^^^^^^^
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
-
What happened to continuous-release_multi-backend-refactor binaries? They're no longer in releases.
If they're going to be yeeted please also update the documentation section "Pre-built Wheel Installation (recommended)": https://huggingface.co/docs/bitsandbytes/main/installation#multi-backend-pip
Hope they'd remain though, people kinda depend on them already.
-
It's back online! https://github.com/bitsandbytes-foundation/bitsandbytes/releases/tag/continuous-release_multi-backend-refactor
-
Sorry, the branch is now in a somewhat dysfunctional state, and I wasn't aware folks were still using the wheel. It now publishes only the latest wheel; I hope you are not relying on an older one?
Anyway, we're actively working on integrating the changes into main, which is why we deprecated support for this branch: to focus our efforts on getting things released officially.