Multi-backend refactor: Alpha release ( AMD ROCm ONLY ) #1339

Titus-von-Koeller started this conversation in Refactorings

This space is intended to receive feedback from users that are willing to help us by alpha testing the current implementation of the AMD ROCm backend.

Issues to discuss could be

  • bugs
  • installation
  • needed docs
  • performance
  • ease of use
  • ... you name it ...

Thanks everyone for your kind support, and please remember to keep a constructive tone 🤗

Replies: 11 comments 3 replies

Trying to install on Ubuntu 22.06 with ROCm 6.1.2, and I can't get past compilation; it fails with the following error:

-- Configuring bitsandbytes (Backend: hip)

CMake Error at /opt/rocm/lib/cmake/hip-lang/hip-lang-config.cmake:126 (message):
hip-lang Error:Permission denied - clangrt builtins lib could not be found.
Call Stack (most recent call first):
/usr/share/cmake-3.22/Modules/CMakeHIPInformation.cmake:146 (find_package)
CMakeLists.txt:175 (enable_language)
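
The "Permission denied" prefix in this message likely comes from CMake failing to run the HIP compiler when it asks for the compiler-rt builtins path, so one thing worth checking is whether the compiler binary exists and is executable by the current user. A minimal sketch of that check follows. The path /opt/rocm/llvm/bin/clang++ is an assumption about a default ROCm layout, and describe_compiler is a hypothetical helper, not part of ROCm or bitsandbytes.

```python
import os


def describe_compiler(path):
    """Return a short diagnosis of whether `path` can be run by this user."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "is a directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Assumed default location of the ROCm clang++; adjust for your install.
    candidate = "/opt/rocm/llvm/bin/clang++"
    print(candidate, "->", describe_compiler(candidate))
```

Anything other than "ok" would point at the ROCm installation or its file permissions rather than at the bitsandbytes sources.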


I have been using InvokeAI (version 5 now 😸). Installing the multi-backend-refactor version of bitsandbytes eliminated an annoying stutter/freeze that I had at the start of generation, making the whole process much more enjoyable.
Installation was straightforward enough. (Except the install instructions seem to have moved from the link provided on the non_cuda_backends docs page -> current install instruction link)

(on archlinux, rocm v6.2.1)


Linux 6.10.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 12 Sep 2024 17:21:02 +0000 x86_64 GNU/Linux
I am on Arch Linux with a 7600 XT. Everything works (torch, onnxruntime, etc.) except bitsandbytes.

I tried the 5.7 Python packages with the official Arch Linux ROCm 6.0.2 build, and bitsandbytes failed.
Then I tried the 6.1 whl packages with the 6.2 AUR opencl-amd-dev package, and again could not make bitsandbytes work.

I followed the instructions on the AMD ROCm site and checked out multi-backend-refactor.

Why does it report the ROCm version as the CUDA version and HSA as the compute capability? Isn't that confusing? I checked the version detection in init.py; it struggled to find a tag and version for a while, and after I checked out the latest commit it somehow managed.

git branch --show-current
rocm_enabled
git rev-parse HEAD
c336a2644c6590e16a1d64cc695a06523bb9824e
git describe --tags --always --dirty --long
0.41.0-355-gc336a26
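
The `git describe` output above packs three pieces of information into one string: the nearest reachable tag, the number of commits since that tag, and the abbreviated commit hash. A small sketch of how that string can be split follows; parse_git_describe is a hypothetical helper written for illustration and is not taken from the bitsandbytes build scripts.

```python
import re

# <tag>-<commits since tag>-g<abbreviated sha>, optionally followed by -dirty
DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def parse_git_describe(text):
    """Split `git describe --tags --long` output into its components."""
    match = DESCRIBE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised git describe output: {text!r}")
    return {
        "tag": match["tag"],
        "commits_since_tag": int(match["commits"]),
        "short_sha": match["sha"],
        "dirty": match["dirty"] is not None,
    }


print(parse_git_describe("0.41.0-355-gc336a26"))
```

Read this way, the checkout is 355 commits past the 0.41.0 tag, which differs from the 0.43.2.dev0 version that pip reports for the installed package.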

Could you help me out, please? I want to be able to use the 8-bit and 4-bit quantization functions.
Thank you

pip packages:

bitsandbytes 0.43.2.dev0
pytorch-triton-rocm 3.0.0
torch 2.4.1+rocm6.1
torchaudio 2.4.1+rocm6.1
torchvision 0.19.1+rocm6.1

Pacman packages:

opencl-amd 1:6.2.1-1
opencl-amd-dev 1:6.2.1-1
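
The two listings above show a torch wheel built for ROCm 6.1 next to system packages at 6.2.1. Whether that difference causes the failure is not established in this thread, but it can be made visible with a short sketch. Both helper names are hypothetical, and the version strings are the ones quoted above.

```python
import re


def rocm_from_torch_version(torch_version):
    """Extract (major, minor) from a local version tag such as '2.4.1+rocm6.1'."""
    match = re.search(r"\+rocm(\d+)\.(\d+)", torch_version)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def rocm_from_pacman_version(pkg_version):
    """Extract (major, minor) from a pacman version such as '1:6.2.1-1'."""
    # Drop the optional 'epoch:' prefix and the '-release' suffix.
    upstream = pkg_version.split(":", 1)[-1].split("-", 1)[0]
    major, minor = upstream.split(".")[:2]
    return (int(major), int(minor))


wheel = rocm_from_torch_version("2.4.1+rocm6.1")
system = rocm_from_pacman_version("1:6.2.1-1")
print("torch wheel ROCm:", wheel, "| system ROCm:", system, "| same:", wheel == system)
```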

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(11, 0), cuda_version_string='61', cuda_version_tuple=(6, 1))
PyTorch settings found: CUDA_VERSION=61, Highest Compute Capability: (11, 0).
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
The directory listed in your path is found to be non-existent: /home/gediz/.pyenv/pyenv.d
The directory listed in your path is found to be non-existent: /usr/etc/pyenv.d
The directory listed in your path is found to be non-existent: /usr/local/etc/pyenv.d
The directory listed in your path is found to be non-existent: /etc/pyenv.d
The directory listed in your path is found to be non-existent: /usr/lib/pyenv/hooks
The directory listed in your path is found to be non-existent: //debuginfod.archlinux.org 
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
Error invalid device function at line 125 in file /home/gediz/bitsandbytes/csrc/ops.hip
ROCk module is loaded
===================== 
HSA System Attributes 
===================== 
Runtime Version: 1.14
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE 
System Endianness: LITTLE 
Mwaitx: DISABLED
DMAbuf Support: YES
========== 
HSA Agents 
========== 
******* 
Agent 1 
******* 
 Name: AMD Ryzen 7 3700X 8-Core Processor 
 Uuid: CPU-XX 
 Marketing Name: AMD Ryzen 7 3700X 8-Core Processor 
 Vendor Name: CPU 
 Feature: None specified 
 Profile: FULL_PROFILE 
 Float Round Mode: NEAR 
 Max Queue Number: 0(0x0) 
 Queue Min Size: 0(0x0) 
 Queue Max Size: 0(0x0) 
 Queue Type: MULTI 
 Node: 0 
 Device Type: CPU 
 Cache Info: 
 L1: 32768(0x8000) KB 
 Chip ID: 0(0x0) 
 ASIC Revision: 0(0x0) 
 Cacheline Size: 64(0x40) 
 Max Clock Freq. (MHz): 0 
 BDFID: 0 
 Internal Node ID: 0 
 Compute Unit: 16 
 SIMDs per CU: 0 
 Shader Engines: 0 
 Shader Arrs. per Eng.: 0 
 WatchPts on Addr. Ranges:1 
 Memory Properties: 
 Features: None
 Pool Info: 
 Pool 1 
 Segment: GLOBAL; FLAGS: FINE GRAINED 
 Size: 17975528(0x11248e8) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Recommended Granule:4KB 
 Alloc Alignment: 4KB 
 Accessible by all: TRUE 
 Pool 2 
 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
 Size: 17975528(0x11248e8) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Recommended Granule:4KB 
 Alloc Alignment: 4KB 
 Accessible by all: TRUE 
 Pool 3 
 Segment: GLOBAL; FLAGS: COARSE GRAINED 
 Size: 17975528(0x11248e8) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Recommended Granule:4KB 
 Alloc Alignment: 4KB 
 Accessible by all: TRUE 
 ISA Info: 
******* 
Agent 2 
******* 
 Name: gfx1100 
 Uuid: GPU-XX 
 Marketing Name: AMD Radeon RX 7600 XT 
 Vendor Name: AMD 
 Feature: KERNEL_DISPATCH 
 Profile: BASE_PROFILE 
 Float Round Mode: NEAR 
 Max Queue Number: 128(0x80) 
 Queue Min Size: 64(0x40) 
 Queue Max Size: 131072(0x20000) 
 Queue Type: MULTI 
 Node: 1 
 Device Type: GPU 
 Cache Info: 
 L1: 32(0x20) KB 
 L2: 2048(0x800) KB 
 Chip ID: 29824(0x7480) 
 ASIC Revision: 0(0x0) 
 Cacheline Size: 64(0x40) 
 Max Clock Freq. (MHz): 2470 
 BDFID: 1280 
 Internal Node ID: 1 
 Compute Unit: 32 
 SIMDs per CU: 2 
 Shader Engines: 2 
 Shader Arrs. per Eng.: 2 
 WatchPts on Addr. Ranges:4 
 Coherent Host Access: FALSE 
 Memory Properties: 
 Features: KERNEL_DISPATCH 
 Fast F16 Operation: TRUE 
 Wavefront Size: 32(0x20) 
 Workgroup Max Size: 1024(0x400) 
 Workgroup Max Size per Dimension:
 x 1024(0x400) 
 y 1024(0x400) 
 z 1024(0x400) 
 Max Waves Per CU: 32(0x20) 
 Max Work-item Per CU: 1024(0x400) 
 Grid Max Size: 4294967295(0xffffffff) 
 Grid Max Size per Dimension:
 x 4294967295(0xffffffff) 
 y 4294967295(0xffffffff) 
 z 4294967295(0xffffffff) 
 Max fbarriers/Workgrp: 32 
 Packet Processor uCode:: 262 
 SDMA engine uCode:: 21 
 IOMMU Support:: None 
 Pool Info: 
 Pool 1 
 Segment: GLOBAL; FLAGS: COARSE GRAINED 
 Size: 16760832(0xffc000) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Recommended Granule:2048KB 
 Alloc Alignment: 4KB 
 Accessible by all: FALSE 
 Pool 2 
 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
 Size: 16760832(0xffc000) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Recommended Granule:2048KB 
 Alloc Alignment: 4KB 
 Accessible by all: FALSE 
 Pool 3 
 Segment: GROUP 
 Size: 64(0x40) KB 
 Allocatable: FALSE 
 Alloc Granule: 0KB 
 Alloc Recommended Granule:0KB 
 Alloc Alignment: 0KB 
 Accessible by all: FALSE 
 ISA Info: 
 ISA 1 
 Name: amdgcn-amd-amdhsa--gfx1100 
 Machine Models: HSA_MACHINE_MODEL_LARGE 
 Profiles: HSA_PROFILE_BASE 
 Default Rounding Mode: NEAR 
 Default Rounding Mode: NEAR 
 Fast f16: TRUE 
 Workgroup Max Size: 1024(0x400) 
 Workgroup Max Size per Dimension:
 x 1024(0x400) 
 y 1024(0x400) 
 z 1024(0x400) 
 Grid Max Size: 4294967295(0xffffffff) 
 Grid Max Size per Dimension:
 x 4294967295(0xffffffff) 
 y 4294967295(0xffffffff) 
 z 4294967295(0xffffffff) 
 FBarrier Max Size: 32 
*** Done *** 
echo $LD_LIBRARY_PATH 
/opt/rocm/lib:
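
An "invalid device function" error commonly means that the loaded library contains no code compiled for the GPU's architecture. The report does not show which architecture the library was built for, so this is only a possibility. A sketch of pulling the gfx target out of rocminfo output, so that it can be passed to the build explicitly, could look like this. gfx_targets is a hypothetical helper, and SAMPLE is a short excerpt of the output above.

```python
import re

# Matches agent lines such as " Name: gfx1100" but not ISA lines such as
# " Name: amdgcn-amd-amdhsa--gfx1100".
GFX_NAME_RE = re.compile(r"^[ \t]*Name:[ \t]*(gfx[0-9a-f]+)[ \t]*$", re.MULTILINE)


def gfx_targets(rocminfo_text):
    """Return the unique gfx names from 'Name:' lines, in order of appearance."""
    targets = []
    for match in GFX_NAME_RE.finditer(rocminfo_text):
        if match.group(1) not in targets:
            targets.append(match.group(1))
    return targets


SAMPLE = """\
 Name: AMD Ryzen 7 3700X 8-Core Processor
 Name: gfx1100
 Marketing Name: AMD Radeon RX 7600 XT
 Name: amdgcn-amd-amdhsa--gfx1100
"""

for target in gfx_targets(SAMPLE):
    print(f'cmake -DBNB_ROCM_ARCH="{target}" -DCOMPUTE_BACKEND=hip -S .')
```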

I recently came across this, not sure if it is helpful: https://github.com/ROCm/bitsandbytes


The command from the documentation:

git clone --depth 1 -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/

Generates this problem:

File "<string>", line 58, in <module>
 File "<string>", line 40, in get_version_and_write_to_file
 File "<string>", line 30, in get_latest_semver_tag
 ValueError: No valid semantic version tags found

Using the command:

git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/

I can run the build.
It's probably a problem with missing tags since we're using a different name than the standard branch tag name.
I don't know much about semver, I just solved this by cloning the entire repo and it (semver) arbitrarily chose a tag to generate the build.
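
A shallow clone (`--depth 1`) fetches only the tip of the branch, and tags that point at older commits are generally not fetched with it, which would explain why the version lookup finds nothing to work with. The traceback names a function get_latest_semver_tag but does not show its body, so the following is only a hypothetical reconstruction of the idea: pick the highest x.y.z tag, and fail when there is none.

```python
import re

# Plain x.y.z tags, with an optional leading "v".
SEMVER_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_semver_tag(tags):
    """Return the tag with the highest (major, minor, patch) among `tags`."""
    versions = []
    for tag in tags:
        match = SEMVER_RE.match(tag)
        if match:
            versions.append((tuple(int(part) for part in match.groups()), tag))
    if not versions:
        raise ValueError("No valid semantic version tags found")
    return max(versions)[1]


# With a full clone there are tags to choose from.
print(latest_semver_tag(["0.9.1", "0.41.0", "continuous-release"]))

# With a shallow clone the tag list can be empty, which raises.
try:
    latest_semver_tag([])
except ValueError as error:
    print("shallow clone case:", error)
```

Under this reading the choice of tag after a full clone is not arbitrary: the highest version-like tag wins.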

These are the commands I used for compilation:
python3.11 -m venv venv
source venv/bin/activate
# torch 2.4.1
pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.1
pip install -r requirements-dev.txt
cmake -DBNB_ROCM_ARCH="gfx1100" -DCOMPUTE_BACKEND=hip -S .
make -j16
pip install -e .
Ironically, it's Arch Linux that's out of date. It's still using rocm6.0.
python -m bitsandbytes:
Could not find the bitsandbytes ROCm binary at PosixPath('/home/user/repos/external/bitsandbytes/bitsandbytes/libbitsandbytes_rocm61.so')
Could not load bitsandbytes native library: /home/user/repos/external/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
 File "/home/user/repos/external/bitsandbytes/bitsandbytes/cextension.py", line 125, in <module>
 lib = get_native_library()
 ^^^^^^^^^^^^^^^^^^^^
 File "/home/user/repos/external/bitsandbytes/bitsandbytes/cextension.py", line 104, in get_native_library
 dll = ct.cdll.LoadLibrary(str(binary_path))
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/usr/lib/python3.11/ctypes/__init__.py", line 454, in LoadLibrary
 return self._dlltype(name)
 ^^^^^^^^^^^^^^^^^^^
 File "/usr/lib/python3.11/ctypes/__init__.py", line 376, in __init__
 self._handle = _dlopen(self._name, mode)
 ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/user/repos/external/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
ROCm Setup failed despite ROCm being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
g++ (GCC) 14.2.1 20240910
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='61', rocm_version_tuple=(6, 1)
PyTorch settings found: ROCM_VERSION=61
Library not found: /home/user/repos/external/bitsandbytes/bitsandbytes/libbitsandbytes_rocm61.so.
Maybe you need to compile it from source? If you compiled from source, check that ROCM_VERSION
in PyTorch Settings matches your ROCm install. If not, reinstall PyTorch for your ROCm version
and rebuild bitsandbytes.
The directory listed in your path is found to be non-existent: //debuginfod.archlinux.org 
The directory listed in your path is found to be non-existent: /etc/gtk-2.0/gtkrc
The directory listed in your path is found to be non-existent: /etc/gtk/gtkrc
The directory listed in your path is found to be non-existent: /home/user/.gtkrc
The directory listed in your path is found to be non-existent: /Sessions/2
The directory listed in your path is found to be non-existent: /Windows/1
The directory listed in your path is found to be non-existent: local/x470-AORUS
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/1943,unix/x470-AORUS
The directory listed in your path is found to be non-existent: /org/freedesktop/DisplayManager/Seat0
The directory listed in your path is found to be non-existent: /org/freedesktop/DisplayManager/Session1
WARNING! ROCm runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.
For source installations, compile the binaries with `cmake -DCOMPUTE_BACKEND=hip -S .`.
See the documentation for more details if needed.
Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
 File "/home/user/repos/external/bitsandbytes/bitsandbytes/diagnostics/main.py", line 73, in main
 sanity_check()
 File "/home/user/repos/external/bitsandbytes/bitsandbytes/diagnostics/main.py", line 42, in sanity_check
 adam.step()
 File "/run/media/user/SSD_SATA/projects/projectTest/venv/lib/python3.11/site-packages/torch/optim/optimizer.py", line 484, in wrapper
 out = func(*args, **kwargs)
 ^^^^^^^^^^^^^^^^^^^^^
 File "/run/media/user/SSD_SATA/projects/projectTest/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
 return func(*args, **kwargs)
 ^^^^^^^^^^^^^^^^^^^^^
 File "/home/user/repos/external/bitsandbytes/bitsandbytes/optim/optimizer.py", line 287, in step
 self.update_step(group, p, gindex, pindex)
 File "/run/media/user/SSD_SATA/projects/projectTest/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
 return func(*args, **kwargs)
 ^^^^^^^^^^^^^^^^^^^^^
 File "/home/user/repos/external/bitsandbytes/bitsandbytes/optim/optimizer.py", line 500, in update_step
 F.optimizer_update_32bit(
 File "/home/user/repos/external/bitsandbytes/bitsandbytes/functional.py", line 1189, in optimizer_update_32bit
 return backends[g.device.type].optimizer_update_32bit(
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/user/repos/external/bitsandbytes/bitsandbytes/backends/cuda.py", line 870, in optimizer_update_32bit
 optim_func = str2optimizer32bit[optimizer_name][0]
 ^^^^^^^^^^^^^^^^^^
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
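
According to the output above, the loader looks for libbitsandbytes_rocm61.so, a name derived from the ROCM_VERSION=61 that PyTorch reports. If the build ran against a different ROCm version, the compiled file may carry a different suffix; that is an assumption, since the build output is not shown. A sketch that lists what was actually built next to what is expected follows. diagnose is a hypothetical helper, and the directory is assumed to be the bitsandbytes package folder inside the checkout.

```python
from pathlib import Path


def diagnose(package_dir, rocm_version_string):
    """Compare the expected ROCm library name with the files actually present."""
    package_dir = Path(package_dir)
    expected = f"libbitsandbytes_rocm{rocm_version_string}.so"
    if package_dir.is_dir():
        found = sorted(p.name for p in package_dir.glob("libbitsandbytes_*.so"))
    else:
        found = []
    return {"expected": expected, "found": found, "match": expected in found}


if __name__ == "__main__":
    # Assumed to be run from the root of the bitsandbytes checkout.
    print(diagnose(Path.cwd() / "bitsandbytes", "61"))
```

An empty "found" list suggests the build produced no library at all, while a different suffix points at a version mismatch between the build and PyTorch.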
rocminfo
ROCk module is loaded
===================== 
HSA System Attributes 
===================== 
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE 
System Endianness: LITTLE 
Mwaitx: DISABLED
DMAbuf Support: YES
========== 
HSA Agents 
========== 
******* 
Agent 1 
******* 
 Name: AMD Ryzen 7 5800X3D 8-Core Processor
 Uuid: CPU-XX 
 Marketing Name: AMD Ryzen 7 5800X3D 8-Core Processor
 Vendor Name: CPU 
 Feature: None specified 
 Profile: FULL_PROFILE 
 Float Round Mode: NEAR 
 Max Queue Number: 0(0x0) 
 Queue Min Size: 0(0x0) 
 Queue Max Size: 0(0x0) 
 Queue Type: MULTI 
 Node: 0 
 Device Type: CPU 
 Cache Info: 
 L1: 32768(0x8000) KB 
 Chip ID: 0(0x0) 
 ASIC Revision: 0(0x0) 
 Cacheline Size: 64(0x40) 
 Max Clock Freq. (MHz): 4550 
 BDFID: 0 
 Internal Node ID: 0 
 Compute Unit: 16 
 SIMDs per CU: 0 
 Shader Engines: 0 
 Shader Arrs. per Eng.: 0 
 WatchPts on Addr. Ranges:1 
 Features: None
 Pool Info: 
 Pool 1 
 Segment: GLOBAL; FLAGS: FINE GRAINED 
 Size: 65767152(0x3eb86f0) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Alignment: 4KB 
 Accessible by all: TRUE 
 Pool 2 
 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
 Size: 65767152(0x3eb86f0) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Alignment: 4KB 
 Accessible by all: TRUE 
 Pool 3 
 Segment: GLOBAL; FLAGS: COARSE GRAINED 
 Size: 65767152(0x3eb86f0) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Alignment: 4KB 
 Accessible by all: TRUE 
ISA Info: 
******* 
Agent 2 
******* 
 Name: gfx1100 
 Uuid: GPU-9255a335dd4a5296 
 Marketing Name: AMD Radeon RX 7900 XTX 
 Vendor Name: AMD 
 Feature: KERNEL_DISPATCH 
 Profile: BASE_PROFILE 
 Float Round Mode: NEAR 
 Max Queue Number: 128(0x80) 
 Queue Min Size: 64(0x40) 
 Queue Max Size: 131072(0x20000) 
 Queue Type: MULTI 
 Node: 1 
 Device Type: GPU 
 Cache Info: 
 L1: 32(0x20) KB 
 L2: 6144(0x1800) KB 
 L3: 98304(0x18000) KB 
 Chip ID: 29772(0x744c) 
 ASIC Revision: 0(0x0) 
 Cacheline Size: 64(0x40) 
 Max Clock Freq. (MHz): 2371 
 BDFID: 12288 
 Internal Node ID: 1 
 Compute Unit: 96 
 SIMDs per CU: 2 
 Shader Engines: 6 
 Shader Arrs. per Eng.: 2 
 WatchPts on Addr. Ranges:4 
 Coherent Host Access: FALSE 
 Features: KERNEL_DISPATCH 
 Fast F16 Operation: TRUE 
 Wavefront Size: 32(0x20) 
 Workgroup Max Size: 1024(0x400) 
 Workgroup Max Size per Dimension:
 x 1024(0x400) 
 y 1024(0x400) 
 z 1024(0x400) 
 Max Waves Per CU: 32(0x20) 
 Max Work-item Per CU: 1024(0x400) 
 Grid Max Size: 4294967295(0xffffffff) 
 Grid Max Size per Dimension:
 x 4294967295(0xffffffff) 
 y 4294967295(0xffffffff) 
 z 4294967295(0xffffffff) 
 Max fbarriers/Workgrp: 32 
 Packet Processor uCode:: 262 
 SDMA engine uCode:: 24 
 IOMMU Support:: None 
 Pool Info: 
 Pool 1 
 Segment: GLOBAL; FLAGS: COARSE GRAINED 
 Size: 25149440(0x17fc000) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Alignment: 4KB 
 Accessible by all: FALSE 
 Pool 2 
 Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
 Size: 25149440(0x17fc000) KB 
 Allocatable: TRUE 
 Alloc Granule: 4KB 
 Alloc Alignment: 4KB 
 Accessible by all: FALSE 
 Pool 3 
 Segment: GROUP 
 Size: 64(0x40) KB 
 Allocatable: FALSE 
 Alloc Granule: 0KB 
 Alloc Alignment: 0KB 
 Accessible by all: FALSE 
 ISA Info: 
 ISA 1 
 Name: amdgcn-amd-amdhsa--gfx1100 
 Machine Models: HSA_MACHINE_MODEL_LARGE 
 Profiles: HSA_PROFILE_BASE 
 Default Rounding Mode: NEAR 
 Default Rounding Mode: NEAR 
 Fast f16: TRUE 
 Workgroup Max Size: 1024(0x400) 
 Workgroup Max Size per Dimension:
 x 1024(0x400) 
 y 1024(0x400) 
 z 1024(0x400) 
 Grid Max Size: 4294967295(0xffffffff) 
 Grid Max Size per Dimension:
 x 4294967295(0xffffffff) 
 y 4294967295(0xffffffff) 
 z 4294967295(0xffffffff) 
 FBarrier Max Size: 32 
*** Done *** 

The only problem I've found so far is with ROCM_PATH, which doesn't seem to respect update-alternatives when more than one ROCm installation is present. I believe it should, since installing multiple ROCm versions side by side is supported by AMD.
In this case I'm using Ubuntu 22.04 to compile the source code, matching the configuration it was tested on (although I mentioned above that I use Arch Linux xD).
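
One way to honour update-alternatives is to fall back to wherever the /opt/rocm link points when ROCM_PATH is not set. A minimal sketch follows, assuming /opt/rocm is a symlink that update-alternatives maintains. resolve_rocm_path is a hypothetical helper and does not reflect how the bitsandbytes build currently resolves the path.

```python
import os


def resolve_rocm_path(environ=None, default_link="/opt/rocm"):
    """Prefer an explicit ROCM_PATH; otherwise follow the default symlink."""
    environ = os.environ if environ is None else environ
    explicit = environ.get("ROCM_PATH")
    if explicit:
        return explicit
    # realpath follows the whole symlink chain, including /etc/alternatives.
    return os.path.realpath(default_link)


if __name__ == "__main__":
    print("ROCm root:", resolve_rocm_path())
```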


I have tried many versions of torch, bitsandbytes, and ROCm on Arch Linux. At best it works partially and crashes a lot. torch 5.7 with ROCm 6.0.2 and this repo was the best combination, but still dysfunctional.

You can try this package on Arch Linux: opencl-amd (https://aur.archlinux.org/cgit/aur.git/commit/?h=opencl-amd&id=b9d50154b37858f363bdd83ef781891042440daa).
On Arch Linux with my 7900 XTX, the combination that worked best for bitsandbytes was this package (remember to install opencl-amd-dev as well) at version 6.1.3, with torch for ROCm 6.1 (pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1) and the multi-backend-refactor branch of bitsandbytes, built following the compilation instructions.

I have tried this, but my card is a 7600 XT, so it's a no-go for me.
Gediz GÜRSU

My environment is Ubuntu 24.04 + ROCm 6.3.3 + PyTorch 2.6 + Python 3.10, with bitsandbytes compiled from source. pip list shows transformers 4.50.3 and bitsandbytes 1.0.0 installed. However, running python -m bitsandbytes still reports an error:

Could not load bitsandbytes native library: /home/liuq/bitsandbytes/bitsandbytes/libbitsandbytes_rocm63.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
Traceback (most recent call last):
 File "/home/liuq/bitsandbytes/bitsandbytes/cextension.py", line 107, in <module>
 lib = get_native_library()
 File "/home/liuq/bitsandbytes/bitsandbytes/cextension.py", line 86, in get_native_library
 dll = ct.cdll.LoadLibrary(str(binary_path))
 File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
 return self._dlltype(name)
 File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/ctypes/__init__.py", line 374, in __init__
 self._handle = _dlopen(self._name, mode)
OSError: /home/liuq/bitsandbytes/bitsandbytes/libbitsandbytes_rocm63.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
ROCm Setup failed despite ROCm being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='63', rocm_version_tuple=(6, 3)
PyTorch settings found: ROCM_VERSION=63
The directory listed in your path is found to be non-existent: local/liuq-MS-7C94
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/2419,unix/liuq-MS-7C94
The directory listed in your path is found to be non-existent: /etc/xdg/xdg-ubuntu
The directory listed in your path is found to be non-existent: /org/gnome/Terminal/screen/1159d79f_3bcc_4398_bfe2_8f1a1a0b8a44
The directory listed in your path is found to be non-existent: //debuginfod.ubuntu.com 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.
For source installations, compile the binaries with `cmake -DCOMPUTE_BACKEND=hip -S .`.
See the documentation for more details if needed.
Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
 File "/home/liuq/bitsandbytes/bitsandbytes/diagnostics/main.py", line 73, in main
 sanity_check()
 File "/home/liuq/bitsandbytes/bitsandbytes/diagnostics/main.py", line 42, in sanity_check
 adam.step()
 File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/site-packages/torch/optim/optimizer.py", line 493, in wrapper
 out = func(*args, **kwargs)
 File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
 return func(*args, **kwargs)
 File "/home/liuq/bitsandbytes/bitsandbytes/optim/optimizer.py", line 291, in step
 self.update_step(group, p, gindex, pindex)
 File "/home/liuq/anaconda3/envs/bnb_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
 return func(*args, **kwargs)
 File "/home/liuq/bitsandbytes/bitsandbytes/optim/optimizer.py", line 521, in update_step
 F.optimizer_update_32bit(
 File "/home/liuq/bitsandbytes/bitsandbytes/functional.py", line 1257, in optimizer_update_32bit
 return backends[g.device.type].optimizer_update_32bit(
 File "/home/liuq/bitsandbytes/bitsandbytes/backends/cuda.py", line 777, in optimizer_update_32bit
 optim_func = str2optimizer32bit[optimizer_name][0]
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
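
The undefined symbol in this output is a mangled C++ name, and decoding its leading parts shows which kernel the library fails to resolve. The sketch below reads only the first length-prefixed identifier and the first template type argument. It is not a full demangler, and both function names are hypothetical. The symbol string is copied from the error above.

```python
import re

DIGITS_RE = re.compile(r"\d+")


def undefined_symbol(message):
    """Pull the symbol name out of a loader 'undefined symbol: ...' message."""
    match = re.search(r"undefined symbol: (\S+)", message)
    return match.group(1) if match else None


def read_source_name(mangled, pos):
    """Read one <length><identifier> pair at `pos`; return (identifier, new_pos)."""
    match = DIGITS_RE.match(mangled, pos)
    if match is None:
        raise ValueError(f"no length prefix at offset {pos}")
    start = match.end()
    end = start + int(match.group())
    return mangled[start:end], end


def kernel_and_first_type(mangled):
    """Return (function name, first template type argument or None)."""
    if not mangled.startswith("_Z"):
        raise ValueError("not an Itanium-style mangled name")
    name, pos = read_source_name(mangled, 2)
    first_type = None
    # 'I' opens a template argument list; a digit after it starts a type name.
    if mangled[pos:pos + 1] == "I" and mangled[pos + 1:pos + 2].isdigit():
        first_type, pos = read_source_name(mangled, pos + 1)
    return name, first_type


SYMBOL = "_Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi"
print(kernel_and_first_type(SYMBOL))
```

The decoded parts identify the host-side launch stub for the kOptimizer32bit1State kernel instantiated with hip_bfloat16. Why that instantiation is missing from the built library is not something this thread establishes.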

I have the same issue as @qdliuq. I am using ROCm 6.4 and my GPU is a 9070 XT, so it might be related to it being a newer card?

Could not load bitsandbytes native library: /home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_rocm64.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
Traceback (most recent call last):
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/cextension.py", line 115, in <module>
 lib = get_native_library()
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/cextension.py", line 86, in get_native_library
 dll = ct.cdll.LoadLibrary(str(binary_path))
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/ctypes/__init__.py", line 471, in LoadLibrary
 return self._dlltype(name)
 ~~~~~~~~~~~~~^^^^^^
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/ctypes/__init__.py", line 390, in __init__
 self._handle = _dlopen(self._name, mode)
 ~~~~~~~^^^^^^^^^^^^^^^^^^
OSError: /home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_rocm64.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
 ROCm Setup failed despite ROCm being available. Please run the following command to get more information:
 python -m bitsandbytes
 Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
 to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
 and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='64', rocm_version_tuple=(6, 4)
PyTorch settings found: ROCM_VERSION=64
The directory listed in your path is found to be non-existent: /opt/rocm-6.4.0/hip/lib
The directory listed in your path is found to be non-existent: ~/.ssh
The directory listed in your path is found to be non-existent: local/james-Ubuntu
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/4167,unix/james-Ubuntu
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.
For source installations, compile the binaries with 'cmake -DCOMPUTE_BACKEND=hip -S .'.
See the documentation for more details if needed.
Trying a simple check anyway, but this will likely fail...
adam
Traceback (most recent call last):
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/diagnostics/main.py", line 73, in main
 sanity_check()
 ~~~~~~~~~~~~^^
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/diagnostics/main.py", line 42, in sanity_check
 adam.step()
 ~~~~~~~~~^^
 File "/home/james/AI/pytorch/torch/optim/optimizer.py", line 504, in wrapper
 out = func(*args, **kwargs)
 File "/home/james/AI/pytorch/torch/utils/_contextlib.py", line 116, in decorate_context
 return func(*args, **kwargs)
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/optim/optimizer.py", line 292, in step
 self.update_step(group, p, gindex, pindex)
 ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/james/AI/pytorch/torch/utils/_contextlib.py", line 116, in decorate_context
 return func(*args, **kwargs)
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/optim/optimizer.py", line 522, in update_step
 F.optimizer_update_32bit(
 ~~~~~~~~~~~~~~~~~~~~~~~~^
 self.optimizer_name,
 ^^^^^^^^^^^^^^^^^^^^
 ...<15 lines>...
 skip_zeros=config["skip_zeros"],
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 )
 ^
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/functional.py", line 1266, in optimizer_update_32bit
 return backends[g.device.type].optimizer_update_32bit(
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 optimizer_name=optimizer_name,
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ...<15 lines>...
 skip_zeros=skip_zeros,
 ^^^^^^^^^^^^^^^^^^^^^^
 )
 ^
 File "/home/james/miniconda3/envs/pytorch-build/lib/python3.13/site-packages/bitsandbytes/backends/cuda.py", line 781, in optimizer_update_32bit
 optim_func = str2optimizer32bit[optimizer_name][0]
 ^^^^^^^^^^^^^^^^^^
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.

What happened to continuous-release_multi-backend-refactor binaries? They're no longer in releases.

If they're going to be yeeted please also update the documentation section "Pre-built Wheel Installation (recommended)": https://huggingface.co/docs/bitsandbytes/main/installation#multi-backend-pip

Hope they'd remain though, people kinda depend on them already.


Sorry, the branch is now in a somewhat dysfunctional state, and I wasn't aware folks were still using the wheel. It now only publishes the latest wheel; I hope you are not relying on an older one?

Anyways, we're actively working on integrating the changes into main and that's why we deprecated support for this branch to focus our efforts on getting things released officially.
