Error for exllamav2_kernel, running TGI on Google Colab #1762

Unanswered
andychoi98 asked this question in Q&A

Trying to run the TGI launcher on Google Colab after installing it locally, but I keep getting errors that the kernel is not installed.
text-generation-launcher --model-id bigcode/starcoder2-3b --sharded false --quantize bitsandbytes-fp4

ERROR text_generation_launcher: exllamav2_kernels not installed.
ERROR text_generation_launcher: Shard 0 failed to start

I keep getting these errors even though I cloned and installed the turboderp/exllamav2 repo from GitHub.
It seems like a simple issue, but can anyone help me solve it?

I'm running locally because Google Colab doesn't let me use the Docker container for TGI.
Or is there a better way to do this?

Thank you.

Replies: 2 comments 4 replies


The exllamav2_kernels mentioned in the error are built into the TGI app (likely in transformers). They are enabled by default: disable_exllamav2=False in load_quantized_model().

Make sure you have the latest version with pip install --upgrade transformers, or try setting disable_exllamav2=True.
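For context, the error message typically comes from an import guard around the compiled extension. The sketch below is illustrative only, not TGI's actual source; the HAS_EXLLAMAV2 flag name is my own:

```python
# Illustrative sketch of the import-guard pattern behind the error.
# exllamav2_kernels is a compiled CUDA extension; if it was never built
# and installed into the active Python environment, the import fails.
try:
    import exllamav2_kernels  # noqa: F401
    HAS_EXLLAMAV2 = True
except ImportError:
    HAS_EXLLAMAV2 = False
    print("exllamav2_kernels not installed.")
```

If the flag ends up False, the launcher logs the error you saw, so the fix is either to build/install the extension into the same environment TGI runs in, or to disable the exllamav2 path entirely.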

1 reply

I see. Then what might be causing this error?

2024年04月18日T20:25:12.602524Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

2024年04月18日 20:25:08.642575: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024年04月18日 20:25:08.642631: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024年04月18日 20:25:08.644753: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024年04月18日 20:25:09.925706: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

...
ImportError: cannot import name 'PositionRotaryEmbedding' from 'text_generation_server.utils.layers'
(/content/tgi/server/text_generation_server/utils/layers.py) rank=0
2024年04月18日T20:14:54.805319Z ERROR text_generation_launcher: Shard 0 failed to start
2024年04月18日T20:14:54.805347Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

If that error comes from code built into TGI, I guess that's what's preventing me from launching.
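One way to debug an ImportError like the one in that log is to check which copy of the module Python is actually importing, since a stale pip-installed package can shadow the cloned repo. The helper below is a generic sketch, shown with a stdlib module; the TGI module name in the comment is taken from the traceback above:

```python
import importlib.util

def module_origin(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# Stdlib module used here for illustration; in the Colab environment you
# would pass "text_generation_server.utils.layers" to check whether the
# in-repo copy under /content/tgi/server is the one actually being loaded.
print(module_origin("json"))
```

If the printed path points somewhere other than the checkout you just built, the missing PositionRotaryEmbedding symbol is likely a version mismatch between the installed package and the server code.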


Seems like the most useful part of that error is "Could not find TensorRT".

Have you tried pip install tensorrt?

https://stackoverflow.com/questions/76028164/tensorflow-object-detection-tf-trt-warning-could-not-find-tensorrt

1 reply

Yes, but it still doesn't work.
