Error for exllamav2_kernel, running TGI on Google Colab #1762

Unanswered
andychoi98 asked this question in Q&A

Trying to run the TGI launcher on Google Colab after installing it locally, but I keep getting errors that the kernel is not installed.
text-generation-launcher --model-id bigcode/starcoder2-3b --sharded false --quantize bitsandbytes-fp4

ERROR text_generation_launcher: exllamav2_kernels not installed.
ERROR text_generation_launcher: Shard 0 failed to start

I keep getting these errors even though I cloned and installed the turboderp/exllamav2 repo from GitHub.
It seems like a simple issue, but can anyone help me solve it?

I'm running locally because Google Colab doesn't let me use the Docker container for TGI.
Or is there a better way to do this?

Thank you.

Replies: 2 comments 4 replies


The exllamav2_kernels mentioned in the error are built into the TGI app (likely in transformers). They are enabled by default: disable_exllamav2=False in load_quantized_model().

Make sure you have the latest version with pip install --upgrade transformers, or try setting disable_exllamav2=True.
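For context, the error message typically comes from an import guard around the compiled extension. The sketch below is illustrative only, not TGI's actual source; the HAS_EXLLAMAV2 flag name is my own:

```python
# Illustrative sketch of the import-guard pattern behind the error.
# exllamav2_kernels is a compiled CUDA extension; if it was never built
# and installed into the active Python environment, the import fails.
try:
    import exllamav2_kernels  # noqa: F401
    HAS_EXLLAMAV2 = True
except ImportError:
    HAS_EXLLAMAV2 = False
    print("exllamav2_kernels not installed.")
```

If the flag ends up False, the launcher logs the error you saw, so the fix is either to build/install the extension into the same environment TGI runs in, or to disable the exllamav2 path entirely.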

1 reply

I see. Then what might be causing this error?

2024年04月18日T20:25:12.602524Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

2024年04月18日 20:25:08.642575: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024年04月18日 20:25:08.642631: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024年04月18日 20:25:08.644753: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024年04月18日 20:25:09.925706: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

...
ImportError: cannot import name 'PositionRotaryEmbedding' from 'text_generation_server.utils.layers'
(/content/tgi/server/text_generation_server/utils/layers.py) rank=0
2024年04月18日T20:14:54.805319Z ERROR text_generation_launcher: Shard 0 failed to start
2024年04月18日T20:14:54.805347Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

If that error comes from code built into TGI, I guess that's what's preventing me from launching.
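One way to debug an ImportError like the one in that log is to check which copy of the module Python is actually importing, since a stale pip-installed package can shadow the cloned repo. The helper below is a generic sketch, shown with a stdlib module; the TGI module name in the comment is taken from the traceback above:

```python
import importlib.util

def module_origin(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# Stdlib module used here for illustration; in the Colab environment you
# would pass "text_generation_server.utils.layers" to check whether the
# in-repo copy under /content/tgi/server is the one actually being loaded.
print(module_origin("json"))
```

If the printed path points somewhere other than the checkout you just built, the missing PositionRotaryEmbedding symbol is likely a version mismatch between the installed package and the server code.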


Seems like the most useful part of that error is "Could not find TensorRT".

Have you tried pip install tensorrt?

https://stackoverflow.com/questions/76028164/tensorflow-object-detection-tf-trt-warning-could-not-find-tensorrt

1 reply

Yes, but it still doesn't work.
