[Python] clip_tokenize: unknown token ' ' #107

Open
@h3ndrik

Description

I copy-pasted the example Python code from examples/python_bindings/README.md.

The tokenizer complains about the spaces in 'cat on a Turtle'.

I've tried both "mys/ggml_CLIP-ViT-B-32-laion2B-s34B-b79K/CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf" and the q8_0 variant.

Full log (q8_0 variant):

```
(venv)$ python test_clip.py
[File Info] models/CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-q8_0.gguf
clip_model_load: description: two-tower CLIP model
clip_model_load: GGUF version: 2
clip_model_load: alignment: 32
clip_model_load: n_tensors: 397
clip_model_load: n_kv: 25
clip_model_load: ftype: q8_0
clip_model_load: text_encoder: 1
clip_model_load: vision_encoder: 1
clip_model_load: model size: 156.10 MB
clip_model_load: metadata size: 0.13 MB
clip_model_load: text model hparams
n_vocab 49408
num_positions 77
t_hidden_size 512
t_n_intermediate 2048
t_projection_dim 512
t_n_head 8
t_n_layer 12
clip_model_load: vision model hparams
image_size 224
patch_size 32
v_hidden_size 768
v_n_intermediate 3072
v_projection_dim 512
v_n_head 12
v_n_layer 12
clip_model_load: 24 MB of memory allocated
clip_tokenize: unknown token ' '
Similarity score: 0.1402660459280014
```
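
For context, the reference CLIP tokenizer (OpenAI's `SimpleTokenizer`) never looks up whitespace as a token: it lowercases the text, collapses whitespace, splits into words, and marks the end of each word with `</w>` before applying BPE. The sketch below illustrates only that pre-tokenization idea. `TOY_VOCAB`, `pre_tokenize`, and `encode` are hypothetical names, the four-entry vocabulary stands in for the real 49408-entry one, and BPE merges are left out entirely. It is not clip.cpp's implementation, and whether the warning above comes from a missing whitespace-splitting step is an assumption.

```python
import re

# Hypothetical toy vocabulary (assumption): whole words only, no BPE merges.
# A real CLIP vocabulary has 49408 entries, as the log above reports.
TOY_VOCAB = {"cat</w>": 0, "on</w>": 1, "a</w>": 2, "turtle</w>": 3}


def pre_tokenize(text: str) -> list[str]:
    """Lowercase, collapse whitespace, and split into words marked with </w>."""
    cleaned = re.sub(r"\s+", " ", text.strip()).lower()
    return [word + "</w>" for word in cleaned.split(" ") if word]


def encode(text: str) -> list[int]:
    """Map each pre-tokenized word to its id; whitespace is never looked up."""
    ids = []
    for token in pre_tokenize(text):
        if token not in TOY_VOCAB:
            raise KeyError(f"unknown token {token!r}")
        ids.append(TOY_VOCAB[token])
    return ids


print(encode("cat on a Turtle"))  # [0, 1, 2, 3]
```

In this scheme the spaces in 'cat on a Turtle' only act as word separators, so a lookup for ' ' never happens.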
