[Python] clip_tokenize: unknown token ' ' #107


Open


@h3ndrik opened this issue on Dec 4, 2024

I copied the example Python code from `examples/python_bindings/README.md` verbatim.

The tokenizer complains about the spaces in the example text `'cat on a Turtle'`.

I've tried both `mys/ggml_CLIP-ViT-B-32-laion2B-s34B-b79K/CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf` and the q8_0 variant, with the same result.

Full log:

```
(venv)$ python test_clip.py 
[File Info] models/CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-q8_0.gguf
clip_model_load: description: two-tower CLIP model
clip_model_load: GGUF version: 2
clip_model_load: alignment: 32
clip_model_load: n_tensors: 397
clip_model_load: n_kv: 25
clip_model_load: ftype: q8_0
clip_model_load: text_encoder: 1
clip_model_load: vision_encoder: 1
clip_model_load: model size: 156.10 MB
clip_model_load: metadata size: 0.13 MB
clip_model_load: text model hparams
n_vocab 49408
num_positions 77
t_hidden_size 512
t_n_intermediate 2048
t_projection_dim 512
t_n_head 8
t_n_layer 12
clip_model_load: vision model hparams
image_size 224
patch_size 32
v_hidden_size 768
v_n_intermediate 3072
v_projection_dim 512
v_n_head 12
v_n_layer 12
clip_model_load: 24 MB of memory allocated
clip_tokenize: unknown token ' '
Similarity score: 0.1402660459280014
```
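For comparison, OpenAI's reference CLIP tokenizer (`simple_tokenizer.py`) should never look up a bare space in the vocabulary: it unescapes HTML, collapses whitespace, lowercases, and splits the text with a regex before BPE, so `'cat on a Turtle'` ought to yield four word pieces. The sketch below approximates that pre-tokenization step using only Python's stdlib `re`. It is an illustration of the expected behaviour, not clip.cpp's implementation: the reference code uses the third-party `regex` module with `\p{L}`/`\p{N}` classes and also runs `ftfy`, both swapped out or omitted here, and `pre_tokenize` is a hypothetical helper name.

```python
import html
import re

# Stdlib approximation of the pre-tokenization pattern in OpenAI's reference
# CLIP tokenizer (simple_tokenizer.py), which uses the third-party `regex`
# module with \p{L} / \p{N}. Substitutions assumed here:
#   [^\W\d_]+      ~ [\p{L}]+            (runs of letters)
#   \d             ~ [\p{N}]             (one digit at a time)
#   (?:[^\s\w]|_)+ ~ [^\s\p{L}\p{N}]+    (runs of punctuation/symbols)
# Whitespace matches none of the alternatives, so it is never emitted.
PAT = re.compile(
    r"<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d"
    r"|[^\W\d_]+|\d|(?:[^\s\w]|_)+",
    re.IGNORECASE,
)


def whitespace_clean(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def pre_tokenize(text: str) -> list[str]:
    """Hypothetical helper: split text into the pieces that BPE would see.

    The reference tokenizer also runs ftfy.fix_text first; that step is
    omitted here to keep the sketch stdlib-only.
    """
    text = whitespace_clean(html.unescape(html.unescape(text))).lower()
    return PAT.findall(text)


if __name__ == "__main__":
    print(pre_tokenize("cat on a Turtle"))  # ['cat', 'on', 'a', 'turtle']
```

Comparing this output with the tokens clip.cpp produces for the same string may help narrow down where the space slips through to the vocabulary lookup.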

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
