CLIP-search

This is an implementation of an image search engine using CLIP (Contrastive Language-Image Pre-Training).

CLIP is a model that maps text and images into a common embedding space, making it possible to capture the semantic relationship between them. It is trained on a large dataset of image-text pairs to learn how text and images relate to each other, and it can be used for image classification, similar-image search, generating text descriptions for images, and other tasks that combine text and images.

The current implementation lets you find images in a local directory based on either a text query or another image. The CLIP model first computes an embedding for the provided text or image, then compares it with the embeddings of the images in the local directory, and finally returns the top-k most similar images.

This implementation leverages CLIP's embeddings to search a local directory for visually or semantically similar images: since both the query and the directory images are mapped into the same embedding space, finding matches reduces to comparing vectors.
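
As a rough illustration of this flow, the sketch below (not necessarily the code in this repository) uses the Hugging Face transformers CLIP model to embed a query and rank a precomputed matrix of directory-image embeddings by cosine similarity; the checkpoint name and helper functions are assumptions made for the example.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; the repository may use a different one
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(query: str) -> torch.Tensor:
    # Tokenize the text query and project it into CLIP's shared embedding space
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize for cosine similarity

def embed_image(path: str) -> torch.Tensor:
    # Preprocess the image and project it into the same embedding space
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)

def top_k(query_emb: torch.Tensor, image_embs: torch.Tensor, k: int = 5) -> list[int]:
    # On normalized embeddings, cosine similarity is just a dot product;
    # image_embs is an (N, d) matrix of embeddings for the directory images
    scores = (image_embs @ query_emb.T).squeeze(-1)
    return scores.topk(k).indices.tolist()
```

With these helpers, a text search would look like `top_k(embed_text("a dog on a beach"), image_embs)` and an image search like `top_k(embed_image("query.jpg"), image_embs)`.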

The implementation provides two methods for finding similar images:

  1. Slow method: computes the cosine similarity between the query and every image in the directory individually. This can be computationally expensive and time-consuming for large directories.

  2. Fast method: uses clustering to find similar images by comparing the query with cluster centroids. This can be less accurate, but it is significantly faster, which matters for large directories (see the sketch after this list).

You can choose between these methods based on your requirements for speed and accuracy.
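
The fast method can be sketched in the same spirit (again an assumption, not the repository's exact code): cluster the directory embeddings with Faiss k-means, then find the nearest centroid for the query and rank only the images assigned to that cluster.

```python
import numpy as np
import faiss

def build_clusters(image_embs: np.ndarray, n_clusters: int = 16):
    # image_embs: (N, d) float32 matrix of normalized image embeddings
    x = image_embs.astype(np.float32)
    kmeans = faiss.Kmeans(x.shape[1], n_clusters, niter=20, verbose=False)
    kmeans.train(x)
    # Assign each image to its nearest centroid
    _, assignments = kmeans.index.search(x, 1)
    return kmeans, assignments.ravel()

def fast_search(query_emb: np.ndarray, image_embs: np.ndarray,
                kmeans: faiss.Kmeans, assignments: np.ndarray, k: int = 5) -> list[int]:
    # Compare the query with the centroids only, then rank the images inside the nearest cluster
    q = query_emb.reshape(1, -1).astype(np.float32)
    _, nearest = kmeans.index.search(q, 1)
    candidates = np.where(assignments == nearest[0, 0])[0]
    scores = image_embs[candidates] @ q.ravel()
    order = np.argsort(-scores)[:k]
    return candidates[order].tolist()
```

If the best matches happen to fall into a different cluster than the query, they are missed, which is the accuracy trade-off mentioned above.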

This implementation utilizes the Hugging Face CLIP model. The Faiss library is used for clustering, and the web UI is built using Gradio.
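
A minimal sketch of how such a search could be exposed through a Gradio web UI (an assumption about the actual app; it reuses the hypothetical embed_text, top_k, image_embs, and image_paths from the sketch above):

```python
import gradio as gr

def search(query: str, k: int) -> list[str]:
    # Embed the text query and return the file paths of the top-k matches
    query_emb = embed_text(query)                      # hypothetical helper from the sketch above
    indices = top_k(query_emb, image_embs, k=int(k))   # precomputed directory embeddings
    return [image_paths[i] for i in indices]           # paths aligned with the rows of image_embs

demo = gr.Interface(
    fn=search,
    inputs=[gr.Textbox(label="Text query"), gr.Slider(1, 20, value=5, step=1, label="Top k")],
    outputs=gr.Gallery(label="Similar images"),
)
demo.launch()
```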

Read more about CLIP at https://github.com/openai/CLIP.
