A powerful open-source tool for analyzing image and video datasets, founded by the authors of XGBoost, Apache TVM, and Turi Create: Danny Bickson, Carlos Guestrin, and Amir Alush.
Install fastdup from PyPI:
pip install fastdup
More installation options are available here.
Initialize and run fastdup:
import fastdup
fd = fastdup.create(input_dir="IMAGE_FOLDER/")
fd.run()
Explore the results in an interactive web UI:
fd.explore()
Alternatively, visualize the results in a static gallery:
fd.vis.duplicates_gallery()    # gallery of duplicates
fd.vis.outliers_gallery()      # gallery of outliers
fd.vis.component_gallery()     # gallery of connected components
fd.vis.stats_gallery()         # gallery of image statistics (e.g. blur, brightness, etc.)
fd.vis.similarity_gallery()    # gallery of similar images
See the quickstart tutorial for more information.
fastdup handles labeled and unlabeled datasets in image or video format and provides a range of features.
What sets fastdup apart from other similar tools:
- Quality: High-quality analysis to identify duplicates/near-duplicates, outliers, mislabels, broken images, and low-quality images.
- Scale: Highly scalable, capable of processing 400M images on a single CPU machine. Scales up to billions of images.
- Speed: Optimized C++ engine enables high performance even on low-resource CPU machines.
- Privacy: Runs locally or on your cloud infrastructure. Your data stays where it is.
- Ease of use: Works on labeled or unlabeled datasets in image or video format, with support for major operating systems such as macOS, Linux, and Windows.
Learn the basics of fastdup through interactive examples. View the notebooks on GitHub or nbviewer. Even better, run them on Google Colab or Kaggle, for free.
📌 Dataset: Oxford-IIIT Pet.
📌 Dataset: Oxford-IIIT Pet.
📌 Dataset: Food-101.
📌 Dataset: Shopee Product Matching.
See more examples.
Get help from the fastdup team or community members via the following channels:
GitHub Issues
Community-contributed blog posts on fastdup:
🖋️ atahan bulus • 🗓 16 September 2023
🖋️ Daniel Klitzke • 🗓 4 September 2023
🖋️ Alexander Lan • 🗓 9 March 2023
🖋️ Dickson Neoh • 🗓 23 February 2023
What our users say:
Visual Layer offers commercial services for managing, cleaning, and curating visual data at scale.
Sign up for free.
Not convinced? Explore a Visual Layer Cloud public dataset, with no sign-up required.
Usage Tracking
We have added an experimental crash report collection using Sentry.
We DO NOT collect user-specific information such as folder names, user names, image names, image content, etc. We do collect data related to fastdup's internal operations and performance statistics such as total number of images, average runtime per image, total free memory, total free disk space, number of cores, etc.
This helps us identify and resolve stability issues, thereby improving the overall reliability of fastdup. The code for the data collection is found here. On macOS we use Google Crashpad to report crashes.
Users have the option to opt out of the experimental crash reporting system through one of the following methods:
- Define an environment variable called SENTRY_OPT_OUT
- or run() with turi_param='run_sentry=0'
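The two opt-out methods above might look like the following in a script. This is a minimal sketch, not a definitive recipe: the value "1" for SENTRY_OPT_OUT and the choice to set it before fastdup is imported are assumptions (the text only says the variable must be defined), and run_without_crash_reports is a hypothetical helper name used for illustration.

```python
import os

# Method 1: define the SENTRY_OPT_OUT environment variable.
# Assumption: any value works; "1" is used here only as a placeholder.
# Assumption: setting it before fastdup is imported is the safe ordering.
os.environ["SENTRY_OPT_OUT"] = "1"


def run_without_crash_reports(input_dir):
    """Hypothetical helper: run fastdup with crash reporting disabled."""
    # Imported lazily so the environment variable above is already set.
    import fastdup

    fd = fastdup.create(input_dir=input_dir)
    # Method 2: pass the opt-out flag through turi_param when calling run().
    fd.run(turi_param="run_sentry=0")
    return fd
```

Either method on its own should be enough according to the list above; the sketch applies both only to show them side by side. A call would look like run_without_crash_reports("IMAGE_FOLDER/").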
fastdup is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.
For any more information or inquiries regarding the license, please contact us at info@visual-layer.com or see the LICENSE file.