A topic-centric list of HQ open datasets.
Updated Oct 15, 2025
Label Studio is a multi-type data labeling and annotation tool with standardized output format
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
AKShare is an elegant and simple open-source financial data interface library for Python, built for human beings!
pix2code: Generating Code from a Graphical User Interface Screenshot
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Open source annotation tool for machine learning practitioners.
Techniques for deep learning with satellite & aerial imagery
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, Radar Chart and Candlestick Chart.
AI Observability & Evaluation
(CGCSTCD'2017) An easy, flexible, and accurate recognition project for Chinese license plates in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Search all Chinese NLP datasets, with commonly used English NLP datasets included
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
A curated list of awesome JSON datasets that don't require authentication.
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)