The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
Updated Oct 16, 2025 - Python
NFStream: a Flexible Network Data Analysis Framework.
A plugin for GTAV that transforms it into a vision-based self-driving car research environment.
Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Convert a face dataset into a masked-face dataset
Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)
Compose multimodal datasets
A collection of public Korean instruction datasets for training language models.
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
A command-line interface to generate textual and conversational datasets with LLMs.
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
Creates an index of images, queries a local LLM and adds tags to the image metadata
Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing.
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Data release for the ImageInWords (IIW) paper.
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
Prepare VOC format datasets for ultralytics/yolov3 & yolov5
[IJCV] Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; built by active learning.