💭
Thinking about thinking
Jason mookiezi
🌱 Developing software tools for archiving online platforms, with a focus on NLP
Pinned
- dataset-cleaning-toolkit (Public): A dataset toolbox for preparing and analyzing conversational datasets, including CSV splitting, CSV → Parquet conversion, dataset statistics, Parquet cleaning and sorting, HuggingFace–style metadat...
Python 3
- dataset-pipeline (Public): A full Discord dataset pipeline with an end-to-end flow from raw Discord data to a final Parquet dataset with full statistics. Every stage is independent, idempotent, and CLI-driven for ease of automation.
- dataset-toolbox (Public): A dataset toolbox for preparing and analyzing conversational datasets, including CSV splitting, CSV → Parquet conversion, dataset statistics, dialogue-turn filtering, turn-based filtering, token an...
Python