Stars
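MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.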
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
A curated collection of open source projects related to GPT 🚀🔥🔥
QLoRA: Efficient Finetuning of Quantized LLMs
LLM training code for Databricks foundation models
shadowsocks / go-shadowsocks2
Forked from riobard/go-shadowsocks2
Modern Shadowsocks in Go
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
An open-source tool-augmented conversational language model from Fudan University
A collection of libraries to optimise AI model performance
Making large AI models cheaper, faster and more accessible
Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pretraining and instruction fine-tuning datasets
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
Simple reinforcement learning tutorials; Chinese-language AI teaching from 莫烦Python
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
The UI design language and React library for Conversational UI
Large Scale Chinese Corpus for NLP
Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Must-read papers on prompt-based tuning for pre-trained language models.
A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.