karthikncode/nlp-datasets

Datasets for Natural Language Processing

This is a list of datasets/corpora for NLP tasks, in reverse chronological order. Suggestions and pull requests are welcome. The goal is to make this a collaborative effort to maintain an updated list of quality datasets.

Areas

Question Answering
Dialogue Systems
Goal-Oriented Dialogue Systems

Question Answering

(NLVR) A Corpus of Natural Language for Visual Reasoning, 2017 [paper] [data]
(MS MARCO) MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2016 [paper] [data]
(NewsQA) NewsQA: A Machine Comprehension Dataset, 2016 [paper] [data]
(SQuAD) SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016 [paper] [data]
(GraphQuestions) On Generating Characteristic-rich Question Sets for QA Evaluation, 2016 [paper] [data]
(Story Cloze) A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories, 2016 [paper] [data]
(Children's Book Test) The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations, 2015 [paper] [data]
(SimpleQuestions) Large-scale Simple Question Answering with Memory Networks, 2015 [paper] [data]
(WikiQA) WikiQA: A Challenge Dataset for Open-Domain Question Answering, 2015 [paper] [data]
(CNN-DailyMail) Teaching Machines to Read and Comprehend, 2015 [paper] [code to generate] [data]
(QuizBowl) A Neural Network for Factoid Question Answering over Paragraphs, 2014 [paper] [data]
(MCTest) MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text, 2013 [paper] [data] [alternate data link]
(QASent) What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, 2007 [paper] [data]

Dialogue Systems

(Ubuntu Dialogue Corpus) The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems, 2015 [paper] [data]

Goal-Oriented Dialogue Systems

(Frames) Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems, 2016 [paper] [data]
(DSTC 2 & 3) Dialog State Tracking Challenge 2 & 3, 2013 [paper] [data]
