Google Research Datasets
- 1.1k followers
- Mountain View, CA
- http://research.google
Pinned
- natural-questions Public
Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question ans...
- conceptual-captions Public
Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.
- dstc8-schema-guided-dialogue Public
The Schema-Guided Dialogue Dataset
Repositories
- ssa-ai-terminologies Public
This dataset provides a glossary of AI terms in Swahili, Zulu, Xhosa, Afrikaans, English (as the common core), and other languages widely spoken in Africa. The glossary is a JSON file covering "Basic" and "Advanced" levels, intended to improve AI literacy.
- wit-retrieval Public
- Amplify_SSA Public
An annotated dataset of 9,003 adversarial queries in seven Sub-Saharan African languages.
- cultural_familiarity_annotations Public
The dataset consists of AI-generated stories and accompanying human ratings of their cultural fluency and relevance.
- tydiqa-wana Public
- conceptual-12m Public
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
- sanpo_dataset Public
- common-crawl-domain-names Public
Corpus of domain names scraped from Common Crawl and manually annotated to add word boundaries (e.g. "commoncrawl" to "common crawl").
- rag_conflicts Public
CONFLICTS is a QA dataset annotated with knowledge conflict types. Each instance comprises a query, a set of retrieved relevant passages, a corresponding conflict type label, and, for specific types, the ground-truth answer.