Google Research Datasets

Datasets released by Google Research

Pinned

  1. natural-questions Public

    Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question ans...

    Python 1.1k 156

  2. conceptual-captions Public

    Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.

    Shell 553 27

  3. Objectron Public

    Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the came...

    Jupyter Notebook 2.3k 261

  4. wit Public

    WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

    1.1k 44

  5. paws Public

    This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase ident...

    Python 561 54

  6. dstc8-schema-guided-dialogue Public

    The Schema-Guided Dialogue Dataset

    Python 585 130

Repositories

Showing 10 of 171 repositories
  • ssa-ai-terminologies Public

    This dataset provides a glossary of AI terms in Swahili, Zulu, Xhosa, Afrikaans, English (as the common core), and other languages widely spoken in Africa. It's a JSON file, covering "Basic" and "Advanced" levels, to improve AI literacy.

    HTML 0 CC-BY-SA-4.0 0 0 0 Updated Oct 16, 2025
  • wit-retrieval Public
    5 0 1 0 Updated Oct 13, 2025
  • Amplify_SSA Public

    An annotated dataset of 9,003 adversarial queries in seven Sub-Saharan African languages.

    Jupyter Notebook 2 3 0 0 Updated Sep 17, 2025
  • cultural_familiarity_annotations Public

    The dataset consists of AI-generated stories and accompanying human ratings of their cultural fluency and relevance.

    1 Apache-2.0 1 0 0 Updated Aug 6, 2025
  • tydiqa-wana Public
    Jupyter Notebook 0 Apache-2.0 0 0 0 Updated Jul 31, 2025
  • conceptual-12m Public

    Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

    404 20 5 0 Updated Jul 14, 2025
  • sanpo_dataset Public
    Python 45 Apache-2.0 2 5 3 Updated Jun 27, 2025
  • common-crawl-domain-names Public

    Corpus of domain names scraped from Common Crawl and manually annotated to add word boundaries (e.g. "commoncrawl" to "common crawl").

    20 MIT 2 0 0 Updated Jun 16, 2025
  • rag_conflicts Public

    CONFLICTS is a QA dataset annotated with knowledge conflict types. Each instance comprises a query, a set of retrieved relevant passages, a corresponding conflict type label, and, for specific types, the ground-truth answer.

    10 Apache-2.0 1 1 0 Updated Jun 11, 2025
  • egotempo Public
    Jupyter Notebook 26 CC-BY-4.0 0 3 0 Updated Apr 26, 2025

People

This organization has no public members. You must be a member to see who’s a part of this organization.
