tfds.beam.ReadFromTFDS

Creates a beam pipeline yielding TFDS examples.

tfds.beam.ReadFromTFDS(
 pipeline,
 builder: tfds.core.DatasetBuilder,
 split: str,
 workers_per_shard: int = 1,
 **as_dataset_kwargs
)

Each dataset shard will be processed in parallel.

Usage:

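import apache_beam as beam
import tensorflow_datasets as tfds

# `pipeline` is an existing beam.Pipeline object.
builder = tfds.builder('my_dataset')
_ = (
 pipeline
 | tfds.beam.ReadFromTFDS(builder, split='train')
 | beam.Map(tfds.as_numpy)
 | ...
)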

Use tfds.as_numpy to convert each example from tf.Tensor to NumPy.

The split argument can use subsplits, e.g. 'train[:100]', but only when batch_size=None (passed in as_dataset_kwargs). Note: the examples will arrive in a different order than with tfds.load(split='train[:100]'), but the same examples will be used.
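
The note on subsplit ordering can be checked on your own data by comparing the two results as multisets instead of as sequences. The following is a minimal, library-free sketch: the helper name same_examples and the 'id' feature are hypothetical, and the two lists are stand-in data, not real pipeline output. In practice the inputs would be the materialized examples from the Beam pipeline and from tfds.load.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def same_examples(
    left: Iterable[dict],
    right: Iterable[dict],
    key: Callable[[dict], Hashable],
) -> bool:
    """Returns True if both collections hold the same examples, ignoring order."""
    return Counter(key(ex) for ex in left) == Counter(key(ex) for ex in right)


# Stand-in data (assumption): the same three examples in two different orders.
from_beam = [{"id": 2}, {"id": 0}, {"id": 1}]
from_load = [{"id": 0}, {"id": 1}, {"id": 2}]

# The sequences differ, but the contents match once order is ignored.
order_differs = from_beam != from_load
contents_match = same_examples(from_beam, from_load, key=lambda ex: ex["id"])
```

Counting keys with Counter, instead of using a set, also catches the case where one side contains a duplicated example.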

Args

pipeline: Beam pipeline (set automatically).
builder: Dataset builder to load.
split: Split name to load (e.g. train+test, train).
workers_per_shard: Number of workers that should read a shard in parallel. The shard will be split into this many parts. Note that workers cannot skip to a specific row in a tfrecord file, so each worker has to read the file up to its starting row and discard that data.
**as_dataset_kwargs: Arguments forwarded to builder.as_dataset.
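
The read cost described for workers_per_shard can be illustrated with a small model. This is a sketch under stated assumptions, not the TFDS implementation: it assumes a shard's rows are divided into contiguous, near-equal ranges and that each worker reads sequentially from the start of the file to the end of its own range. The names ReadPlan and plan_shard_reads are hypothetical.

```python
from typing import List, NamedTuple


class ReadPlan(NamedTuple):
    worker: int
    start: int      # first row this worker keeps (inclusive)
    stop: int       # one past the last row this worker keeps
    rows_read: int  # rows scanned, including the discarded prefix


def plan_shard_reads(num_rows: int, workers_per_shard: int) -> List[ReadPlan]:
    """Models how many rows each worker scans when it cannot seek to a row."""
    if workers_per_shard < 1:
        raise ValueError("workers_per_shard must be >= 1")
    plans = []
    for worker in range(workers_per_shard):
        # Assumption: contiguous, near-equal row ranges per worker.
        start = worker * num_rows // workers_per_shard
        stop = (worker + 1) * num_rows // workers_per_shard
        # Without random access, the worker scans rows [0, stop).
        plans.append(ReadPlan(worker, start, stop, rows_read=stop))
    return plans


plans = plan_shard_reads(num_rows=1000, workers_per_shard=4)
total_rows_read = sum(p.rows_read for p in plans)
```

Under this model, four workers scan 2,500 rows in total to yield 1,000, and the last worker still scans the entire shard. Any speed-up therefore comes from spreading the work done on the kept rows, not from reducing the raw read.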

Returns

The PCollection containing the TFDS examples.

Last updated 2024-06-19 UTC.