tfds.beam.ReadFromTFDS
Creates a beam pipeline yielding TFDS examples.
```python
tfds.beam.ReadFromTFDS(
    pipeline,
    builder: tfds.core.DatasetBuilder,
    split: str,
    workers_per_shard: int = 1,
    **as_dataset_kwargs
)
```
Each dataset shard will be processed in parallel.
Usage:
```python
builder = tfds.builder('my_dataset')
_ = (
    pipeline
    | tfds.beam.ReadFromTFDS(builder, split='train')
    | beam.Map(tfds.as_numpy)
    | ...
)
```
Use `tfds.as_numpy` to convert each example from `tf.Tensor` to NumPy.

The `split` argument can make use of subsplits, e.g. `'train[:100]'`, but only when `batch_size=None` (in `as_dataset_kwargs`). Note: the order of the examples will differ from `tfds.load(split='train[:100]')`, but the same examples will be read.
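To make the subsplit caveat concrete, here is a hypothetical sketch in plain Python (no Beam or TFDS required). It simulates a sequential read of the first 100 examples versus a parallel, shard-interleaved read of the same subsplit: the two yield the same set of examples but in a different order. The shard contents and the round-robin interleaving are illustrative assumptions, not the actual Beam scheduling.

```python
def sequential_read(shards, limit):
    """Read examples shard by shard, in order (like tfds.load)."""
    examples = [ex for shard in shards for ex in shard]
    return examples[:limit]

def parallel_read(shards, limit):
    """Read the same subsplit, interleaving shards (like a parallel pipeline)."""
    wanted = set(sequential_read(shards, limit))
    out = []
    iters = [iter(s) for s in shards]
    # Round-robin over shards stands in for non-deterministic parallel order.
    while iters:
        for it in list(iters):
            try:
                ex = next(it)
                if ex in wanted:
                    out.append(ex)
            except StopIteration:
                iters.remove(it)
    return out

# Two shards of 10 examples each; take a "[:15]" subsplit.
shards = [list(range(0, 10)), list(range(10, 20))]
seq = sequential_read(shards, 15)
par = parallel_read(shards, 15)
assert sorted(par) == sorted(seq)  # same examples...
assert par != seq                  # ...different order
```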
| Args | |
|---|---|
| `pipeline` | Beam pipeline (automatically set). |
| `builder` | Dataset builder to load. |
| `split` | Split name to load (e.g. `train+test`, `train`). |
| `workers_per_shard` | Number of workers that should read a shard in parallel. The shard will be split into this many parts. Note that workers cannot skip to a specific row in a TFRecord file, so each worker must read the file up to its starting point without using that data. |
| `**as_dataset_kwargs` | Arguments forwarded to `builder.as_dataset`. |
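The `workers_per_shard` trade-off can be illustrated with a hypothetical sketch (the helper below is not part of TFDS): each worker is assigned a contiguous row range of one shard, but because TFRecord files cannot be seeked by row, a worker assigned rows `[start, end)` still reads and discards rows `[0, start)`.

```python
def shard_work_ranges(num_rows, workers_per_shard):
    """Return the (start, end) row range assigned to each worker of one shard."""
    base, extra = divmod(num_rows, workers_per_shard)
    ranges, start = [], 0
    for i in range(workers_per_shard):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# A 10-row shard read by 3 workers:
print(shard_work_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
# The worker assigned (7, 10) still reads rows 0-6 and discards them, which
# is why large workers_per_shard values add redundant reads.
```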
| Returns |
|---|
| The `PCollection` containing the TFDS examples. |