Efficient PyTorch training with cloud data

Vertex AI Neural Architecture Search places no requirements on how you design your trainers, so you can use any training framework to build the trainer.

For PyTorch training with large amounts of data, the best practice is to use distributed training and to read data directly from Cloud Storage. See the blog post Efficient PyTorch training with Vertex AI for methods to improve training performance. Combining the WebDataset format for data on Cloud Storage with the DistributedDataParallel or FullyShardedDataParallel training strategy yields an overall 6x performance improvement, making training on Cloud Storage data roughly as fast as training on data stored on a local disk. A sketch of this combination follows.
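The following is a minimal sketch of the pattern described above: streaming WebDataset shards from Cloud Storage into a DistributedDataParallel training loop. The bucket path, shard naming pattern, sample keys (`jpg`, `cls`), and the model are hypothetical placeholders; adapt them to your own dataset and architecture.

```python
# Minimal sketch: WebDataset shards streamed from Cloud Storage + DDP.
# Assumes shards named train-{000000..000099}.tar with "jpg"/"cls" keys
# (a hypothetical layout) and that the job is launched with torchrun.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
import webdataset as wds
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader

# "pipe:" streams each shard through gsutil, so no local copy is needed.
SHARDS = "pipe:gsutil cat gs://YOUR_BUCKET/shards/train-{000000..000099}.tar"


def make_loader(batch_size=64):
    dataset = (
        wds.WebDataset(SHARDS, shardshuffle=True)  # shuffle at the shard level
        .shuffle(1000)                             # then within a sample buffer
        .decode("torchrgb")                        # decode images to CHW float tensors
        .to_tuple("jpg", "cls")                    # yield (image, label) pairs
        .batched(batch_size)
    )
    # WebDataset batches internally, so the DataLoader only parallelizes I/O.
    return DataLoader(dataset, batch_size=None, num_workers=4)


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; substitute your real architecture here.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).cuda()
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for images, labels in make_loader():
        images = images.cuda()
        labels = torch.as_tensor(labels).cuda()
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()


if __name__ == "__main__":
    main()
```

Launch one process per GPU with `torchrun --nproc_per_node=<gpus> train.py`; because every process streams its own shards, the data never needs to be staged to local disk.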

The prebuilt MNasNet classification example incorporates these methods into its training pipeline.
