At Rescale, we spend a lot of time optimizing cloud computing for HPC workloads. Now that cloud providers offer GPU systems, it is practical to train deep learning models in the cloud at high performance. In this article, we look at a variety of cloud GPU systems and evaluate the performance of a deep learning workload on each of them.
When training on recent-generation GPUs like P100s and V100s, high-performance accelerators alone are not sufficient. They must be connected to storage that can supply training data at high throughput. With some cloud providers, picking the “default” storage option will likely lead to suboptimal performance.
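To make the storage requirement concrete, here is a minimal Python sketch of how one might compare what a volume delivers with what a training job consumes. The function names `measure_read_throughput` and `required_throughput` are hypothetical and not part of any benchmark used in this article, and the inputs (samples per second, bytes per sample) are values you would have to measure for your own workload. Files that were read recently may be served from the operating system's page cache, so treat the measured figure as an upper bound.

```python
import os
import time


def measure_read_throughput(path, block_size=1 << 20):
    """Read every file under `path` once, sequentially.

    Returns (total_bytes_read, elapsed_seconds). This is a rough
    single-threaded measurement, not a full storage benchmark.
    """
    total = 0
    start = time.perf_counter()
    for root, _dirs, files in os.walk(path):
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                while True:
                    chunk = f.read(block_size)
                    if not chunk:
                        break
                    total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed


def required_throughput(samples_per_sec, bytes_per_sample):
    """Bytes per second the storage must sustain to keep the GPUs fed.

    Both arguments are assumptions supplied by the caller: the rate at
    which the GPUs consume samples and the average on-disk sample size.
    """
    return samples_per_sec * bytes_per_sample
```

If the measured bytes per second (total bytes divided by elapsed seconds) falls below `required_throughput(...)` for your job, the GPUs will spend time waiting on input rather than computing.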