Distributed Training

Frameworks and libraries for distributed training

Overview from lambdalabs

  • multi gpu using parameter server: reduce and broadcast done on CPU
  • multi gpu all-reduce in one node, using NCCL
  • asynchronous distributed SGD
  • synchronous distributed SGD
  • multiple parameter servers
  • ring all reduce distributed training (see the sketch after this list)
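
The ring all-reduce entry is the pattern NCCL and Horovod build on, so a small sketch helps. The following pure-NumPy simulation is illustrative only (the function name and chunk schedule are mine, not from the Lambda Labs overview): each worker forwards 1/n of the gradient per step, first summing chunks around the ring (scatter-reduce), then circulating the finished chunks (all-gather).

  import numpy as np

  def ring_all_reduce(grads):
      """Simulate ring all-reduce: every worker ends up with the element-wise
      sum of all gradients while only forwarding 1/n of the vector per step."""
      n = len(grads)
      chunks = [np.array_split(g.astype(float), n) for g in grads]

      # Phase 1: scatter-reduce. After n-1 steps worker i owns the fully
      # reduced chunk (i + 1) % n.
      for step in range(n - 1):
          for i in range(n):
              c = (i - step) % n        # chunk worker i forwards this step
              dst = (i + 1) % n         # next worker in the ring
              chunks[dst][c] = chunks[dst][c] + chunks[i][c]

      # Phase 2: all-gather. The finished chunks travel once more around the ring.
      for step in range(n - 1):
          for i in range(n):
              c = (i + 1 - step) % n    # fully reduced chunk worker i forwards
              dst = (i + 1) % n
              chunks[dst][c] = chunks[i][c].copy()

      return [np.concatenate(ch) for ch in chunks]

  # 4 simulated workers, each with its own gradient vector.
  workers = [np.arange(8) * (w + 1) for w in range(4)]
  reduced = ring_all_reduce(workers)
  assert all(np.allclose(r, sum(workers)) for r in reduced)

In a real system each worker is a separate process or GPU and the sends/receives inside a step happen simultaneously; the sequential loops here only model the data movement.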

Low level

gpudirect from nvidia

  • 2019 Storage: from/to NVMe devices
  • 2013 RDMA: from/to network
  • 2011 GPU Peer to Peer: high speed DMA (see the sketch below)
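
The peer-to-peer entry is the one most easily seen from user code. A small PyTorch sketch (PyTorch is an assumption here, not something these notes name) that checks whether GPU 0 can DMA directly into GPU 1's memory and, if so, copies a tensor device-to-device without staging through host RAM:

  import torch

  if torch.cuda.device_count() >= 2:
      # True when the driver can map cuda:1's memory into cuda:0's address
      # space (PCIe or NVLink), i.e. peer-to-peer DMA is usable.
      p2p = torch.cuda.can_device_access_peer(0, 1)
      print("peer-to-peer between cuda:0 and cuda:1:", p2p)

      x = torch.randn(1024, 1024, device="cuda:0")
      y = x.to("cuda:1", non_blocking=True)  # direct GPU-to-GPU copy when P2P is on
      torch.cuda.synchronize()
      print("copy ok:", torch.equal(x.cpu(), y.cpu()))
  else:
      print("needs at least two CUDA devices")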

Frameworks

Apache Spark
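
These notes only name Spark; one common way it appears in distributed training is data-parallel model fitting with Spark MLlib. A minimal PySpark sketch with toy data (the app name and dataset are made up):

  from pyspark.sql import SparkSession
  from pyspark.ml.classification import LogisticRegression
  from pyspark.ml.linalg import Vectors

  spark = SparkSession.builder.appName("distributed-lr").getOrCreate()

  # Toy dataset; a real job would read a large, partitioned dataset instead.
  df = spark.createDataFrame(
      [(1.0, Vectors.dense([0.0, 1.1])),
       (0.0, Vectors.dense([2.0, 1.0])),
       (1.0, Vectors.dense([0.1, 1.3]))],
      ["label", "features"],
  )

  # MLlib spreads the optimization work over the cluster's executors.
  model = LogisticRegression(maxIter=10).fit(df)
  print(model.coefficients)
  spark.stop()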

TensorFlow
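
TensorFlow ships its own synchronous data-parallel mode. A minimal sketch with tf.distribute.MirroredStrategy, which all-reduces gradients across local GPUs (NCCL by default); the toy model and random data are placeholders:

  import tensorflow as tf

  # Synchronous data parallelism over all local GPUs; gradients are combined
  # with an all-reduce (NCCL by default on GPUs).
  strategy = tf.distribute.MirroredStrategy()
  print("replicas in sync:", strategy.num_replicas_in_sync)

  with strategy.scope():
      model = tf.keras.Sequential([
          tf.keras.Input(shape=(32,)),
          tf.keras.layers.Dense(64, activation="relu"),
          tf.keras.layers.Dense(1),
      ])
      model.compile(optimizer="sgd", loss="mse")

  # Random placeholder data standing in for a real input pipeline.
  x = tf.random.normal((1024, 32))
  y = tf.random.normal((1024, 1))
  model.fit(x, y, epochs=1, batch_size=256)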

Horovod
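
Horovod implements the ring all-reduce approach from the overview on top of MPI/NCCL. A minimal Keras sketch, assuming the horovod.tensorflow.keras API and toy data:

  import tensorflow as tf
  import horovod.tensorflow.keras as hvd

  hvd.init()

  # Pin each worker process to one local GPU.
  gpus = tf.config.list_physical_devices("GPU")
  if gpus:
      tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

  model = tf.keras.Sequential([
      tf.keras.Input(shape=(32,)),
      tf.keras.layers.Dense(64, activation="relu"),
      tf.keras.layers.Dense(1),
  ])

  # The wrapped optimizer averages gradients across workers with ring all-reduce;
  # the learning rate is scaled by the number of workers.
  opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
  model.compile(optimizer=opt, loss="mse")

  # Start all workers from identical weights.
  callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

  x = tf.random.normal((1024, 32))
  y = tf.random.normal((1024, 1))
  model.fit(x, y, epochs=1, batch_size=64, callbacks=callbacks,
            verbose=1 if hvd.rank() == 0 else 0)

Typically launched with something like horovodrun -np 4 python train.py, one process per GPU.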