GPU Benchmarks: GeForce GTX 1080

All benchmarks use a batch size of 64. This value was chosen so that all data fits in GPU memory for every card and every network. This keeps use of the CPU and workstation RAM to a minimum and makes the results comparable. Each result is the average time over 100 iterations.
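The measurement described above (mean time over 100 iterations) can be sketched independently of any framework, as below. This is a minimal illustration, not the code used by the benchmark scripts. `average_time_ms`, `fake_step` and the 10 untimed warm-up iterations are our own assumptions. With a GPU framework, `step` must block until the GPU work has finished, because kernel launches are asynchronous and would otherwise be timed only as launch overhead.

```python
import time
from statistics import mean
from typing import Callable


def average_time_ms(step: Callable[[], None], iterations: int = 100, warmup: int = 10) -> float:
    """Run `step` `warmup` times untimed, then return the mean wall-clock
    time of `iterations` timed calls, in milliseconds."""
    # Warm-up calls are excluded (an assumption, not stated in the source) so
    # one-off costs such as allocation or autotuning do not inflate the mean.
    for _ in range(warmup):
        step()
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


if __name__ == "__main__":
    # Hypothetical stand-in for one forward (and backward) pass on a batch of 64.
    def fake_step() -> None:
        sum(i * i for i in range(10_000))

    print(f"{average_time_ms(fake_step):.3f} ms per iteration")
```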

Future benchmarks:

  • will compare different cards in the same workstation
  • will consider both 16-bit and 32-bit floating-point precision
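Precision directly affects how much data fits in GPU memory. The sketch below computes the size of a single input batch of 64 images stored as 32-bit and as 16-bit floats. The 3x224x224 input shape is an assumption for illustration (input sizes differ between networks), and activations, gradients and weights, which dominate real memory use, are not counted.

```python
import numpy as np

BATCH_SIZE = 64              # batch size used for all benchmarks in this post
INPUT_SHAPE = (3, 224, 224)  # assumed input shape: channels, height, width


def input_batch_bytes(dtype, batch_size=BATCH_SIZE, shape=INPUT_SHAPE) -> int:
    """Bytes needed to hold one input batch (activations and weights not included)."""
    return batch_size * int(np.prod(shape)) * np.dtype(dtype).itemsize


for dtype in (np.float32, np.float16):
    mib = input_batch_bytes(dtype) / 2**20
    print(f"{np.dtype(dtype).name}: {mib:.2f} MiB")
```

Under these assumptions a float32 batch takes 38,535,168 bytes (36.75 MiB), and the float16 batch exactly half of that.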

Benchmarks by Framework

Training Benchmarks

Total time [ms] (smaller is better) for one forward and backward pass using the libraries: Tensorflow, NVcaffe, Caffe, Neon

Inference Benchmarks

Total time [ms] (smaller is better) for one forward pass using the libraries: Tensorflow, NVcaffe, Caffe, Neon

Benchmarks by Neural Networks

Training Benchmarks

Total time [ms] (smaller is better) for one forward and backward pass using the neural networks: VGG-A, OverFeat, AlexNet, GoogLeNet.

Inference Benchmarks

Total time [ms] (smaller is better) for one forward pass using the neural networks: VGG-A, OverFeat, AlexNet, GoogLeNet
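Times per batch can also be expressed as throughput, which is often easier to compare. A minimal sketch, assuming each time refers to one pass over a full batch of 64 images. `images_per_second` is a hypothetical helper, and the 100 ms value is illustrative, not a measured result.

```python
def images_per_second(time_ms: float, batch_size: int = 64) -> float:
    """Convert the time for one pass over a batch into images processed per second."""
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    return batch_size * 1000.0 / time_ms


# Hypothetical example: a 100 ms pass over a batch of 64 images.
print(images_per_second(100.0))  # 640.0
```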

Setup

GPU: GeForce GTX 1080
CPU: Intel Core i7-5960X @ 3.00 GHz (16 threads)
Memory: 32 GB
OS: Ubuntu 14.04
Driver: nvidia-367
CUDA: 8.0
cuDNN: 5.1
Caffe: rc3
Neon: 1.6.0+4fb5ff6
NVcaffe: 0.15.10
Tensorflow: 0.10.0

Benchmark tools

For Neon we used the neon/tests/run_benchmarks.py script that ships with the neon framework.

For Tensorflow we used the convnet-benchmarks suite and ran the scripts convnet-benchmarks/tensorflow/benchmark_alexnet.py, convnet-benchmarks/tensorflow/benchmark_googlenet.py, convnet-benchmarks/tensorflow/benchmark_overfeat.py and convnet-benchmarks/tensorflow/benchmark_vgg.py without modifications.
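A minimal sketch of running the four TensorFlow scripts in sequence and collecting their output. It assumes convnet-benchmarks is cloned into a local `convnet-benchmarks` directory and that each script is run with its default flags, matching the unmodified setup described above. `build_commands` and `run_all` are hypothetical helper names, not part of convnet-benchmarks.

```python
import subprocess
import sys
from pathlib import Path

# The four TensorFlow scripts from convnet-benchmarks, run unmodified.
SCRIPTS = (
    "benchmark_alexnet.py",
    "benchmark_googlenet.py",
    "benchmark_overfeat.py",
    "benchmark_vgg.py",
)


def build_commands(repo_root: str) -> list[list[str]]:
    """Return one command per script, using the current interpreter and default flags."""
    tf_dir = Path(repo_root) / "tensorflow"
    return [[sys.executable, str(tf_dir / name)] for name in SCRIPTS]


def run_all(repo_root: str) -> dict[str, str]:
    """Run each script from its own directory and collect stdout, keyed by script name."""
    outputs = {}
    for cmd in build_commands(repo_root):
        script = Path(cmd[1])
        result = subprocess.run(cmd, cwd=script.parent, capture_output=True, text=True, check=True)
        outputs[script.name] = result.stdout
    return outputs


if __name__ == "__main__":
    repo = Path("convnet-benchmarks")  # assumed clone location
    if (repo / "tensorflow").is_dir():
        for name, out in run_all(str(repo)).items():
            print(f"=== {name} ===\n{out}")
    else:
        print(f"{repo}/tensorflow not found; clone convnet-benchmarks first")
```

Using `check=True` makes the run stop at the first script that fails, so a partial set of timings is never mistaken for a complete one.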

For Caffe we also used convnet-benchmarks, but modified the script convnet-benchmarks/caffe/run_imagenet.sh to point to our Caffe installation.

Machine Learning Meetup Italy by Addfor

To be the first to know when we publish new benchmarks, join our meetup group, Machine Learning Italy.