Tech Travel Love

Nvidia Volta: most powerful GPU architecture


Nvidia has just announced a new graphics processor based on a new architecture code-named “Volta”. According to the company, this next-generation GPU technology will bring AI (artificial intelligence) to every industry.

On May 10, 2017, at the GPU Technology Conference in San Jose, NVIDIA CEO Jen-Hsun Huang announced the new NVIDIA Volta GPU architecture. The first Volta-based chip, the Tesla V100 GPU, is a beast built for data centers, aimed at advancing AI at every level of industry with improved performance and better power efficiency.

This next-generation GPU packs over 21 billion transistors and 640 Tensor Cores, and delivers up to 120 teraflops (TFLOPS) of deep learning performance. That is roughly 5x the peak performance of the previous Pascal architecture.

Volta uses the next generation of NVIDIA's NVLink high-speed interconnect technology. It delivers double the throughput of the previous version, which enables more advanced model-parallel and data-parallel approaches. The architecture is supported by Volta-optimized CUDA and NVIDIA Deep Learning SDK libraries such as cuDNN, NCCL, and TensorRT.

The architecture is designed to deliver higher performance: the Volta SM (streaming multiprocessor) has lower instruction and cache latencies than past SM designs and includes new features to accelerate deep learning applications. Major features include:

  • Higher clocks and higher power efficiency.
  • Enhanced L1 data cache for higher performance and lower latency.
  • New mixed-precision FP16/FP32 Tensor Cores purpose-built for Deep learning matrix arithmetic.
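To make the mixed-precision idea in the last bullet concrete, here is a minimal Python sketch of a multiply-accumulate D = A x B + C where the inputs are rounded to FP16 and the running sum is kept in FP32. It is an illustration only, not how the hardware is programmed: the function names, the 4x4 tile size, and the use of Python's `struct` module to emulate rounding are all assumptions made for this example.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mixed_precision_fma(a, b, c):
    """Compute D = A x B + C with FP16 inputs and FP32 accumulation.

    a, b, c are square matrices given as lists of lists (hypothetical
    helper for illustration, not an NVIDIA API).
    """
    n = len(a)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])  # accumulator starts in FP32
            for k in range(n):
                # Inputs are rounded to FP16 before the multiply.
                product = to_fp16(a[i][k]) * to_fp16(b[k][j])
                # The running sum is rounded to FP32, not FP16.
                acc = to_fp32(acc + product)
            d[i][j] = acc
    return d

# 4x4 tile chosen purely for illustration.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
c = [[2048.0] * 4 for _ in range(4)]
d = mixed_precision_fma(identity, identity, c)

print(d[0][0])          # 2049.0: the FP32 accumulator keeps the added 1.0
print(to_fp16(2049.0))  # 2048.0: an FP16 accumulator would have lost it
```

The last two lines show why the accumulator precision matters: FP16 cannot represent 2049, so summing many small products in FP16 would silently drop them, while FP32 accumulation preserves them.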

Unlike Pascal GPUs, which could not execute FP32 and INT32 instructions simultaneously, the Volta GV100 SM includes separate FP32 and INT32 cores, allowing simultaneous execution of FP32 and INT32 operations at full throughput.
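A rough way to see why separate FP32 and INT32 cores help is to count issue slots. The sketch below is a deliberately simplified toy model, not a cycle-accurate simulation: it assumes one instruction per cycle per pipeline and no dependencies between instructions, and the function names and instruction counts are made up for illustration.

```python
def cycles_shared_pipeline(fp32_ops, int32_ops):
    """Toy model: FP32 and INT32 instructions take turns on one datapath."""
    return fp32_ops + int32_ops

def cycles_separate_pipelines(fp32_ops, int32_ops):
    """Toy model: independent FP32 and INT32 cores run side by side."""
    return max(fp32_ops, int32_ops)

# Hypothetical loop body: 100 floating-point operations plus
# 40 integer operations (for example, index and address arithmetic).
fp32_ops, int32_ops = 100, 40

shared = cycles_shared_pipeline(fp32_ops, int32_ops)
separate = cycles_separate_pipelines(fp32_ops, int32_ops)

print(shared)             # 140
print(separate)           # 100
print(shared / separate)  # 1.4
```

Under these assumptions, the integer work is hidden behind the floating-point work. Real kernels will see smaller or larger gains depending on their instruction mix and data dependencies.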

The new Tensor Cores are the most important feature of the Volta GV100 architecture, delivering the performance required to train large neural networks. They provide up to 120 Tensor TFLOPS for training and inference applications. For deep learning training, Tensor Cores on the Tesla V100 provide up to 12x higher peak TFLOPS than P100 FP32 operations; for deep learning inference, up to 6x higher peak TFLOPS than P100 FP16 operations. The Tesla V100 GPU contains 640 Tensor Cores: 8 per SM.
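The figures in this paragraph can be cross-checked with a little arithmetic. The first part of the sketch below uses only numbers quoted above. The last part estimates the peak from first principles and relies on two values that are not stated in this post and are assumptions here: 64 fused multiply-adds per Tensor Core per clock, and a boost clock of about 1455 MHz.

```python
# Figures quoted in the text above.
TENSOR_CORES = 640
TENSOR_CORES_PER_SM = 8
PEAK_TENSOR_TFLOPS = 120.0

# Number of SMs implied by 640 Tensor Cores at 8 per SM.
sm_count = TENSOR_CORES // TENSOR_CORES_PER_SM

# P100 peaks implied by the quoted "up to 12x" and "up to 6x" ratios.
implied_p100_fp32_tflops = PEAK_TENSOR_TFLOPS / 12
implied_p100_fp16_tflops = PEAK_TENSOR_TFLOPS / 6

# Assumptions (not from this post): work per Tensor Core and clock speed.
ASSUMED_FMA_PER_CORE_PER_CLOCK = 64
FLOPS_PER_FMA = 2  # one multiply plus one add
ASSUMED_BOOST_CLOCK_HZ = 1.455e9

estimated_peak_tflops = (
    TENSOR_CORES
    * ASSUMED_FMA_PER_CORE_PER_CLOCK
    * FLOPS_PER_FMA
    * ASSUMED_BOOST_CLOCK_HZ
    / 1e12
)

print(sm_count)                         # 80
print(implied_p100_fp32_tflops)         # 10.0
print(implied_p100_fp16_tflops)         # 20.0
print(round(estimated_peak_tflops, 1))  # 119.2
```

Under those assumptions the estimate lands at roughly 119 TFLOPS, which is consistent with the "up to 120 Tensor TFLOPS" headline figure.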


