Resources


Deep Neural Networks (DNNs) have gained significant importance in recent years. Their computational demands keep increasing due to more complicated network architectures and larger datasets. Deep learning has entered the HPC era, and new approaches are therefore required to efficiently utilize massively parallel computing resources. The layers in a DNN have different types of time-dominant computational kernels, and these kernels stress different hardware resources depending on their computation type and their input/output sizes. Because DNN architectures vary significantly across applications, automating the performance analysis of a DNN is a necessity. The Roofline model is an excellent tool for understanding the bottlenecks and hardware utilization efficiency of computational kernels. We present a performance engineering add-on to the popular Caffe DNN framework that uses the hardware performance counters available in contemporary processors to automatically generate the Roofline model and other useful measurements for each layer in the DNN, without adding significant runtime overhead. We show performance results of various DNN architectures on Intel Knights Landing many-core processors.
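
For context, the Roofline model bounds a kernel's attainable throughput by the minimum of the machine's peak compute rate and the product of the kernel's arithmetic intensity and the peak memory bandwidth. The sketch below is a minimal illustration of that formula only; the peak values and per-layer intensities are placeholder assumptions, not measurements from the Caffe add-on described in the abstract.

    # Minimal Roofline sketch: attainable GFLOP/s is bounded by
    #   min(peak_gflops, arithmetic_intensity * peak_bandwidth)
    # Peak values and example layers are illustrative assumptions.

    def roofline(arithmetic_intensity, peak_gflops, peak_gbps):
        """Attainable performance (GFLOP/s) for a kernel with the given
        arithmetic intensity (FLOPs per byte moved to/from memory)."""
        return min(peak_gflops, arithmetic_intensity * peak_gbps)

    PEAK_GFLOPS = 3000.0   # assumed peak compute rate, GFLOP/s
    PEAK_GBPS = 400.0      # assumed peak memory bandwidth, GB/s

    # Hypothetical per-layer arithmetic intensities, FLOPs/byte.
    layers = {"conv1": 25.0, "fc6": 2.0, "pool5": 0.25}

    for name, ai in layers.items():
        attainable = roofline(ai, PEAK_GFLOPS, PEAK_GBPS)
        bound = "compute-bound" if ai * PEAK_GBPS >= PEAK_GFLOPS else "memory-bound"
        print(f"{name}: AI={ai:.2f} FLOPs/B -> {attainable:.0f} GFLOP/s ({bound})")

Plotting attainable performance against arithmetic intensity in this way shows, per layer, whether a kernel sits under the memory-bandwidth slope or the compute ceiling, which is the kind of bottleneck diagnosis the add-on automates from hardware counters.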

Event Name

IXPUG BoF SC16

Keywords

ixpug

Video Name

NA