Resources


Deep Neural Networks (DNNs) have gained significant importance in recent years. Their computational demands keep increasing due to more complicated network architectures and larger datasets. Deep learning has entered the HPC era, and new approaches are therefore required to efficiently utilize massively parallel computing resources. The layers in a DNN have different types of time-dominant computational kernels, and these kernels stress different hardware resources depending on their computation type and their input/output sizes. Because DNN architectures vary significantly across applications, automating the performance analysis of a DNN is a necessity. The Roofline model is an excellent tool for understanding the bottlenecks and hardware utilization efficiency of computational kernels. We present a performance engineering add-on to the popular Caffe DNN framework that uses the hardware performance counters available in contemporary processors to automatically generate the Roofline model and other useful measurements for each layer in the DNN, without adding significant runtime overhead. We show performance results of various DNN architectures on Intel Knights Landing many-core processors.
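
For context, the Roofline model bounds a kernel's attainable throughput by the minimum of the machine's peak compute rate and the product of the kernel's arithmetic intensity and the peak memory bandwidth. The sketch below is a minimal illustration of that formula only; the peak values and per-layer intensities are placeholder assumptions, not measurements from the Caffe add-on described in the abstract.

    # Minimal Roofline sketch: attainable GFLOP/s is bounded by
    #   min(peak_gflops, arithmetic_intensity * peak_bandwidth)
    # Peak values and example layers are illustrative assumptions.

    def roofline(arithmetic_intensity, peak_gflops, peak_gbps):
        """Attainable performance (GFLOP/s) for a kernel with the given
        arithmetic intensity (FLOPs per byte moved to/from memory)."""
        return min(peak_gflops, arithmetic_intensity * peak_gbps)

    PEAK_GFLOPS = 3000.0   # assumed peak compute rate, GFLOP/s
    PEAK_GBPS = 400.0      # assumed peak memory bandwidth, GB/s

    # Hypothetical per-layer arithmetic intensities, FLOPs/byte.
    layers = {"conv1": 25.0, "fc6": 2.0, "pool5": 0.25}

    for name, ai in layers.items():
        attainable = roofline(ai, PEAK_GFLOPS, PEAK_GBPS)
        bound = "compute-bound" if ai * PEAK_GBPS >= PEAK_GFLOPS else "memory-bound"
        print(f"{name}: AI={ai:.2f} FLOPs/B -> {attainable:.0f} GFLOP/s ({bound})")

Plotting attainable performance against arithmetic intensity in this way shows, per layer, whether a kernel sits under the memory-bandwidth slope or the compute ceiling, which is the kind of bottleneck diagnosis the add-on automates from hardware counters.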

Event Name

IXPUG BoF SC16

Keywords

ixpug

Video Name

NA