## Performance analysis of deep learning workloads using roofline trajectories

#### CCF Transactions on High Performance Computing, 2019 (Invited Paper)

M. Haseeb Javed, Khaled Z. Ibrahim, Xiaoyi Lu

### Abstract

Over the last decade, Deep Learning applications built on convolutional neural networks (CNNs) have revolutionized fields as diverse as cancer detection, self-driving cars, and virtual assistants. However, many users of such applications are not experts in Machine Learning itself, so there is limited knowledge in the community about how to run these applications in an optimized manner. The performance question for Deep Learning applications has typically been addressed by employing bespoke hardware (e.g., GPUs) better suited to such compute-intensive operations. However, this degree of performance is accessible only at increasingly high financial cost, leaving only large corporations and governments with sufficient resources to employ such hardware at scale. As a result, the average user often has access only to commodity clusters in which, in many cases, CPUs are the sole processing elements. For such users to make effective use of the resources at their disposal, concerted efforts are needed to identify optimal hardware and software configurations. This study is one step in that direction: we use the Roofline model to perform a systematic analysis of representative CNN models and identify opportunities for black-box and application-aware optimizations. Using the findings from our study, we obtain up to a 3.5× speedup compared to vanilla TensorFlow with default configurations.
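For readers unfamiliar with the Roofline model mentioned above, its basic bound is that attainable performance equals the minimum of the peak compute throughput and the product of arithmetic intensity (FLOPs per byte moved) and peak memory bandwidth. The sketch below illustrates only that classic bound, not the paper's roofline-trajectory methodology. The machine numbers, the class and function names, and the idealized matrix-multiply traffic model (each operand read or written exactly once) are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine ceilings (illustrative values, not measurements)."""
    peak_gflops: float         # peak compute throughput, GFLOP/s
    peak_bandwidth_gbs: float  # peak memory bandwidth, GB/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) at which the two ceilings meet."""
        return self.peak_gflops / self.peak_bandwidth_gbs


def attainable_gflops(machine: Machine, intensity: float) -> float:
    """Classic roofline bound: min(peak compute, intensity * bandwidth)."""
    return min(machine.peak_gflops, intensity * machine.peak_bandwidth_gbs)


def classify(machine: Machine, intensity: float) -> str:
    """Kernels left of the ridge point are limited by memory bandwidth."""
    return "memory-bound" if intensity < machine.ridge_point else "compute-bound"


def matmul_intensity(n: int, bytes_per_elem: int = 4) -> float:
    """Assumed ideal-reuse model for C = A @ B with n x n matrices:
    2*n^3 FLOPs, and each of A, B, C moved to/from memory exactly once."""
    flops = 2 * n**3
    bytes_moved = 3 * n * n * bytes_per_elem
    return flops / bytes_moved


if __name__ == "__main__":
    cpu = Machine(peak_gflops=1000.0, peak_bandwidth_gbs=100.0)  # hypothetical
    for ai in (2.0, matmul_intensity(1024)):
        print(f"I={ai:.1f} FLOP/B -> {attainable_gflops(cpu, ai):.0f} GFLOP/s, "
              f"{classify(cpu, ai)}")
```

Taking the minimum of the two ceilings is what produces the characteristic "roof" shape: below the ridge point, adding compute capability does not help, and data movement (for example, through better cache reuse) is what must be improved.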

### Full text links

- [https://doi.org/10.1007/s42514-019-00018-4](https://doi.org/10.1007/s42514-019-00018-4)

#### Journal Article

| Field | Value |
|---|---|
| Date | 2019/12/01 |
| DOI | 10.1007/s42514-019-00018-4 |
| ID | Javed2019 |
| ISSN | 2524-4930 |
| Journal | CCF Transactions on High Performance Computing |
| Volume | 1 |
| Number | 3 |
| Pages | 224–239 |
| Type | JOUR |
| Series | THPC '19 |
| Note | Invited Paper |