Vectorization is increasingly important for achieving high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over unstructured meshes, poses challenges to effective vectorization. Maintaining a user-friendly high-level interface with a suitable degree of abstraction while generating efficient, vectorized code for the finite element method is a challenge for numerical software systems and libraries. In this talk, we study cross-element vectorization in the finite element framework Firedrake and demonstrate the efficacy of this approach by evaluating a wide range of matrix-free operators spanning different polynomial degrees and discretizations on two recent Intel CPUs using three mainstream compilers. Our experiments show that cross-element vectorization achieves 30% of theoretical peak performance for many examples of practical significance, and exceeds 50% for cases with high arithmetic intensities, with consistent speed-up over vectorization restricted to the local assembly kernels.
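As a rough illustration of the idea, the sketch below contrasts a local kernel applied to one element with the same kernel applied to a batch of elements, where the innermost loop runs over the elements of the batch. It is a hand-written sketch under stated assumptions, not code generated by Firedrake: the operator (the action of a P1 mass matrix on a triangle), the batch width `BATCH`, the dof-major data layout, and the function names `mass_action_single` and `mass_action_batch` are all chosen here for illustration. Gathering from and scattering to the unstructured mesh is omitted.

```c
#include <stddef.h>

#define NDOF 3   /* P1 triangle: three local degrees of freedom */
#define BATCH 4  /* assumed SIMD width in doubles (e.g. 256-bit vectors) */

/* Reference mass matrix of a P1 triangle, scaled by 24:
 * M_local = detJ / 24 * M_REF, where detJ is twice the triangle area. */
static const double M_REF[NDOF][NDOF] = {
    {2.0, 1.0, 1.0},
    {1.0, 2.0, 1.0},
    {1.0, 1.0, 2.0},
};

/* Local kernel for ONE element. The loops have trip count 3, which
 * does not match typical vector widths and vectorizes poorly. */
void mass_action_single(double detJ, const double *x, double *y)
{
    for (int i = 0; i < NDOF; ++i) {
        double acc = 0.0;
        for (int j = 0; j < NDOF; ++j)
            acc += M_REF[i][j] * x[j];
        y[i] = detJ / 24.0 * acc;
    }
}

/* Cross-element variant: the same kernel for BATCH elements at once.
 * Data is stored dof-major (x[j * BATCH + e] is dof j of element e),
 * so the innermost loops over e touch contiguous memory and each
 * lane of a vector register can hold one element. */
void mass_action_batch(const double *detJ, const double *x, double *y)
{
    for (int i = 0; i < NDOF; ++i) {
        double acc[BATCH] = {0.0};
        for (int j = 0; j < NDOF; ++j)
            for (int e = 0; e < BATCH; ++e)   /* lane loop */
                acc[e] += M_REF[i][j] * x[j * BATCH + e];
        for (int e = 0; e < BATCH; ++e)       /* lane loop */
            y[i * BATCH + e] = detJ[e] / 24.0 * acc[e];
    }
}
```

The arithmetic per element is unchanged; only the loop structure and data layout differ. Because the lane loops have a fixed trip count equal to the assumed vector width and unit stride, a compiler has a much easier job vectorizing them than the short loops inside the single-element kernel.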
IXPUG Webinar Series
Vectorization, algorithms, OpenMP, Xeon