"Application Performance on Intel Xeon Phi –
Being Prepared for KNL and Beyond"
Location: Frankfurt am Main, Germany
Date: Thursday, June 23, 2016, 8:30am-6:00pm
Venue: Marriott Frankfurt Hotel
The workshop will bring together software developers and technology experts to share challenges, experiences and best-practice methods for the optimization of HPC workloads on the Intel Xeon Phi. The workshop will cover application performance and scalability challenges at all levels - from single processor, to moderately-scaled cluster, up to large HPC configurations with many Xeon Phi devices.
The keynote will present recent information about the KNL processor. The submitted talks cover optimization and scalability topics in real-world HPC applications, e.g. data layouts and code restructuring for efficient SIMD operation, work distribution and thread management. Aspects related to KNL features (e.g. high-bandwidth memory) are of particular interest. The usability of tools for development, debugging and performance analysis will be covered.
A planned panel session provides an opportunity to discuss optimization strategies for Phi and to provide feedback to the toolchain developers.
Important Note: The workshop is held in conjunction with ISC'16 in Frankfurt (Main). To attend the workshop, you must register for ISC Workshops (registration has been open since March 1). More information is available on the ISC'16 Conference site.
|09:00||09:15||Opening||Richard Gerber (President of IXPUG, NERSC): Welcome - The Intel Xeon Phi User's Group|
|09:15||10:00||Keynote I||Avinash Sodani (Intel): Knights Landing Intel® Xeon Phi™ CPU: Path to Parallelism with General Purpose Programming|
|10:00||10:30||Platform Evaluation||A. Gómez-Iglesias, K. Milfeld, J. Cazes, C. Rosales-Fernandez, L. Koesterke and L. Huang (TACC): A comparative study of application performance and scalability on the Intel Knights Landing processor|
|10:30||11:00||Platform Evaluation||D. Doerfler, J. Deslippe, S. Williams, L. Oliker, B. Cook, Th. Kurth, M. Lobet, T. Malas, J.-L. Vay and H. Vincenti (NERSC): Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor|
|11:30||12:00||Applications I||J. Deslippe, F. H. Da Jornada, D. Vigil-Fowler, K. Raman, R. Sasanka, St. G. Louie, N. Wichmann and T. Barnes (NERSC/UCB/Intel/Cray/NREL/LBNL): Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: a Case Study on the BerkeleyGW Software|
|12:00||12:30||Applications I||Th. Kurth, D. Kalamkar, B. Joo and K. Vaidyanathan (NERSC/LBNL/JLAB/Intel): Optimizing Dirac Wilson Operator and linear solvers for Intel KNL|
|12:30||13:00||Applications I||B. Cook, P. Maris, M. Shao, N. Wichmann, M. Wagner, J. O’neill, T. Phung and G. Bansal (NERSC/ISU/LBNL/Cray/Intel): High performance optimizations for nuclear physics code MFDn on KNL|
|14:00||14:45||Keynote II||John McCalpin (TACC): Trends in System Cost and Performance Balances and Implications for Future HPC Systems|
|14:45||15:15||General Techniques||O. Krzikalla, F. Wende and M. Höhnerbach (TUD/ZIB/RWTHA): Dynamic SIMD Vector Lane Scheduling|
|15:15||15:45||General Techniques||C.J. Newburn, J. Sukha, I. Sharapov, A. D. Ngyuen and C.-C. Miao (Intel): Suitability Assessment for Many-Core Targets|
|15:45||16:00||Lightning Talks I||N. Agrawal, P. Edwards, R. Ojha, A. Pandey and P. Pawar (TATA/Intel): Performance Optimization of OpenFOAM on KNL; M. Noack (ZIB): AVX512 vs AVX2 on KNL|
|16:30||17:00||Applications II||T. Malas, Th. Kurth and J. Deslippe (LBNL): Optimization of the matrix-vector products of an IDR Krylov iterative solver for the Intel KNL manycore processor|
|17:00||17:30||Applications II||A. Walden, S. Khan, B. Joo, D. Ranjan and M. Zubair (ODU/JLAB): Optimizing a Multiple Right-hand Side Dslash Kernel for Intel Xeon Phi|
|17:30||17:50||Lightning Talks II||R. Li, St. Gottlieb, C. Detar, D. Tousaint, A. Jha, D. Kalamkar, B. Joo and D. Doerfler (IU/Intel/UU/UA/JLAB/LBL): Porting the MIMD Lattice Computation (MILC) Code to the Intel Xeon Phi Knights Landing Processor; M. Lysaght, S. Delaney and G. Civario (ICHEC/Tullow Oil): Early Performance of Seismic Imaging Kernels on Intel Knights Landing; M. Lobet, J.-L. Vay, H. Vincenti, A. Bhagatwala, R. Lehe and J. Deslippe (LBL): PICSAR: a high-performance library for Particle-In-Cell codes optimized for Intel Xeon Phi KNL architectures; K. Milfeld (TACC): OpenMP Affinity.. ..On KNL|
|17:50||18:00||Closing||Discussion and Final Remarks (IXPUG)|
Organizers and Contacts
Please contact one of the organizers if you have any questions.
- Richard A. Gerber
National Energy Research Scientific Computing Center, Lawrence Berkeley National Lab. (NERSC)
- Kent Milfeld
Texas Advanced Computing Center (TACC)
- Chris J. Newburn
- Thomas Steinke
Zuse Institute Berlin (ZIB)
Call for Papers
Many-core and multicore processor technologies have put more demand on memory and interconnects. These processor technologies also require a more detailed understanding of the interaction between the memory, interconnect and SIMD units for users to design optimal algorithms at scale on a node, as well as at scale on a system level.
The IXPUG workshop is about sharing ideas, implementations, and experiences that will help users take advantage of new technologies such as AVX512 operations, high-bandwidth memory (HBM) and OmniPath. These architectural advances in Vectorization, Memory, and Communications on the Intel Xeon Phi platform will help boost adoption of many-core architecture in HPC as well as other computational spaces.
The workshop offers an open forum with fellow application programmers, Intel Xeon Phi architecture designers, and compiler and tool experts.
In addition to the technical paper presentations, the program will include a morning keynote on Intel microprocessors and an afternoon presentation on memory performance, and will conclude with a panel discussion.
IXPUG welcomes paper submissions on innovative work from KNC and KNL users in academia, industry and government labs, describing original discoveries and experiences that will promote and prescribe efficient use of many-core and multicore systems.
Topics of interest include (but are not limited to):
- Vectorization: Data layout in cache for efficient SIMD operations, SIMD directives and operations, and 2-core tiling with 2D interconnected mesh
- Memory: Data layout in memory for efficient access (data preconditioning), access latency concerns (prefetch, streams, costs for HBM), partitioning of DDR and HBM for applications (memory policies)
- Communication, including early experiences with OmniPath
- Thread and Process Management: Process and thread affinity issues, SMT (simultaneous multi-threading, in core), balancing processes and threads
- Programming Models: OpenMP 4.x, hStreams, using MPI 3 on Xeon Phi, hybrid programming (MPI/OpenMP, others)
- Algorithms and Methods, including scalable and vectorizable algorithms
- Software Environments and Tools
- Benchmarking & Profiling Tools
Paper Format: Papers will be published by Springer and must use the Springer template (see Information for Authors below). The template also gives reviewers a consistent layout, page limit and font size.
- PDF files only from LaTeX or MS Word docs.
- Page limits: minimum 4, maximum 10 pages excluding reference list and acknowledgement.
- Submission URL: EasyChair IXPUG Workshop ISC16. Please select "workshop" under topics.
The authors of the best-scored papers will be invited by the Program Committee to publish in the ISC’16 Workshop volume, which appears in the Springer LNCS series.
Information for Authors
General information for authors can be found in Springer's LNCS “Information for Authors of Computer Science Publications”. It includes links to MS Word templates. For LaTeX, a simplified LNCS template and Quick Start can be found in the LNCS Repository on GitHub.
A limited number of presentations will be accepted on late-breaking technology (late-breaking talks) and on subjects with less content than a full paper (lightning talks).
Make sure you check the "presentation" box in the submission form.
- Presentation Format: IXPUG PowerPoint template (with suggestions; specific format not required)
- Page limits: 10 pages for lightning talks; 20 pages for late-breaking talks
- Submission URL: EasyChair IXPUG Workshop ISC16
The authors of the best-scored papers will be invited by the Program Committee to present at the ISC’16 Workshop.
Important Due Dates
|Extended Abstract Submission||15 April 2016 AoE|
|Full Paper Submission (extended)||3 May 2016 AoE (extended from 29 April 2016)|
|Reviews start||4 May 2016|
|Paper Acceptance Notification||13 May 2016 (postponed from 11 May 2016)|
|Presentation Agenda Finalized (presenter notifications)||27 May 2016|
|Workshop Day||23 June 2016|
|Camera Ready papers due||30 June 2016|
Reviewers are expected to base their judgment on what was available at the time reviews were assigned (May 3). Subsequent updates to content may or may not be considered by the program committee as part of the selection decision. We encourage authors to use the time until the presentation and the camera-ready deadline to produce the highest-quality paper.
All submitted papers within the scope of the workshop will be peer-reviewed. We apply a standard single-blind review process, i.e., the authors will be known to reviewers. Submissions will need to demonstrate quality of the results, originality and new insights, technical strength, and correctness. Submitted papers must not have been published in, or be in preparation for, other conferences, workshops or journals.