ISC 2024 IXPUG Workshop

 

ISC 2023 IXPUG Workshop

Third workshop on Communication, I/O, and Storage at Scale on Next-Generation Platforms – Scalable Infrastructures

ISC 2024 IXPUG Workshop


Workshop Date/Time: May 16, 2024 9:00 AM to 1:00 PM

Location: Hall Y7 - 2nd Floor, in-person at ISC 2024, Hamburg, Germany

Agenda:

All times are shown in CEST / Hamburg Time, UTC+2. Event details are subject to change. Register at: https://www.isc-hpc.com/registration-2024.html The workshop is held in conjunction with ISC 2024, Hamburg, Germany. To attend the IXPUG Workshop, you must register for the ISC 2024 Workshop Pass.

09:00–09:10 a.m. Welcome and Introduction to IXPUG
Amit Ruhela (Texas Advanced Computing Center (TACC))

Session 1 | Chair: Amit Ruhela (Texas Advanced Computing Center (TACC))

09:10-09:40 a.m. Optimizing Communications and I/O on Aurora for Application Performance
Authors: Kalyan Kumaran, Kevin Harms (Argonne Leadership Computing Facility, Argonne National Laboratory)
Abstract: The Aurora supercomputer at the Argonne Leadership Computing Facility consists of 10,624 nodes, each with two Intel Data Center CPU Max Series CPUs and four Intel Data Center GPU Max Series GPUs. Within a node, Xe Link and PCIe support coherent memory access to all processors. HPE's Slingshot 11 provides adaptive high-bandwidth interconnect between nodes and to the DAOS-based storage sub-system. This talk will discuss experiences in optimizing applications using XeLinks and eight NICs per compute node, as well as efficiently utilizing DAOS storage. We will examine a few example cases taken from work done during the Aurora Early Science Program, the Exascale Computing Project, and Non-Recurring Engineering work.

09:40-10:10 a.m : Talk 2

10:10-10:35 a.m. High Performance Fabric Support in DAOS
Authors: Michael Hennecke, Alexander Oganezov, Jerome Soumagne, John Carrier, and Joseph Moore (Intel Corporation)
Abstract: The Distributed Asynchronous Object Storage (DAOS) is an open source scale-out storage system that is designed from the ground up to support Storage Class Memory (SCM) and NVMe storage in user space. DAOS can run over any TCP network, but can also take advantage of high performance fabrics like InfiniBand, Slingshot, or Omni-Path. This paper describes the networking architecture of DAOS, and discusses scaling and performance aspects of running DAOS over those high performance fabrics.

10:35-11:00 a.m. Performance Evaluation and Optimization of Seismic Imaging Applications on HBM-Enabled CPUs
Authors: Huda Ibeid (Intel Corporation), Pavel Plotnitskii, Kadir Akbudak, Hatem Ltaief, and David Keyes (King Abdullah University of Science & Technology (KAUST))
Abstract: The HBM memory is designed to provide both high bandwidth and low power consumption. In this talk, we will evaluate the improvements brought by the adoption of HBM and explore strategies to maximize the advantages of HBM-enabled CPUs for modeling the 3D acoustic wave equation in the context of seismic modeling. Given the inherent memory-bound nature of the stencil operator in the wave equation, there is significant data movement across the memory subsystem, which could negatively impact the throughput. We exploit the HBM high bandwidth through spatial and temporal data reuse, thus harnessing the performance potential provided by HBM.

(11:00-11:30 a.m. Coffee Break)

Session 2 | Chair: David Martin (Argonne Leadership Computing Facility, Argonne National Laboratory)

11:30 a.m.-12:15 p.m. Intel Keynote
Presenter: Philippe (Phil) Thierry (Intel Corporation)

12:15-12:35 p.m. HPC Experiences with Intel GPU Max for Deep Learning at Scale
Authors: Nicholas Charron (University of Gothenburg), Steffen Christgau (Zuse Institute Berlin (ZIB))

12:35-12:55 p.m. Investigating the Performance of LLVM-based Intel Fortran Compiler (ifx)
Presenter: Dhani Ruhela (Westwood High School)
Abstract: LLVM is a free, open-source compiler framework for programmatically generating machine-native code. Developers nowadays are increasingly embracing LLVM to develop new languages or modify existing ones. LLVM-based compilers enable shorter build of compilers that are portable across various platforms, easy to maintain, and extensively optimized for the target systems. Intel oneAPI moved to an LLVM infrastructure with C (icx) and C++ (icpx) compilers in the 2021.3 release and a Fortran compiler (ifx) in the 2023.0 release. According to Intel, the LLVM-based compilers are packed with advanced language features and deliver the absolute best performance for various applications on Intel architectures. The LLVM-based Intel compilers have been extensively tuned for the 4th Gen Intel® Xeon Scalable processors (code-named Sapphire Rapids), Intel® Xeon CPU Max Series (code-named Sapphire Rapids HBM) and the Intel® Data Center GPU Max Series (code-named Ponte Vecchio). In this work, I aim to explore the features and performance of LLVM-based compilers compared with legacy compilers on three machine architectures, i.e. Sapphire Rapids with DDR5, Sapphire Rapids with HBM, and Intel Cascade Lake. To my best belief, this is the first extensive study that uncovers the potential of LLVM-based Intel compilers with eight scientific representative codes and demonstrates up to 17% performance improvements with Inter Fortran compiler (ifx) on Intel architectures.

12:55-1:00 p.m. Workshop Closing Remarks
David Martin (Argonne Leadership Computing Facility, Argonne National Laboratory)

Event Description:

Next-generation HPC platforms have to deal with increasing heterogeneity in their subsystems. These subsystems include internal high-speed fabrics for inter-node communication; storage system integrated with programmable data processing units (DPUs) and infrastructure processing units (IPUs) to support software-defined networks; traditional storage infrastructures with global parallel POSIX-based filesystems complemented with scalable object stores; and heterogeneous compute nodes configured with a diverse spectrum of CPUs and accelerators (e.g., GPU, FPGA, AI processors) having complex intra-node communication.

The workshop intends to attract system architects, code developers, research scientists, system providers, and industry luminaries who are interested in learning about the interplay of next-generation hardware and software solutions for communication, I/O, and storage subsystems tied together to support HPC and data analytics at the systems level, and how to use them effectively. The workshop will provide the opportunity to assess technology roadmaps to support AI and HPC at scale, sharing users’ experiences with early-product releases and providing feedback to technology experts. The overall goal is to make the ISC community aware of the emerging complexity and heterogeneity of upcoming communication, I/O, and storage subsystems as part of next-generation system architectures and inspect how these components contribute to scalability in both AI and HPC workloads.

Workshop Format:

The workshop will have a keynote, full (30 min) talks and lightning talks (10-15 min). While in-person presentations are preferred, pre-recorded videos will be allowed as presentations in exceptional cases.

Call for Submissions:

The submission process will close on March 15, 2024 AoE (updated!). All submitters should provide content that represents an Extended Abstract, max. 6-12 pages in LNCS format via the IXPUG EasyChair https://easychair.org/cfp/ISC-2024-IXPUG-Workshop. Notifications will be sent to submitters by March 22, 2024 AoE. The page limit is 12 pages for each paper with 2 possible extra pages after the review to address the reviewer's comments. The page limit includes bibliography and appendices.

Topics of Interest are (but not limited to):

  • Holistic view on performance of next-generation platforms (with emphasis on communication, I/O, and storage at scale)
  • Application-driven performance analysis with various HPC fabrics
  • Software-defined networks in HPC environments
  • Experiences with emerging scalable storage concepts, e.g., object stores using next-generation HPC fabrics
  • Performance tuning on heterogeneous platforms from multiple vendors including impact of I/O and storage
  • Performance and portability using network programmable devices (DPU, IPU)
  • Best practice solutions for application programming with complex communication, I/O, and storage at scale

Keywords:
high-performance fabrics, data and infrastructure processing units, scalable object stores as HPC storage subsystems, heterogeneous data processing, holistic system view on scalable HPC infrastructures

Review Process:
All submissions within the scope of the workshop will be peer-reviewed and will need to demonstrate the high quality of the results, originality and new insights, technical strength, and correctness. We apply a standard single-blind review process, i.e., the authors will be known to reviewers. The assignment of reviewers from the Program Committee will avoid conflicts of interest.

Important Dates:

  • Deadline for submissions: March 15, 2024 (updated!)
  • Acceptance notification: March 22, 2024
  • Camera ready presentation: May 10, 2024
  • Workshop date: May 16, 2024

Organizers:

  • Hatem Ltaief, King Abdullah University of Science & Technology
  • David Martin, Argonne Leadership Computing Facility
  • Amit Ruhela, Texas Advanced Computing Center (TACC)

Program Committee:

  • Aksel Alpay, Heidelberg University
  • Glenn Brook, Cornelis Networks
  • Steffen Christgau, Zuse Institute Berlin
  • Toshihiro Hanawa, The University of Tokyo
  • Clayton Hughes, Sandia National Laboratories
  • Nalini Kumar, Intel Corporation
  • James Lin, Shanghai Jiao Tong University
  • Hatem Ltaief, King Abdullah University of Science & Technology
  • David Martin, Argonne National Laboratory
  • Christopher Mauney, Los Alamos National Laboratory
  • Amit Ruhela, Texas Advanced Computing Center (TACC)

Contact:

Please contact This email address is being protected from spambots. You need JavaScript enabled to view it. with any general questions.