IXPUG Workshop at HPC Asia 2024

Conference Dates: January 25-27, 2024

Workshop Date: January 25, 2024 from 9:00 a.m. – 12:30 p.m. local Nagoya time (GMT+9)

Location: HPC Asia 2024 — The International Conference on HPC in Asia-Pacific Region, Nagoya, Japan

Registration: To attend, register via the HPC Asia 2024 website

Event Description: The Intel eXtreme Performance Users Group (IXPUG) is an active, community-led forum for sharing industry best practices, techniques, and tools for maximizing efficiency on Intel platforms and products. The IXPUG Workshop at HPC Asia 2024 is an open workshop on high-performance computing applications, systems, and architecture with Intel technologies. This half-day workshop features invited talks and contributed papers. The workshop aims to bring together software developers and technology experts to share challenges, experiences, and best-practice methods for optimizing HPC, machine learning, and data analytics workloads on Intel® Xeon® processors, Intel® Xeon® CPU Max Series, Intel® Data Center GPU Max Series, Intel® Gaudi®, Intel® FPGAs, and any related hardware/software platforms. The workshop will cover application performance and scalability challenges at all levels, from intra-node performance up to large-scale compute systems. Research on any aspect of Intel HPC and AI products is welcome.

Workshop Agenda: All times are shown in JST (Japan Standard Time, GMT+9)

Jan. 25:

9:00-9:10: Opening Remarks
Toshihiro Hanawa, Workshop Chair (The University of Tokyo)

Session 1:

9:10-9:40 Keynote: Numerical Weather Prediction Activities in Japan Meteorological Agency
Presenter: Junichi Ishida (Japan Meteorological Agency)
Abstract: The Japan Meteorological Agency (JMA) is responsible for the accurate provision of disaster-prevention weather information. JMA uses numerical weather prediction (NWP) to produce information on the atmosphere and oceans. In light of recent disasters caused by typhoons and heavy rains, JMA is developing a better NWP system for improved forecasts. Because of the large volume of calculation NWP requires, the use of HPC is important in order to release information as quickly as possible and contribute to disaster prevention. In this presentation, I will explain JMA's efforts to utilize HPC in NWP.

9:40-10:00 Performance Evaluation of the Fourth-Generation Xeon with Different Memory Characteristics
Presenters: Keiichiro Fukazawa, Riki Takahashi (Kyoto University)
Abstract: The supercomputer system of the Academic Center for Computing and Media Studies, Kyoto University, employs the fourth-generation Xeon (code-named Sapphire Rapids). The system consists of two subsystems: one equipped solely with high-bandwidth memory (HBM2e), and the other with a large DDR5 memory capacity. Using benchmark applications, a performance evaluation of the systems with each type of memory was conducted. Additionally, the study employed a real application, an electromagnetic fluid code, to investigate how application performance varies with memory characteristics. The results confirm the performance improvement due to the high bandwidth of HBM2e. However, it was also observed that efficiency is lower when using HBM2e, and the effects of cache-memory optimization are relatively minimal.

10:00-10:20 MPI-Adapter2: An Automatic ABI Translation Library Builder for MPI Application Binary Portability
Presenters: Shinji Sumimoto, Toshihiro Hanawa (The University of Tokyo), Kengo Nakajima (The University of Tokyo/RIKEN CCS)
Abstract: This paper proposes an automatic MPI ABI (Application Binary Interface) translation library builder named MPI-Adapter2. Container-based job environments are becoming widespread in computer centers. However, when a user runs a container image at another computer center, the containerized MPI binary may not work because of differences in the ABI of the MPI libraries. MPI-Adapter2 builds MPI ABI translation libraries automatically from the MPI libraries themselves. It can build translation libraries not only between different MPI implementations, such as Open MPI, MPICH, and Intel MPI, but also between different versions of the same implementation. We implemented and evaluated MPI-Adapter2 among several versions of Intel MPI, MPICH, MVAPICH, and Open MPI using the NAS Parallel Benchmarks and pHEAT-3D, and found that it worked correctly except for an Open MPI ver. 4 binary running on Open MPI ver. 2 for the IS benchmark, because of a difference in MPI object size. We also evaluated a pHEAT-3D binary compiled with Open MPI ver. 5 via MPI-Adapter2 on up to 1,024 processes across 128 nodes. The performance overhead of MPI-Adapter2 relative to the native Intel MPI run was 1.3%.

(10:20-10:40 Coffee break)

Session 2:

10:40-11:10 Using Intel oneAPI for Multi-hybrid Acceleration Programming with GPU and FPGA Coupling
Presenters: Wentao Liang, Norihisa Fujita, Ryohei Kobayashi, Taisuke Boku (University of Tsukuba)
Abstract: Intel oneAPI is a programming framework that supports various accelerators, such as GPUs, FPGAs, and multi-core CPUs, with a focus on HPC applications. Users write code in a single language, DPC++, and apply it across this heterogeneous programming environment. In practice, however, it is not easy to target different accelerators, especially non-Intel devices such as NVIDIA and AMD GPUs. We have successfully constructed a oneAPI environment that uses single-source DPC++ programming to drive true multi-hetero acceleration, including an NVIDIA GPU and an Intel FPGA simultaneously. In this paper, we show how this is done and what kinds of applications can be targeted.

11:10-11:30 Optimize Efficiency of Utilizing Systems by Dynamic Core Binding
Presenters: Masatoshi Kawai, Tetsuya Hoshino (Nagoya University), Akihiro Ida (JAMSTEC), Toshihiro Hanawa (The University of Tokyo)
Abstract: Load balancing at both the process and thread levels is imperative for minimizing application computation time in MPI/OpenMP hybrid parallelization. This necessity arises from the constraint that, in a typical hybrid parallel environment, an identical number of cores is bound to each process. Dynamic Core Binding (DCB), however, adjusts the core binding according to each process's workload, thereby realizing load balancing at the core level. In prior research, we implemented the DCB library, which offers two policies: computation-time reduction and energy reduction. In this paper, we show that the two policies can be used together to achieve both reduced computation time and reduced power consumption.
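The core-level idea described in the abstract can be sketched as a toy calculation: instead of giving every process the same number of cores, assign each process a share of cores proportional to its workload. The workload values and core total below are illustrative assumptions, not the DCB library's actual interface.

```shell
#!/bin/sh
# Toy sketch of workload-proportional core allocation (the idea behind DCB).
# Assumed values for illustration only:
TOTAL_CORES=16
WORKLOADS="10 30 20 20"   # hypothetical per-process workloads (arbitrary units)

# Sum the workloads.
TOTAL_WORK=0
for w in $WORKLOADS; do TOTAL_WORK=$((TOTAL_WORK + w)); done

# Allocate cores proportionally, rounded to nearest, minimum 1 per process.
rank=0
ALLOC=""
for w in $WORKLOADS; do
  cores=$(( (w * TOTAL_CORES + TOTAL_WORK / 2) / TOTAL_WORK ))
  [ "$cores" -lt 1 ] && cores=1
  echo "rank $rank: workload $w -> $cores cores"
  ALLOC="$ALLOC$cores "
  rank=$((rank + 1))
done
```

With these numbers, the four processes receive 2, 6, 4, and 4 cores, rather than 4 each; a heavily loaded process gets more cores, which is the load balancing the abstract describes.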

11:30-11:50 HPCnix: Make HPC Apps More Easier Like Shell Script
Presenters: Minoru Kanatsu (Toyota), Hiroshi Yamada (Tokyo University of Agriculture and Technology)
Abstract: In the area of high-performance computing (HPC), it is common to extract extreme computing performance by using an extremely optimized framework without even an common OS APIs and frameworks for personal desktop. However, this makes the development cost higher than normal application development, and also makes it difficult for beginners. This study shows that it is possible to write HPC applications using semantics like the shell command pipeline in Unix, and proposes a simple application framework for beginners in HPC, called HPCnix.
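The pipeline semantics the abstract refers to can be illustrated with an ordinary Unix shell pipeline: each stage is a small, independent filter, and data flows between stages through pipes. This is plain shell shown only to illustrate the composition style; it is not HPCnix's actual syntax, which the paper defines.

```shell
#!/bin/sh
# generate -> transform (square each number) -> reduce (sum),
# expressed as a pipeline of independent filters.
result=$(seq 1 10 | awk '{ print $1 * $1 }' | awk '{ s += $1 } END { print s }')
echo "sum of squares 1..10 = $result"
```

Each stage here could, in principle, be replaced by a parallel compute kernel while the pipeline structure stays the same, which is the programming model the paper builds on.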

11:50-12:20 Invited Talk: Applications on the Argonne Aurora Exascale System
Presenter: Scott Parker (Argonne National Laboratory)
Abstract: Argonne National Laboratory's Aurora exascale system is in the process of being deployed, and applications are beginning to make early use of it. This talk will present an overview of the Aurora system, which is based on Intel's Data Center GPU Max Series and Xeon CPU Max Series processors, and will discuss the multi-year process of preparing Argonne's Early Science Program (ESP) and Exascale Computing Project (ECP) applications to run on the system. Early results and the performance of ESP and ECP applications running on Aurora will be presented.

12:20-12:30 Closing


Paper Topics of Interest: 

  • Applications of AI to HPC (Machine Learning / Deep Learning)
  • GPU applications and system software
  • Application porting and performance optimization
  • Vectorization, memory, communications, thread, and process management
  • Multi-node application experiences
  • Programming models, algorithms, and methods
  • Software environment and tools
  • Benchmarking and profiling tools
  • Visualization development
  • FPGA applications and system software

Paper Submission: Paper submissions are welcomed via EasyChair. All papers must be original and not simultaneously submitted to another journal or conference. The following paper categories are welcome:

  • Regular papers: up to 18 pages, single column, in PDF format, including figures and references
  • Short papers: up to 10 pages, single column, in PDF format, including figures and references
  • The paper format is described in the “Paper Submission” section of HPCAsia2024 at: https://sighpc.ipsj.or.jp/HPCAsia2024/cfp.html

All accepted papers will be included in the ACM Digital Library as part of the HPC Asia 2024 Workshop Proceedings. For papers accepted to the HPC Asia 2024 workshops to be published in the ACM Workshop Proceedings, the "paper type" of each paper is expected to be "abstract", "full paper", or "short paper". All paper types accepted by ACM are listed in this PDF file: https://www.acm.org/binaries/content/assets/publications/taps/papertypes-csvfields-current.pdf

Important Dates:

  • Paper due: November 20, 2023 (AoE)
  • Notification of Acceptance: December 05, 2023 (AoE)
  • Camera-ready papers due: December 12, 2023 (AoE)

Organizing Committee: Chair: Toshihiro Hanawa (The University of Tokyo)

Program Committee: 

  • Aksel Alpay (Heidelberg University)
  • Glenn Brook (Cornelis Networks)
  • Steffen Christgau (Zuse Institute Berlin)
  • Clay Hughes (Sandia National Laboratories)
  • Nalini Kumar (Intel Corporation)
  • James Lin (Shanghai Jiao Tong University)
  • Hatem Ltaief (King Abdullah University of Science & Technology)
  • David Martin (Argonne National Laboratory)
  • Christopher Mauney (Los Alamos National Laboratory)
  • Amit Ruhela (Texas Advanced Computing Center (TACC))

General questions should be sent to the workshop chair.