Resources

We have collected presentations from IXPUG workshops, annual meetings, and BOF sessions, and made them accessible here to view or download. You may search by event, keyword, science domain or author’s name. The database will be updated as new talks are made available.

Search Results: Showing 1 - 10 of 361

IXPUG Webinar Series May 13, 2019

Deep Neural Networks (DNNs) have revolutionized numerous applications, but the demand for ever more performance remains unabated. Scaling DNN computations to larger clusters is generally done by distributing tasks in batch mode using methods such as distributed synchronous SGD. Among the issues with this approach is that, to keep the distributed cluster working at high utilization, the workload distributed to each node must be large, which implies nontrivial growth in the SGD mini-batch size. In this talk, we introduce a framework, FPDeep, which uses a hybrid of model and layer parallelism to configure distributed reconfigurable clusters to train DNNs. This approach has numerous benefits. First, the design does not suffer from batch size growth. Second, novel workload and weight partitioning leads to a balanced distribution of both across nodes. Third, the entire system is a fine-grained pipeline, which leads to high parallelism and utilization and also minimizes the time features need to be cached while waiting for back-propagation. As a result, storage demand is reduced to the point where only on-chip memory is used for the convolution layers. We evaluate FPDeep with the AlexNet, VGG-16, and VGG-19 benchmarks. FPDeep provides, on average, 6.36x higher energy efficiency than comparable GPU servers.
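
The talk targets FPGA clusters, but the workload-partitioning idea can be illustrated in a few lines. The sketch below is a hedged, made-up example, not FPDeep code: the per-layer costs and node count are hypothetical, and layers are assigned to pipeline stages in proportion to cumulative compute cost so each node carries a roughly equal share.

```python
# Illustrative only: assign DNN layers to pipeline stages by cumulative cost.
# Layer costs (relative FLOPs) and the node count are hypothetical values.

def partition_layers(costs, num_nodes):
    """Map each layer to a stage according to where the midpoint of its
    cost interval falls in the cumulative cost."""
    total = float(sum(costs))
    stages = [[] for _ in range(num_nodes)]
    running = 0.0
    for i, cost in enumerate(costs):
        stage = min(num_nodes - 1, int((running + cost / 2.0) / total * num_nodes))
        stages[stage].append(i)
        running += cost
    return stages

# Hypothetical relative costs for an 8-layer network, split over 4 nodes.
layer_costs = [4.0, 8.0, 6.0, 6.0, 4.0, 2.0, 2.0, 1.0]
for node, layers in enumerate(partition_layers(layer_costs, 4)):
    print(f"node {node}: layers {layers}")
```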

Keyword(s): Data Parallelism, Inference, FPGA, FPDeep, Convolutional Neural Networks (CNN), Convolutional Neural Networks (CNN) Training, Hybrid Model/Layer Parallelism, Workload Partitioning

Author(s): Tong Geng, Tianqi Wang, Ahmed Sanaullah, Chen Yang, Rushi Patel, Martin Herbordt
Video(s): Deeply-Pipelined FPGA Clusters Make DNN Training Scalable
Read more
IXPUG Webinar Series Apr 12, 2019

Image- and data-parallel rendering across multiple nodes of an HPC system is widely used in visualization to provide higher framerates, support large datasets, and render data in situ. Specifically for in situ use, reducing bottlenecks incurred by the visualization and compositing tasks is of key concern to reduce the overall simulation run time, while for general interactive visualization improving rendering performance, and thus interactivity, is always desirable. In this talk, Will Usher will present our work on an asynchronous image processing and compositing framework for multi-node rendering in OSPRay, dubbed the Distributed FrameBuffer. We demonstrate that this approach achieves performance superior to the state of the art for common use cases, while providing the flexibility to support a wide range of parallel rendering algorithms and data distributions. By building on this framework, we have extended OSPRay with a data-distributed API, enabling its use in data-distributed and in situ visualization applications. Will Usher will cover our approach to developing this framework, performance considerations, and use cases and examples of the new data-distributed API in OSPRay.
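
The Distributed FrameBuffer itself lives in OSPRay's C++/MPI implementation; purely as a conceptual sketch, the NumPy snippet below shows the sort-last (depth) compositing step that such a framework performs, on made-up partial images and depth buffers. It is not OSPRay's API.

```python
# Conceptual sketch of sort-last depth compositing (NOT OSPRay code):
# each "node" contributes a partial image plus a per-pixel depth buffer,
# and compositing keeps the nearest fragment at every pixel.
import numpy as np

def composite(colors, depths):
    """colors: (nodes, H, W, 3); depths: (nodes, H, W).
    Keep, per pixel, the color from the node with the smallest depth."""
    nearest = np.argmin(depths, axis=0)                # (H, W) winning node index
    h, w = nearest.shape
    return colors[nearest, np.arange(h)[:, None], np.arange(w)[None, :]]

rng = np.random.default_rng(0)
colors = rng.random((4, 64, 64, 3))   # partial renders from 4 "nodes"
depths = rng.random((4, 64, 64))      # matching per-pixel depth buffers
final = composite(colors, depths)
print(final.shape)                    # (64, 64, 3)
```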

Keyword(s): Xeon Phi, Xeon, Data Parallel, Parallel Rendering, FrameBuffer, In Situ Visualization

Author(s): Ingo Wald, Will Usher, Jefferson Amstutz, Johannes Gunther, Carson Brownlee, Valerio Pascucci
Video(s): Webinar recording (begins at 2:50)
Read more
IXPUG Webinar Series Apr 11, 2019

This talk will help you learn how the Lanczos iterative algorithm can be extended with parallel computing to solve highly degenerate systems. The talk will address the performance benefits of the core numerical operations in the Lanczos iteration when driven with manycore processors (KNL), compared to heterogeneous systems containing PCI-E add-in devices. This work will also demonstrate an extremely large-scale benchmark (~2,500 KNL compute nodes) recently performed on the KISTI-5 (NURION) HPC resource. As this talk covers the numerical details of the algorithm, it should also be quite instructive to those considering KNL systems for solving large-scale eigenvalue problems.
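
For reference, the serial Lanczos iteration at the heart of such solvers fits in a few lines. The sketch below is a minimal single-node NumPy version, not the parallel KNL implementation from the talk; it shows where the dominant matrix-vector and vector operations appear.

```python
# Minimal single-node Lanczos sketch on a dense symmetric matrix.
import numpy as np

def lanczos(A, k, rng=np.random.default_rng(0)):
    """Build a k-step Lanczos tridiagonalization of symmetric A."""
    n = A.shape[0]
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = np.zeros((n, k)); alpha = np.zeros(k); beta = np.zeros(k - 1)
    q_prev = np.zeros(n)
    for j in range(k):
        Q[:, j] = q
        w = A @ q                       # dominant cost: the matrix-vector product
        alpha[j] = q @ w
        w -= alpha[j] * q + (beta[j - 1] * q_prev if j > 0 else 0)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q_prev, q = q, w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

A = np.random.default_rng(1).standard_normal((200, 200))
A = (A + A.T) / 2
Q, T = lanczos(A, 30)
print(np.sort(np.linalg.eigvalsh(T))[-3:])   # approximates A's largest eigenvalues
```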

Keyword(s): Xeon Phi, MPI, OpenMP

Author(s): Hoon Ryu (KISTI)
Video(s): Webinar recording
Read more
IXPUG Webinar Series Apr 11, 2019

This talk will present benchmark data for Intel Memory Drive Technology (IMDT), a new generation of software-defined memory (SDM) developed in collaboration between Intel and ScaleMP and built on 3D XPoint-based Intel Optane SSDs. IMDT performance was studied using synthetic benchmarks, scientific kernels, and applications, chosen to represent different patterns of computation and of data access on disk and in memory. To put IMDT performance in perspective, we compared two memory configurations: a hybrid IMDT DDR4/Optane system and a DDR4-only system. Performance was measured as a function of the percentage of memory used and analyzed in detail. We found that for some applications the hybrid DDR4/Optane configuration outperforms the DDR4-only setup by up to 20%.
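
The actual study uses synthetic benchmarks, scientific kernels, and full applications; as a hedged toy illustration of the measurement idea only, the snippet below times a STREAM-triad-like kernel while growing the working set, the kind of sweep used to plot performance against the fraction of memory in use (array sizes here are arbitrary).

```python
# Toy illustration only, not the actual IMDT benchmarks: time a streaming
# kernel at increasing working-set sizes and report effective bandwidth.
import time
import numpy as np

def stream_triad(n, trials=3):
    a = np.zeros(n); b = np.random.random(n); c = np.random.random(n)
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        a[:] = b + 2.0 * c                 # STREAM "triad"-like kernel
        best = min(best, time.perf_counter() - t0)
    return 3 * n * 8 / best / 1e9          # GB/s moved (3 float64 arrays)

for n in [10**6, 10**7, 5 * 10**7]:        # growing working set
    print(f"n = {n:>9}: {stream_triad(n):6.1f} GB/s")
```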

Keyword(s): 3D XPoint, Optane, MKL

Author(s): Vladimir Mironov
Video(s): Webinar recording (begins at 5:35)
Read more
IXPUG Webinar Series Apr 11, 2019

Vectorization is increasingly important for achieving high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over unstructured meshes, poses challenges to effective vectorization. Maintaining a user-friendly high-level interface with a suitable degree of abstraction while generating efficient, vectorized code for the finite element method is a challenge for numerical software systems and libraries. In this talk, we study cross-element vectorization in the finite element framework Firedrake and demonstrate the efficacy of the approach by evaluating a wide range of matrix-free operators spanning different polynomial degrees and discretizations on two recent Intel CPUs using three mainstream compilers. Our experiments show that cross-element vectorization achieves 30% of theoretical peak performance for many examples of practical significance, and exceeds 50% for cases with high arithmetic intensity, with consistent speed-up over vectorization restricted to the local assembly kernels.
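
A hedged NumPy analogue of the idea (Firedrake generates C kernels, and the shapes and data below are made up): batching the local assembly kernel over many elements turns it into array arithmetic that vectorizes naturally, instead of a scalar loop over one element at a time.

```python
# Cross-element batching illustrated with NumPy; not Firedrake-generated code.
import numpy as np

rng = np.random.default_rng(0)
n_elem, n_basis, dim = 10000, 3, 2
grads = rng.random((n_elem, n_basis, dim))   # per-element basis gradients (made up)
detJ = rng.random(n_elem) + 0.5              # per-element Jacobian determinants

# Per-element assembly: one small local matrix at a time.
K_loop = np.empty((n_elem, n_basis, n_basis))
for e in range(n_elem):
    K_loop[e] = detJ[e] * grads[e] @ grads[e].T

# Cross-element assembly: one batched contraction over all elements at once.
K_batched = np.einsum("e,eid,ejd->eij", detJ, grads, grads)

print(np.allclose(K_loop, K_batched))        # True
```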

Keyword(s): Vectorization, algorithms, OpenMP, Xeon

Author(s): Tianjiao Sun, Lawrence Mitchell, David A. Ham, Paul H. J. Kelly, Kaushik Kulkarni, Andreas Kloeckner
Video(s): Webinar recording
Read more
IXPUG Annual Fall Conference 2018 Dec 27, 2018

Here an optimization strategy based on code modernization concepts is proposed and applied to the global MASNUM surface wave model, which has been used in several operational forecasting systems and earth system models.

Keyword(s): MASNUM, wave model

Author(s): Zhenya Song
Video(s): Optimization strategy for MASNUM surface wave model
Read more
IXPUG Annual Fall Conference 2018 Dec 27, 2018

We present a complementary physics-based, unsupervised approach that exploits the causal nature of spatiotemporal data sets generated by local dynamics (e.g., hydrodynamic flows). We illustrate how novel patterns and coherent structures can be discovered in cellular automata and outline the path from them to climate data.
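
For context only, the snippet below generates the kind of spatiotemporal field such methods analyze, using an elementary cellular automaton (rule 110); it is not the DisCo algorithm itself.

```python
# Generate a time-by-space field from an elementary cellular automaton.
# This only produces example input data; it is not the DisCo method.
import numpy as np

def evolve_ca(rule, width=200, steps=100, rng=np.random.default_rng(0)):
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    state = rng.integers(0, 2, width, dtype=np.uint8)
    history = [state]
    for _ in range(steps - 1):
        left, right = np.roll(state, 1), np.roll(state, -1)
        state = table[4 * left + 2 * state + right]
        history.append(state)
    return np.array(history)            # shape (steps, width): time x space

field = evolve_ca(110)
print(field.shape, field.mean())
```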

Keyword(s): unsupervised learning, parallel programming

Author(s): Adam Rupe, Karthik Kashinath, James Crutchfield, Ryan James, Prabhat
Video(s): Project DisCo: Physics-based discovery of coherent structures in spatiotemporal systems
Read more
IXPUG Annual Fall Conference 2018 Dec 27, 2018

Our work proposes several optimization techniques to improve the performance of a wave propagation model provided by Petrobras, a multinational corporation in the petroleum industry.
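 
As a hedged illustration of the kind of kernel such work optimizes (not Petrobras' actual model or the techniques in the talk), the sketch below takes explicit finite-difference time steps of the 2D acoustic wave equation, the stencil computation that dominates this workload.

```python
# Simple 2nd-order finite-difference update for the 2D acoustic wave equation.
# Illustrative only; periodic boundaries via np.roll, arbitrary parameters.
import numpy as np

def wave_step(p, p_prev, c2dt2_over_dx2):
    """p_next = 2p - p_prev + (c^2 dt^2 / dx^2) * laplacian(p)."""
    lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
           np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4.0 * p)
    return 2.0 * p - p_prev + c2dt2_over_dx2 * lap

n = 256
p = np.zeros((n, n)); p[n // 2, n // 2] = 1.0     # point source
p_prev = p.copy()
for _ in range(200):
    p, p_prev = wave_step(p, p_prev, 0.1), p       # 0.1 keeps the scheme stable
print(p.min(), p.max())
```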

Keyword(s): performance optimization, oil & gas

Author(s): Eduardo Cruz, Philippe Navaux
Video(s): Improving Oil and Gas Extraction Simulation Performance using Intel® Xeon® and Xeon Phi™ Architectures
Read more
IXPUG Annual Fall Conference 2018 Dec 27, 2018

The Energy Exascale Earth System Model (E3SM) is one of the top users of resources at NERSC, and the Model for Prediction Across Scales - Ocean Core (MPAS-O) is a significant component of it, comprising 800,000 lines of Fortran and the work of 50 contributors. When MPAS-O was migrated from the previous-generation NERSC production system, Edison (Ivy Bridge processors), to the newer Knights Landing-based Cori system, severe performance loss and scaling bottlenecks resulted. Performance analysis was used to rule out a number of possible causes, including load imbalance, cache behavior, and vectorization efficiency. It was found that a lower bound on the number of simulation cells mapped to each MPI rank, combined with MPAS framework overhead caused by a serialized thread structure, was the overwhelming contributor to MPAS performance loss on Xeon Phi systems. Two framework optimizations, which remove excessive thread barriers and recycle communication data structures, have been incorporated into the E3SM master codebase, yielding a 15% speed improvement when running MPAS-O at production scale on Xeon Phi processors.
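
A loose Python analogue of one of the two fixes, recycling communication data structures: buffers are allocated once at setup and reused every time step instead of being rebuilt. The real change lives in the MPAS Fortran framework; the names and sizes below are invented.

```python
# Sketch of buffer recycling across time steps (not MPAS code).
import numpy as np

class HaloExchange:
    def __init__(self, halo_size):
        # Allocate send/recv buffers once, at setup time ...
        self.send = np.empty(halo_size)
        self.recv = np.empty(halo_size)

    def exchange(self, field, halo_idx):
        # ... and reuse them every time step instead of reallocating.
        self.send[:] = field[halo_idx]
        # (an MPI send/recv of self.send would go here)
        self.recv[:] = self.send          # stand-in for the remote data
        return self.recv

halo = HaloExchange(halo_size=1000)
field = np.random.random(10000)
idx = np.arange(1000)
for _ in range(100):                       # time-stepping loop
    ghost = halo.exchange(field, idx)
```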

Keyword(s): MPI, Climate and weather

Author(s): William Arndt
Video(s): Optimization of the Model for Prediction Across Scales: Ocean Core targeting Production Scale Use of Knights Landing Processor Architecture
Read more
IXPUG Annual Fall Conference 2018 Dec 27, 2018

This work significantly improves the OpenMP threading performance of Quantum ESPRESSO (QE) on Xeon and Xeon Phi processors.

Keyword(s): OpenMP, Density functional theory, 3D FFT

Author(s): Ye Luo
Video(s): Improved threading performance of Quantum ESPRESSO
Read more