Resources

We have collected presentations from IXPUG workshops, annual meetings, and BOF sessions, and made them accessible here to view or download. You may search by event, keyword, science domain or author’s name. The database will be updated as new talks are made available.


Search Results: Showing 1 - 10 of 501 Results

IXPUG Webinar Series May 03, 2022

The Intel® Fortran Compiler is built on a long history of generating optimized code that supports industry standards while taking advantage of built-in technology for Intel® Xeon® Scalable processors and Intel® Core™ processors. Staying aligned with Intel's evolving and diverse architectures, the compiler now supports GPUs. This presentation will cover the compiler standards and path forward. There are two versions of this compiler. Both versions integrate seamlessly with popular third-party compilers, development environments, and operating systems.

  • Intel Fortran Compiler: provides CPU and GPU offload support
  • Intel Fortran Compiler Classic: provides continuity with existing CPU-focused workflows

Features:

  • Improves development productivity by targeting CPUs and GPUs through single-source code while permitting custom tuning
  • Supports broad Fortran language standards
  • Incorporates industry standards support for OpenMP* 4.5, and initial OpenMP 5.0 and 5.1 for GPU offload
  • Uses well-proven LLVM compiler technology and Intel's history of compiler leadership
  • Takes advantage of multicore, Single Instruction Multiple Data (SIMD) vectorization and multiprocessor systems with OpenMP, automatic parallelism, and coarrays

Speaker: Ron Green is the manager of the Intel Fortran OpenMP and Runtime Library development team. He is a moderator for the Intel Fortran Community Forum and is an Intel Developer Zone “Black Belt”. He has extensive experience as a developer and consultant in HPC for the past 30+ years and has been with Intel’s compiler team for thirteen years. His technical interest area is in parallel application development with a focus on Fortran programming.

Keyword(s): HPC,Fortran,Intel Fortran® Compiler,oneAPI,Intel® oneAPI,OpenMP,IFORT,IFX,Intel® Xeon® Scalable processors,Intel® Core™ processors

Author(s): NA
Video(s): Intel Fortran Compilers: A Tradition of Trusted Application Performance
IXPUG Webinar Series Mar 23, 2022

This presentation will provide a technical overview of Distributed Asynchronous Object Store (DAOS), a software-defined object store designed from the ground up for massively distributed Non-Volatile Memory (NVM), including Intel® Optane™ DC persistent memory and Intel Optane DC SSDs. It will also present the performance of DAOS and explain its main features.

Keyword(s): DAOS,Intel® Optane™,Storage,POSIX,PyDAOS

Author(s): Zhen Liang
Video(s): DAOS: Storage Innovations Driven by Intel® Optane™
IXPUG Webinar Series Dec 13, 2021

For shared memory programming of GPGPU systems, users either have to manually distribute their domain decomposition across the available GPUs and GPU tiles, or leverage implicit scaling mechanisms that transparently scale their offload code across multiple GPU tiles. The former approach can be cumbersome, and the latter approach is not always the best performing one. The Intel® MPI Library can take that burden from users by letting them program for a single GPU or tile only and leave the distribution to the library. This can make HPC / GPU programming much easier. To that end, the Intel® MPI Library not only allows pinning individual MPI ranks to individual GPUs or tiles, but also allows users to pass GPU memory pointers to the library. Download documentation at https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/environment-variable-reference/gpu-support.html
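The idea of programming for one GPU or tile per MPI rank can be illustrated with a minimal sketch in plain Python. This is not the Intel MPI Library's pinning mechanism: the function name `assign_tiles` and the round-robin policy are assumptions made purely for illustration, and the real settings are described in the documentation linked in the abstract.

```python
def assign_tiles(num_local_ranks, num_gpus, tiles_per_gpu):
    """Map each node-local rank to a (gpu, tile) pair.

    Hypothetical round-robin policy: ranks are spread over all tiles of
    all GPUs in order, wrapping around if there are more ranks than tiles.
    """
    if num_gpus <= 0 or tiles_per_gpu <= 0:
        raise ValueError("num_gpus and tiles_per_gpu must be positive")
    total_tiles = num_gpus * tiles_per_gpu
    mapping = {}
    for rank in range(num_local_ranks):
        slot = rank % total_tiles
        # Integer division picks the GPU, the remainder picks the tile on it.
        mapping[rank] = (slot // tiles_per_gpu, slot % tiles_per_gpu)
    return mapping


if __name__ == "__main__":
    # Four ranks on a node with two GPUs of two tiles each.
    for rank, (gpu, tile) in assign_tiles(4, 2, 2).items():
        print(f"local rank {rank} -> GPU {gpu}, tile {tile}")
```

With such a mapping each rank only ever addresses its own tile, which is the single-device view of the program that the abstract describes.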

Keyword(s): Intel® MPI Library,GPU,Multi-GPU

Author(s): NA
Video(s): Multi-GPU Programming—Scale-Up and Scale-Out Made Easy, Using the Intel® MPI Library
IXPUG Webinar Series Dec 02, 2021

The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved to better select lead compounds that can proceed to later stages of the drug discovery protocol, accelerating the entire process. No single methodological approach can achieve the necessary accuracy with the required efficiency. Here we describe multiple methodological and supporting infrastructural innovations at scale. Specifically, we show how we used TACC’s Frontera on > 8000 compute nodes to sustain 144M/hour docking hits, and to screen ?100 Billion drug candidates. These capabilities have been used by the US-DOE National Virtual Biotechnology Laboratory, and represent important progress towards improving computational drug discovery, both in the size of the libraries screened and in the possibility of generating training data fast enough for very powerful (docking) surrogate models.

Shantenu Jha is the Chair of the Computation & Data Driven Discovery Department at Brookhaven National Laboratory, and Professor of Computer Engineering at Rutgers University. His research interests are at the intersection of high-performance distributed computing and computational & data science. Shantenu leads the RADICAL-Cybertools project, a suite of middleware building blocks used to support large-scale science and engineering applications. He was appointed a Rutgers Chancellor's Scholar (2015) and was the recipient of the inaugural Chancellor's Excellence in Research (2016) for his cyberinfrastructure contributions to computational science. He is a recipient of the NSF CAREER Award (2013), the Gordon Bell Award (2020) and several other prizes at SC'xy and ISC’xy, as well as the winner of IEEE SCALE 2018. More details can be found at: http://radical.rutgers.edu/shantenu

Keyword(s): COVID-19,in silico methodologies,TACC,Frontera,US-DOE,biotechnology,computational drug discovery,surrogate models

Author(s): Shantenu Jha
Video(s): IMPECCABLE: A Dream Pipeline for High-Throughput Virtual Screening, or a Pipe Dream?
IXPUG Mid-Year Workshop 2021 Jul 15, 2021

Artificial Intelligence-Enabled Multi-Scale Simulations for COVID-19 Drug Discovery, IXPUG Mid-Year Workshop 2021 keynote presentation by Arvind Ramanathan, Argonne National Lab

Keyword(s): COVID-19,AI/ML,RNA processing,accelerated simulations,statistical inference,DeepDrive MD,weighted ensemble simulations,heterogeneous hardware,Stream-AI-MD,artificial intelligence

Author(s): Arvind Ramanathan
Video(s):
IXPUG Mid-Year Workshop 2021 Jun 21, 2021

SPMD / SIMD on GPUs

Keyword(s): SPMD,SIMD on GPUs

Author(s): Patrick Steinbrecher
Video(s):
IXPUG Mid-Year Workshop 2021 Jun 21, 2021

I have been collecting things that aren't taught in our SYCL (DPC++) book and that you won't get in a full-day tutorial on SYCL (at least as taught thus far). In 30 minutes, I will share the collection as it stands today, explain why these things matter, and show how they underscore the value of SYCL over CUDA: many of the nuances are new to us all because they help SYCL address XPUs (more than 'just GPUs') and meet the need to be open for multivendor support.

Keyword(s): SYCL,DPC++,nuances

Author(s): James Reinders
Video(s):
IXPUG Mid-Year Workshop 2021 Jun 21, 2021

The high-performance computing (HPC) ecosystem increasingly supports heterogeneous architectures and customization. Field programmable gate arrays (FPGAs) are among the options being considered due to their ability to both adapt to individual workloads and serve as prototype vehicles for application-specific accelerators. However, adoption has been limited by the difficulty of programming these devices. To mitigate this, vendors are introducing frameworks based on embedded domain specific languages (eDSLs), such as SYCL. This work takes the first step in evaluating one of these new DSLs, DPC++, using DOE proxy applications to identify programmability gaps and performance on Intel FPGAs. Initial testing is being done with the MiniAMR application from the Mantevo suite, focusing on the 7-point stencil.
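For readers unfamiliar with the term, a 7-point stencil updates each interior cell of a 3D grid from the cell itself and its six face neighbours. The following is a minimal sketch in plain Python, not the MiniAMR or DPC++ code evaluated in the talk; the function name `stencil_7pt`, the equal-weight average, and the unchanged boundary cells are assumptions for illustration.

```python
def stencil_7pt(grid):
    """Apply one 7-point averaging sweep to a 3D grid (nested lists).

    Each interior cell becomes the mean of itself and its six face
    neighbours. Boundary cells are copied unchanged (an assumption of
    this sketch). The input grid is not modified.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Start from a deep copy so boundary values carry over.
    out = [[[grid[i][j][k] for k in range(nz)]
            for j in range(ny)]
           for i in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                out[i][j][k] = (
                    grid[i][j][k]
                    + grid[i - 1][j][k] + grid[i + 1][j][k]
                    + grid[i][j - 1][k] + grid[i][j + 1][k]
                    + grid[i][j][k - 1] + grid[i][j][k + 1]
                ) / 7.0
    return out


if __name__ == "__main__":
    # A 3x3x3 grid has exactly one interior cell, at index (1, 1, 1).
    grid = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    grid[1][1][1] = 7.0
    print(stencil_7pt(grid)[1][1][1])
```

The three nested loops read from `grid` and write to `out`, so every cell update is independent of the others, which is what makes this kernel a common candidate for accelerator offload.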

Keyword(s): FPGA Evaluation,oneAPI,Mantevo,SYCL,DPC++

Author(s): Nicholas Miller
Video(s):
IXPUG Mid-Year Workshop 2021 Jun 21, 2021

This lightning talk presents our work on parallelizing the epidemiological simulation model Episim. Episim combines a large-scale person-centric human mobility model with a mechanistic model of infection and a person-centric disease progression model. Movements of persons, including the intervening activities in which they may interact with others, can be directly taken from data. Episim can be used to evaluate different intervention strategies, such as closing educational facilities, reducing other out-of-home activities, wearing masks, or contact tracing. In general, the parallelization of agent-based models is challenging. Especially when the underlying social network is irregular and dynamic, they require frequent communication between the processing elements. In Episim, however, we were able to take advantage of the fact that people are not contagious on the same day they become infected, and therefore immediate health synchronization is not required.
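The observation that newly infected people are not contagious on the same day can be sketched as follows. This is a toy illustration in plain Python, not Episim's implementation: the function names, the contact-pair representation, and the deterministic transmission on every contact are all assumptions made to keep the example small.

```python
def simulate_day(contagious, infected, partitions):
    """Return the set of people newly infected during one day.

    contagious: ids that can transmit today (fixed for the whole day).
    infected:   ids that have ever been infected.
    partitions: list of contact lists, one per (hypothetical) worker;
                each contact is a pair of person ids.
    """
    newly_infected = set()
    for contacts in partitions:  # each partition could run on its own worker
        local_new = set()
        for a, b in contacts:
            for src, dst in ((a, b), (b, a)):
                # Toy rule: every contact with a contagious person transmits.
                if src in contagious and dst not in infected:
                    local_new.add(dst)
        # Health states are exchanged only once, at the end of the day.
        newly_infected |= local_new
    return newly_infected


def run(days, initially_infected, partitions_per_day):
    """Simulate several days; people infected today transmit from tomorrow."""
    infected = set(initially_infected)
    contagious = set(initially_infected)
    for day in range(days):
        new = simulate_day(contagious, infected, partitions_per_day[day])
        infected |= new
        contagious = set(infected)  # becomes effective on the next day
    return infected


if __name__ == "__main__":
    day = [[(1, 2)], [(2, 3)]]  # two partitions with one contact each
    print(sorted(run(2, {1}, [day, day])))
```

Because the `contagious` set does not change within a day, the partitions can be processed in any order, or concurrently, and still produce the same result; only the end-of-day merge requires communication.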

Keyword(s): Epidemics,Agent-based modeling,Parallel programming

Author(s): Steffen Fuerst
Video(s):
IXPUG Mid-Year Workshop 2021 Jun 21, 2021

With discrete Intel GPUs entering the high performance computing landscape, there is an urgent need for production-ready software stacks for these platforms. In this paper, we report how we prepare the Ginkgo math library for Intel GPUs by developing a kernel backend based on the DPC++ programming environment. We discuss conceptual differences to the CUDA and HIP programming models and describe workflows for simplified code conversion. We benchmark advanced sparse linear algebra routines utilizing the converted kernels to assess the efficiency of the DPC++ backend in the hardware-specific performance bounds, and compare the performance of basic building blocks against routines providing the same functionality that ship with Intel's oneMKL vendor library.

Keyword(s): oneAPI,Intel GPUs,Ginkgo,Math library,Ginkgo Math library,SpMV

Author(s): Terry Cojean
Video(s):