Working Groups

IXPUG Working Groups

The IXPUG working group provides a virtual means to meet more regularly with other IXPUG members between yearly face-to-face meetings. The working group was started to foster greater collaboration and knowledge sharing on topics of particular interest, and is a great way to get involved in the IXPUG community.

The goals of the working group are as follows:

  1. Direct IXPUG discussions to what is most relevant to the community.
  2. Disseminate results and techniques.
  3. Assist the community with performance debugging/troubleshooting.
  4. Provide a forum for collaboration between IXPUG members and Intel engineers.
  5. Help the community to prepare for upcoming IXPUG events.

How to Join

The working group is open to anyone who wishes to join. Meetings are held on the second Thursday of every month at 8:00 AM PST, using GoToWebinar. To join, use the link included in the calendar invitation or click the title of the upcoming event listed below.

To receive updates and calendar reminders, subscribe to the working group mailing list HERE.  Please note that you must register for an account on the IXPUG website in order to subscribe.

Scheduling a Meeting

If you are interested in a specific topic or want to lead an IXPUG Working Group session, please let us know by posting in the Working Group Discussion Forum or by contacting the Working Group organizer by email.


Upcoming Meetings

Date Title Author(s) Description Presentation

August 9, 2018

Machine Learning at Scale

Deborah Bard and Karthik Kashinath, NERSC

Deep Learning has revolutionized the fields of computer vision, speech recognition, robotics and control systems. At NERSC, we have applied deep learning to problems in cosmology and climate science, focusing on areas that require supercomputing resources to solve real scientific challenges. In cosmology, we use deep learning to identify the underlying physical model that produced the matter distribution in the universe, and develop a deep learning-based emulator for cosmological observables that can reduce the need for computationally expensive simulations. In addition, we use feature introspection to examine the physical structures identified by the network as distinguishing between cosmological models. 

In climate, we apply deep learning to detect and localize extreme weather events such as tropical cyclones, atmospheric rivers and weather fronts in large-scale simulated and observed datasets. We will also discuss the challenges involved in scaling deep learning frameworks to supercomputer scale, and how to obtain optimal performance from supercomputing hardware.

[Join Meeting]


Previous Meetings

Date Title Author(s) Description Presentation
June 14, 2018

Using Roofline Analysis to Analyze, Optimize, & Vectorize Iso3DFD with Intel® Advisor

Kevin O’Leary, Intel Corporation

This presentation will introduce the use of Intel® Advisor to help you enable vectorization in your application. We will use the Roofline Model in Intel Advisor to see the impact of our optimizations. We will also demonstrate how Intel Advisor can detect problematic memory access patterns or loop-carried dependencies in your application. The case study is Iso3DFD, a kernel that propagates a wave in a 3D field using finite differences with a 16th-order stencil in an isotropic medium.
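As background for readers new to the Roofline Model mentioned above, the sketch below shows the basic bound it draws: attainable performance is limited by the smaller of peak compute throughput and arithmetic intensity multiplied by peak memory bandwidth. This is a generic illustration, not Intel Advisor's implementation; the function name and the machine numbers are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_gflops:          compute ceiling of the machine (GFLOP/s).
    peak_bandwidth_gbs:   memory bandwidth ceiling (GB/s).
    """
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


# Hypothetical machine, not measured on any real processor.
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

# The "ridge point" is the intensity at which the two ceilings meet.
ridge = PEAK_GFLOPS / PEAK_BANDWIDTH_GBS

for ai in (0.5, 2.0, 10.0, 40.0):
    bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
    regime = "memory-bound" if ai < ridge else "compute-bound"
    print(f"AI = {ai:5.1f} FLOP/byte -> at most {bound:7.1f} GFLOP/s ({regime})")
```

A kernel that sits left of the ridge point benefits most from reducing memory traffic, while one to the right benefits most from better vectorization or threading.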


May 10, 2018

High Productivity Languages

Rollin Thomas

Sergey Maidanov
Intel Corporation

This talk will cover the challenges of numerical analysis and simulation at scale. Tools such as Python, which are often used for prototyping, are not designed to scale to large problems. As a result, organizations need a dedicated team that takes a prototype created by research scientists and deploys it in the production environment.

A new approach is required to address both the scalability and the productivity aspects of applied science, one that combines two distinct worlds: the best of the HPC world and the best of the database world.

Starting with a brief overview of scalability aspects with respect to modern hardware architecture, we will characterize what the problem at scale is, its inherent characteristics, and how these map onto software design choices. We will also discuss selected experimental/observational science applications making use of Python at the National Energy Research Scientific Computing Center (NERSC), and what NERSC has done in partnership with the Intel Python Team to help application developers improve performance while retaining scientist/developer productivity.

[Slides 1]

[Slides 2]


April 12, 2018

Topology and Cache Coherence in Knights Landing and Skylake Xeon Processors

John McCalpin

Intel's second-generation Xeon Phi (Knights Landing) and Xeon Scalable Processor ("Skylake Xeon") are both based on a new 2-D mesh architecture with significant changes to the cache coherence protocol. This talk will review some of the most important new features of the coherence protocol (such as "snoop filters", "memory directories", and non-inclusive L3 caches) from a performance analysis perspective. For both of these processor families, the mapping from user-visible information (such as core numbers) to spatial location on the mesh is both undocumented and obscured by low-level renumbering. A methodology is presented that uses microbenchmarks and performance counters to invert this renumbering. This allows the display of spatially relevant performance counter data (such as mesh traffic) in a topologically accurate two-dimensional view. Applying these visualizations to simple benchmark results provides immediate intuitive insights into the flow of data in these systems, and reveals ways in which the new cache coherence protocols modify these flows.



March 8, 2018

Compiler Prefetching on KNL

Rakesh Krishaiyer
Intel Corporation

We will cover some of the recent changes in the compiler-based prefetching (for Knights Landing and Skylake) and provide tips on how to tune for performance using compiler prefetching options, pragmas and prefetch intrinsics.



February 8, 2018

Threading Building Blocks (TBB) Flow Graph: Expressing and Analyzing Dependencies in Your C++ Application

Pablo Reble
Intel Corporation

Developing for heterogeneous systems is challenging because applications may be composed of many layers of parallelism and employ a diverse set of programming models or libraries. This session focuses on Flow Graph, an extension to the Threading Building Blocks (TBB) interface that can be used as a coordination layer for heterogeneity that retains optimization opportunities and composes with existing models. This extension assists in expressing complex synchronization and communication patterns and in balancing load between CPUs, GPUs, and FPGAs. 

Because a Flow Graph can express complex interactions, we use Intel Advisor’s Flow Graph Analyzer (FGA), which has been released as a Technology Preview in Parallel Studio XE 2018, to visualize interactions in a graph and map the application structure to performance data. Finally, we validate this approach by presenting use cases of applications using Flow Graph.



January 11, 2018



Vectorization of Inclusive/Exclusive Scan in Compiler 19.0

Nikolay Panchenko
Intel Corporation

We propose a new OpenMP syntax to support inclusive and exclusive scan patterns. In computer science, this pattern is also known as a prefix sum or cumulative sum. The proposal defines several new constructs to support inclusive and exclusive scans through OpenMP, along with the semantics of these constructs and their possible combinations of parallelization and vectorization. In the 18.0 compiler, three new experimental OpenMP SIMD features were added: vectorization of loops with breaks, syntax for compress/expand patterns, and syntax for the histogram pattern.
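For readers unfamiliar with the scan pattern named in this abstract, the following minimal sketch contrasts the inclusive and exclusive forms in plain, serial Python. It illustrates only the pattern itself; it does not show the OpenMP syntax proposed in the talk, and the function names and the sample data are assumptions made for this illustration.

```python
from itertools import accumulate
from operator import add


def inclusive_scan(values, op=add):
    """out[i] combines values[0] through values[i], including values[i]."""
    return list(accumulate(values, op))


def exclusive_scan(values, op=add, identity=0):
    """out[i] combines values[0] through values[i-1]; out[0] is the identity."""
    out = []
    running = identity
    for v in values:
        out.append(running)       # store the total *before* adding v
        running = op(running, v)
    return out


data = [3, 1, 4, 1, 5]
print(inclusive_scan(data))  # [3, 4, 8, 9, 14]
print(exclusive_scan(data))  # [0, 3, 4, 8, 9]
```

The loop in `exclusive_scan` carries a dependency from one iteration to the next through `running`, which is why a naive loop of this shape does not vectorize without dedicated compiler or language support.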




For more information about previous meetings, please refer to the minutes.