

# THREADING BUILDING BLOCKS FLOW GRAPH:

Expressing and Analyzing Dependencies in your C++ Application

Pablo Reble, Software Engineer

Developer Products Division

Software and Services Group, Intel

# Agenda

- TBB and Flow Graph extensions
  - Composable way to express parallelism
  - Introduction to Flow Graph
- Application example
- Intel® Advisor -- Flow Graph Analyzer
  - New tool for analyzing and designing data flow and dependency graphs



# Threading Building Blocks (TBB)

Celebrating its 12-year anniversary in 2018!

A widely used C++ template library for parallel programming:

- Parallel algorithms and data structures
- Threads and synchronization primitives
- Scalable memory allocation and task scheduling

#### **Benefits**

- Is a library-only solution that does not depend on special compiler support
- Supports C++, Windows\*, Linux\*, OS X\*, Android\* and other OSes
- Commercial support for Intel® Atom<sup>TM</sup>, Core<sup>TM</sup>, Xeon® processors and for Intel® Xeon Phi<sup>TM</sup> coprocessors
- Is both a commercial product and an open-source project



- Open source (Apache 2.0 since version 2017): http://threadingbuildingblocks.org
- Intel® Threading Building Blocks (Intel® TBB): http://software.intel.com/intel-tbb








## Applications often contain multiple levels of parallelism



TBB helps to develop composable levels of parallelism.

# Threading Building Blocks Flow Graph

Enabling developers to exploit parallelism at higher levels

Efficient implementation of dependency graph and data flow algorithms

Design shared-memory applications





# Heterogeneous support in TBB

TBB as a coordination layer for heterogeneity that provides flexibility, retains optimization opportunities and composes with existing models



Target hardware: FPGAs, integrated and discrete GPUs, co-processors, etc.

Figure labels: Intel® Threading Building Blocks, OpenVX\*, OpenCL\*, COI/SCIF, DirectCompute\*, Vulkan\*, ...

#### TBB as a composability layer for library implementations

One threading engine underneath all CPU-side work

#### TBB flow graph as a coordination layer

- Be the glue that connects heterogeneous hardware and software together
- Expose parallelism between blocks; simplify integration





# Flow graph node types at a glance

#### **Functional**



#### **Buffering**



#### Split / Join



#### Other



# Intel® Advisor's Flow Graph Analyzer (FGA)

Part of Intel Parallel Studio XE

Technology Preview in the 2018 version

The tool supports analysis and design of parallel applications that use the Threading Building Blocks (TBB) flow graph interface.

Available for Windows\* and Linux

https://software.intel.com/en-us/articles/gettingstarted-with-flow-graph-analyzer





# Real world data flow example











Figure labels: **Computer Vision Algorithms**, **Output: Display**

# Example: Advanced driver-assistance systems (ADAS) Application

The framework represents a data flow graph.

- Classic example of an image processing pipeline
  - Read input from sensor
  - Run processing algorithms
  - Display result
- Executes different algorithms in parallel

Intel<sup>®</sup> Parallel Universe Article Issue 30 (Oct'17): Vasanth Tovinkere, Pablo Reble, Farshad Akhbari, Palanivel Guruvareddiar, Driving Code Performance with Intel<sup>®</sup> Advisor Flow Graph Analyzer



# Designing a data flow graph

#### FGA enables GUI-assisted creation of graphs



Can be used for prototyping (export graph to C++)

# Analyzing Applications using Intel Advisor's Flow Graph Analyzer

Unique capability to map trace data to an application's graph structure

Uses a built-in feature of TBB to collect traces.

FGA's trace collector extracts the graph structure.

TBB 2018 preview feature: activate at compile time




FGA can map program execution to the application's structure (top).

The tree map view (left) shows consumed CPU time and concurrency for nodes.




Visualize nested TBB algorithms\*

Figure: timeline of computer vision demo.graphml over 50 frames, with rows for CV1 CPU, CV2 FPGA, and Threads 0, 1, 4, 5, 6, 7.

<sup>\*</sup> TBB instrumentation is work in progress. Parallel for instrumentation is available as a preview feature in TBB 2018 Update 1.

# Performance Results

#### Frame completion rate could be improved significantly (by up to 30%) through:

- Using Flow Graph as a coordination layer
- Consistent use of Intel® Performance Libraries: Intel® TBB, Intel® Math Kernel Library (Intel® MKL), Intel® Integrated Performance Primitives





System Configuration: Intel® Core i7 Skylake processor @2.6GHz, Software: Ubuntu\* 16.04, OpenCV\* 3.1.0, Intel® TBB 2018 Update 1, Intel® C++ Compiler Professional Edition for Linux\* OS 2018, Intel® MKL 2018, Intel® Integrated Performance Primitives 2018

# Summary

- Flow Graph: expressing parallelism at a higher level
  - Efficient implementation of C++ graphs in shared memory
  - TBB helps to develop composable levels of parallelism
- Intel® Advisor Flow Graph Analyzer
  - Prototype data flow and dependency graphs in different domains
  - Visualization of execution over time
  - Capability to map an application's schedule to its structure

# Tutorial series: Expressing Heterogeneous Parallelism in C++ with Intel® Threading Building Blocks

SC17, PPoPP'17+18, EuroPar 17

Contributors: James Reinders, Rafael Asenjo, Michael Voss, Pablo Reble, Aleksei Fedotov and Jim Cownie



Hands-on material published on GitHub\*:

https://github.com/01org/tbb/tree/tbb\_tutorials/examples/sc17\_hands\_on

# Questions?

# Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit https://www.intel.com/benchmarks.

Copyright © 2018, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

#### **Optimization Notice**

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804



