oneAPI Developer Summit 2020

Join us for the inaugural oneAPI Developer Summit focused on oneAPI and Data Parallel C++ for accelerated computing across xPU architectures (CPU, GPU, FPGA, and other accelerators). In this two-day virtual conference, you will hear from speakers in industry and academia who are working on innovative cross-platform architecture solutions developed on oneAPI. Learn from fellow developers and connect with other innovators. Please join us in building a self-sustaining, vibrant community whose members support each other in using oneAPI and Data Parallel C++.

Event Schedule

Day 1 is November 12, 2020; Day 2 is November 13, 2020.

Introduction, 08:00 AM – 08:10 AM PDT
Day 1: Sujata Tibrewala
Day 2: Sujata Tibrewala

Keynote, 08:10 AM – 08:40 AM PDT
Day 1: “oneAPI Vision for Heterogenous Compute” – Joe Curley
Day 2: “GROMACS” – Erik Lindahl

Tech talk 1, 08:40 AM – 09:20 AM PDT
Day 1: “Building an open AI & HPC ecosystem” – Andrew Richards
Day 2: “DPC++ and C/C++/Fortran OpenMP Compilers for CPUs and Xe Accelerators” – Xinmin Tian

Tech talk 2, 09:20 AM – 10:00 AM PDT
Day 1: “Unifying and Accelerating Reverse Time Migration Programming with oneAPI” – Ahmed Ayyad
Day 2: Lightning talks (20 mins)
  • “Ginkgo – an open source math library for the DPC++ ecosystem” – Hartwig Anzt
  • “Integrating and benefits of Intel® oneAPI Rendering Toolkit in NCAR VAPOR Climate Visualization App” – John Clyne

Break, 10:00 AM – 10:10 AM PDT

Tech talk 3, 10:10 AM – 10:50 AM PDT
Day 1: “Boosting productivity of decision-making with oneAPI-based heterogeneous schedulers on SoCs” – Denisa Constantinescu & Rafael Asenjo
Day 2: “Rooflining Bioinformatics: Boosting Epistasis Detection with Cache-aware Roofline Model” – Aleksandar Ilic

10:50 AM – 11:50 AM PDT
Day 1: Lightning talks (15 mins)
  • “ATLAS Charged Particle Seed Finding with DPC++” – Attila Krasznahorkay
  • “Making Banking Secure via Bio Metrics Application Built Using oneAPI Video Processing Library” – Alessandro Faria
  • “Gaining Support for Multiple Device by Migrating a CUDA Code to oneAPI -- Experiences with the Intel® DPC++ Compatibility Tool” – Steffen Christgau & Marius Knaust
  • “Performance Portability with oneAPI: the CMS Physics Reconstruction Software” – Laura Cappelli
  • “OneOligo: Using OneAPI to Accelerate DNA Data Storage” – Raja Appuswamy
Day 2: Lightning talks (15 mins)
  • “HPC Visual Computing and Analysis SW Development at ExaScale” – Paul Navratil

Break, 11:50 AM – 12:10 PM PDT
Day 1: Lunch and Fun Activities
Day 2: Lunch and Fun Activities

Tech talk 4, 12:10 PM – 12:50 PM PDT
Day 1: “Optimizing Computer Aided COVID-19 Drug Design Tools with Intel oneAPI” – Ho Leung Ng
Day 2: “Extending Rice University’s HPCToolkit to Measure and Analyze the Performance of Applications Accelerated with Intel GPUs” – Xiaozhu Meng & Aaron Cherian

Tech talk 5, 12:50 PM – 01:30 PM PDT
Day 1: “Adopting oneAPI into the NAMD Molecular Dynamics Application” – David Hardy & Tareq Malas
Day 2: “Accelerating Oil and Gas Applications with SYCL for FPGAs” – Ricardo Menotti

Break, 01:30 PM – 01:40 PM PDT

Panel, 01:40 PM – 02:40 PM PDT
Day 1: oneAPI Spec & Industry Panel – Moderator: Gergana Slavova. Panelists: Ronan Keryell, Nevin Liber, Penporn Koanantakool, and Andrew Lumsdaine.
Day 2: oneAPI Tools Panel – Moderator: Henry Gabb. Panelists: Paul Petersen, Xiaozhu Meng, Ruyman Reyes, and Ramesh Peri.

Closing, 02:40 PM – 02:50 PM PDT
Day 1: Sujata Tibrewala
Day 2: Sujata Tibrewala

Happy hour, 02:50 PM – 03:50 PM PDT
Day 1: Virtual Happy Hour and Networking
Day 2: Virtual Happy Hour and Networking

oneAPI Vision for Heterogenous Compute

Joe Curley

The need to analyze increasingly complex datasets is driving demand for dedicated workload accelerator chips to be installed in data center servers to complement the main server processor. Intel’s solution is oneAPI, a project to deliver a unified software development environment across CPU and accelerator architectures. oneAPI is an industry initiative based on standards and an open specification, and it includes a unified language and libraries that deliver full native code performance. In this keynote, Joe will explain why Intel started this initiative and outline the plans going forward. He will recognize the developer community for the excellent work it has done this past year with oneAPI and give the audience a taste of why it is worth listening to developers talk about their “oneAPI experiences” over the next two days.

About the Speaker
Joseph (Joe) Curley serves Intel Corporation as Senior Director, oneAPI Products, Solutions & Ecosystem. His primary responsibilities include supporting the oneAPI industry initiative, product management of Intel’s oneAPI product implementation, and supporting the oneAPI developer ecosystem. Mr. Curley joined Intel Corporation in 2007, and has served in multiple other strategic planning, ecosystem development, and business leadership roles. Prior to joining Intel, Joe worked at Dell, Inc. leading the global workstation product line, the consumer and small business desktop product line, and in a series of engineering roles. He began his career at computer graphics pioneer Tseng Labs.

GROMACS

Erik Lindahl

GROMACS is a state-of-the-art computational tool to understand the molecular mechanisms of the protein molecules in our cells. Erik’s lab leads the development of the GROMACS molecular dynamics toolkit, which is one of the world’s most widely used HPC applications, and which has been tuned to achieve outstanding performance and scaling on everything from laptops to supercomputers and even the Playstation as part of Folding@Home. His team is among the ones that have been able to fully exploit both CPU and GPU hardware from every vendor available. In this talk he will discuss his lab’s experience with oneAPI and how it has helped them expand GROMACS’s support for heterogeneous hardware.

About the Speaker
Erik Lindahl received a PhD from the KTH Royal Institute of Technology in 2001, and performed postdoctoral research at Groningen University, Stanford University and the Pasteur Institute. He is currently professor of Biophysics at Stockholm University, with a second appointment as professor of Theoretical Biophysics at the Royal Institute of Technology. Lindahl’s research is focused on understanding the molecular mechanisms of membrane proteins, in particular ion channels, through a combination of molecular simulations and experimental work involving cryo-EM and electrophysiology. He has authored some 130 scientific publications and is the recipient of an ERC starting grant. Lindahl heads the international GROMACS molecular simulation project, which is one of the leading scientific codes to exploit parallelism on all levels from accelerators and assembly code to supercomputers and distributed computing. He is co-director of the Swedish e-Science Research Center as well as the Swedish National Bioinformatics Infrastructure, and lead scientist of the BioExcel Center-of-Excellence for Computational Biomolecular Research. His research has been recognized with the Prix Jeune Chercheur Blaise Pascal, the Sven and Ebba-Christian Högberg prize, and the Wallenberg Consortium North prize. Lindahl is currently the chair of the PRACE Scientific Steering Committee.

Building an Open AI & HPC Ecosystem

Andrew Richards

In the world of AI & HPC programming, developers need accelerators with high levels of parallelism to get high performance, but these often use closed, proprietary programming models.

Developers are demanding open and standards-based solutions that deliver both performance and portability. An open and unified programming model gives developers a way to cost-effectively take advantage of the growing diversity of processor platforms and avoid being locked into a single vendor. This presentation will explore where we are in creating an open ecosystem that uses open standards and how we can get the industry to work together with open standards. Codeplay has been working on these challenges for years, having been closely involved in the OpenCL and SYCL Khronos standards for parallel compute. Most recently we have been involved in defining the oneAPI specification and expanding DPC++, adding support for Nvidia GPUs.

I will show the vast progress that has been made to date, where we’re going next, and how you can help us build an open ecosystem for AI and HPC programming.

About the Speaker
CEO and co-founder of Codeplay, Andrew started his career writing video games in the days of 8-bit computers, progressing to become a lead games programmer at Eutechnyx™, where he wrote best-selling titles such as Pete Sampras Tennis and Total Drivin’. Codeplay has been producing compilers for games consoles, special-purpose processors and GPUs since then. As well as being CEO and Founder of Codeplay Software Ltd, Andrew is also the Chair of the Software working group of the HSA Foundation™ and former Chair of the SYCL™ for OpenCL™ sub-group of the Khronos® Group. Andrew graduated from Cambridge University with a degree in Computer Science and Physics.

Unifying and Accelerating Reverse Time Migration Programming with oneAPI

Ahmed Ayyad

As a leading company in the seismic imaging domain, we require our software products to be as highly performant, efficient and sustainable as possible. State-of-the-art hardware architecture is constantly changing, and adapting to these changes can be very challenging for software engineers: it requires either writing several variants of the software or shifting from one accelerator to another depending on the current norm.

Using oneAPI as a unified programming model for our Reverse Time Migration (RTM) software accelerated our development efforts on multiple hardware platforms. This reduced our need for variant coding as we now use the same code base for any hardware accelerator and gave us the advantage of working on a variety of processor platforms, avoiding being locked into a single vendor.

About the Speaker
Ahmed is a senior HPC software engineer and a member of Brightskies’ parallel programming team. He works on software benchmarking and code optimization/modernization of the company’s state-of-the-art computational science and numerical analysis products. Ahmed’s work involves developing software optimized for different hardware architectures. Ahmed is part of the Brightskies team that is developing one of the earliest substantial products adopting the oneAPI and DPC++ technologies. Previously Ahmed worked at Valeo, developing automotive software solutions for multiple OEMs. Ahmed holds a B.Sc. in electrical engineering from Alexandria University. He has R&D experience in computer vision, machine learning/deep learning and software architecture.

Boosting Productivity of Decision-making with oneAPI-based Heterogeneous Schedulers on SoCs

Denisa Constantinescu & Professor Rafael Asenjo

Our goal is to make it easy and feasible to implement solutions for autonomous decision-making and planning under uncertainty on low-power mobile platforms. We focus on practical applications, such as autonomous driving and service robotics, that must run on SoC mobile platforms. These applications often have real-time execution constraints. The main challenge is to keep the runtime and energy performance in check while making it easy for the users (programmers) to write their code to solve decision-making problems.

Our proposal involves using low-power heterogeneous computing strategies, sparse data structures to fit large real-world decision-making problems on SoCs with scarce memory and computing resources, and oneAPI with DPC++ programming. We compare three heterogeneous scheduling strategies implemented with OpenCL™ + Threading Building Blocks (TBB) versus oneAPI + TBB (oneTBB) to run parallel code on CPU-GPU SoCs and evaluate their performance on a set of benchmarks for mobile robot navigation planning. The benchmarks compute an optimal navigation plan with the Value Iteration (VI) algorithm. VI is a fundamental method to find optimal policies, allowing an intelligent agent to act autonomously in environments where the effects of its actions are not deterministic. The experiments show that the implementations based on DPC++ are up to five times easier to program while incurring only three to eight percent overhead.

This work’s main novelty is solving large-scale Markov Decision Processes on low-power heterogeneous CPU-GPU platforms and demonstrating that we can achieve both performance and productivity when the scheduling strategy is carefully selected. We remark that the oneAPI programming model creates new opportunities to improve performance and efficiency in low-power systems.
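
For readers unfamiliar with Value Iteration, the following is a minimal, sequential Python sketch of the general algorithm. It is not the speakers' DPC++ implementation or one of their benchmarks: the two-state problem, the discount factor, and all names (`q_values`, `value_iteration`, `TRANSITIONS`, `REWARDS`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a tiny hypothetical MDP, not a benchmark from the talk.

def q_values(state, values, transitions, rewards, gamma):
    """Expected return of each action in `state` under the current value estimates."""
    return [
        rewards[state][action] + gamma * sum(p * values[nxt] for p, nxt in outcomes)
        for action, outcomes in enumerate(transitions[state])
    ]


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-9, max_iters=10_000):
    """Repeat the Bellman optimality update until the values stop changing."""
    n_states = len(transitions)
    values = [0.0] * n_states
    for _ in range(max_iters):
        new_values = [
            max(q_values(s, values, transitions, rewards, gamma))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # The greedy policy picks, in every state, the action with the best expected return.
    policy = []
    for s in range(n_states):
        q = q_values(s, values, transitions, rewards, gamma)
        policy.append(q.index(max(q)))
    return values, policy


# transitions[s][a] is a list of (probability, next_state) pairs.
# State 0: action 0 stays put; action 1 reaches the goal (state 1) with
# probability 0.8 and slips back to state 0 with probability 0.2.
# State 1 is an absorbing goal state with a single action.
TRANSITIONS = [
    [[(1.0, 0)], [(0.8, 1), (0.2, 0)]],
    [[(1.0, 1)]],
]
# rewards[s][a] is the immediate reward for taking action a in state s.
REWARDS = [[0.0, 1.0], [0.0]]

values, policy = value_iteration(TRANSITIONS, REWARDS)
print(values, policy)
```

The update for each state depends only on the previous sweep's values, so the per-state loop is the part that a heterogeneous scheduler could split between CPU and GPU.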

About the Speakers
Denisa Constantinescu is a Ph.D. student in Mechatronics and a researcher in the Computer Architecture Department at the University of Malaga. She obtained a Master’s degree in Computer Engineering from the University of Malaga in 2017. She was a Research Visitor at the NUCAR Laboratory (Northeastern University, Boston, USA) in 2018. Her research interests are in parallel computing, robotics, intelligent control systems, optimization, and autonomous decision-making.

Rafael Asenjo is a Professor of Computer Architecture at the University of Malaga, Spain. He obtained a PhD in Telecommunication Engineering in 1997 and was an Associate Professor at the Computer Architecture Department from 2001 to 2017. He was a Visiting Scholar at the University of Illinois in Urbana-Champaign (UIUC) in 1996 and 1997 and Visiting Research Associate at the same university in 1998. He was also a Research Visitor at IBM T.J. Watson in 2008 and at Cray Inc. in 2011. He has been using TBB since 2008 and over the last five years, he has focused on productively exploiting heterogeneous chips leveraging TBB as the orchestrating framework. In 2013 and 2014 he visited UIUC to work on CPU+GPU chips. In 2015 and 2016 he also began research on CPU+FPGA chips while visiting U. of Bristol. He served as General Chair for ACM PPoPP’16 and as an Organization Committee member as well as a Program Committee member for several HPC related conferences (PPoPP, SC, PACT, IPDPS, HPCA, EuroPar, and SBAC-PAD). His research interests include heterogeneous programming models and architectures, parallelization of irregular codes and energy consumption. He has co-authored the latest book (open access) on Threading Building Blocks (TBB).

ATLAS Charged Particle Seed Finding with DPC++

Attila Krasznahorkay

The ATLAS Experiment is one of the general-purpose particle physics experiments built at the Large Hadron Collider at CERN, in Geneva, Switzerland. Its goal is to study the behavior of elementary particles at the highest energies ever produced in a laboratory, helping us better understand our universe.

The LHC, in what is called the High Luminosity LHC (HL-LHC), is going to increase the intensity of its particle beams many-fold over the next decade to allow us to study the rarest particle interactions possible. This increase in intensity will provide us with great challenges in analyzing the data collected from the ATLAS detector. To be able to process the data collected in that period, we will have to use novel data analysis techniques to cope with the increased complexity of our data.

In this presentation I will show results from an R&D project that implements parts of the charged particle track reconstruction code of ATLAS using oneAPI/DPC++. This allows us to offload parts of the necessary calculations to different accelerators, providing a sizeable processing speed increase for the data that we expect to collect during the HL-LHC data taking.

About the Speaker
Attila Krasznahorkay is an Applied Physicist at CERN, having a PhD in Particle Physics. He currently convenes the Accelerator Software Forum of the ATLAS Experiment while working as a core software developer for the experiment, and convenes the Frameworks Working Group of the High Energy Physics Software Foundation (HSF). His work currently focuses on integrating accelerator aided calculations into the ATLAS Experiment’s simulation/reconstruction/analysis software and preparing the ATLAS Experiment’s software for the next data taking period of the Large Hadron Collider.

Making Banking Secure via Bio Metrics Application Built Using oneAPI Video Processing Library

Alessandro Faria

In this talk we will look at oneVPL and how it is used in Certiface, a technology designed to combat fraud and protect honest people through the ability to differentiate between a live person and a recorded video. Certiface is built to harness heterogeneous computing architectures, including CPUs and GPUs, from servers to notebooks. It combines software tools such as oneVPL, computer vision techniques with OpenCV, OpenVINO, and deep learning technologies based on Intel features such as Threading Building Blocks (TBB), Intel® Integrated Performance Primitives (Intel® IPP), and Intel® Math Kernel Library (Intel® MKL), together with high-performance computing. This technology processes millions of faces per second in the cloud, making banking transaction operations in Brazil secure, fast and effective.

About the Speaker
Alessandro was born in Bebedouro, state of Sao Paulo, Brazil. He is a speaker, researcher, and founder of OITI TECHNOLOGIES. He has worked with technology since 1984, Linux since 1998, biometrics since 1999, facial biometrics since 2003, computer vision since 2005, and GPU since 2009. He is the inventor of CERTIFACE technology, an openSUSE Linux Ambassador in Latin America, a member of OWASP since 2016, a member of Mozillians since 2017, an official contributor to the OpenCV library since 2017, an open source software contributor, and a maintainer of librealsense in openSUSE Linux.

Gaining Support for Multiple Device by Migrating a CUDA Code to oneAPI -- Experiences with the Intel® DPC++ Compatibility Tool

Steffen Christgau & Marius Knaust

Data Parallel C++ (DPC++), the C++- and SYCL-based programming language of choice in the oneAPI programming environment, promises a single source code that addresses multiple hardware architectures. However, starting from scratch or rewriting an existing application is tedious, if not out of the question, in most cases. The Intel® DPC++ Compatibility Tool addresses this issue by assisting in the migration from CUDA to DPC++. In this talk, we share our experiences with migrating a typical CUDA stencil application code to DPC++ with the help of the tool. The presentation addresses the basic porting process, required manual steps, and issues we faced with the tsunami simulation easyWave. Besides these procedural steps, we present performance numbers for the hardware devices supported by oneAPI and its evolving ecosystem. This is not limited to devices like Intel CPUs and GPUs but includes promising numbers for CUDA hardware as well. We also demonstrate what needs to be done to execute the migrated, CUDA-originated code on FPGAs.

About the Speakers
Steffen Christgau is a research associate in the Algorithms for Innovative Architectures research group of the Supercomputing Department at the Zuse Institute Berlin (ZIB). His current research interests are the efficient usage of persistent memory for HPC applications as well as their optimization for new hardware platforms with established and new programming environments. He received his Ph.D. as well as his M.Sc. degree in computer science from the University of Potsdam, Germany. While working at the Operating Systems and Distributed Systems group, his research focused on designing and optimizing MPI implementations for an experimental, non-cache-coherent many-core processor. Before he joined ZIB, he also worked in industry on compiler implementations and robotic systems, and as a lecturer for parallel computing.

Marius Knaust is a research associate in the Algorithms for Innovative Architectures research group of the Supercomputing Department at the Zuse Institute Berlin (ZIB). His current research interest is the application of FPGAs as HPC compute accelerators and improving the high-level synthesis workflow for them. He received his M.Sc. degree in software engineering from the Hasso Plattner Institute in Potsdam, Germany. Prior to joining ZIB, he worked at the division Microrobotics and Control Engineering of the University of Oldenburg and interned with the Mobile Research group of Yahoo Labs in Silicon Valley.

Performance Portability with oneAPI: the CMS Physics Reconstruction Software

Laura Cappelli

To fully exploit the physics reach of the High-Luminosity Large Hadron Collider, the LHC experiments are planning substantial upgrades of their detector technologies and increases of their data acquisition rates. The higher proton-proton interaction rate, pileup and event processing rate present an unprecedented challenge to the real-time and offline event reconstruction, requiring processing power that is orders of magnitude larger than today and far exceeds the expected increase for conventional CPUs. The Compact Muon Solenoid (CMS) experiment is developing a fully heterogeneous reconstruction software that will be used during the next LHC data taking period, starting in 2022. Its first applications will be the online reconstruction running on a GPU-equipped High-Level Trigger (HLT) farm, and the offline reconstruction running on HPC centres worldwide. These activities will allow the collaboration to gain experience with parallel algorithms and a heterogeneous framework that will be essential to leverage diverse kinds of accelerators during the HL-LHC data taking. To keep the cost of the software development, maintenance and validation that this will entail under control, CMS is evaluating various performance portability frameworks that promise a “write once, run anywhere” approach, building the same code base for different back ends and accelerator types. The speaker will present the ongoing work to port the CMS reconstruction software to the Intel oneAPI platform and compare its performance on different back ends with that of native code running on the same hardware.

About the Speaker
Laura Cappelli is a computer science student at Alma Mater Studiorum – University of Bologna. Her studies span heterogeneous systems and parallel programming models, and her current research focuses on the use of SYCL from the Khronos Group and oneAPI from Intel for performance portability, and their application to physics reconstruction algorithms. In summer 2020, Laura participated in the CERN OpenLab programme, and she is now working in collaboration with physicists from the CMS Experiment at CERN. She will defend her master’s thesis on “Performance portability of physics reconstruction algorithms on heterogeneous systems using the SYCL abstraction programming model” in early 2021.

Optimizing Computer Aided COVID-19 Drug Design Tools with oneAPI

Ho Leung Ng

Computer-aided drug design uses chemistry simulations and calculations to accelerate the discovery of drug candidate molecules. Computational resources and time are cheap relative to the painstaking research in experimental laboratories. Recent advances in computing technologies and computational chemistry have greatly improved the accuracy and scope of computer-aided drug design, furthering the scientific appetite for computational power. Many scientific computing software tools are based on hand-crafted legacy code, written in Fortran for example, and have not been modernized for use on parallel architectures. The greatest challenge to improving software performance is that most developers and users of these tools are trained in the physical sciences with little formal background in software design. Software development tools must be easy to use for this user base.

We describe our efforts and successes using the Intel® oneAPI Math Kernel Library and Threading Building Blocks to improve the performance of computational drug design software. We describe the application of these tools by the Open Source COVID-19 drug discovery consortium and partnering scientists.

About the Speaker
Ho Leung Ng is an Associate Professor of Biochemistry & Biophysics at Kansas State University. His primary research interests include protein crystallography, structure-based drug design for cancer and immunology, computational chemistry, applications of machine learning to computational chemistry and drug design, immuno-oncology, protein kinases, estrogen pharmacology, GPCRs, hormone receptors, malaria, fluorescent proteins, and biophotonics. He is the founder of OpenSourceCOVID19 (www.opensourcecovid19.org).

Adopting oneAPI Into the NAMD Molecular Dynamics Application

David Hardy & Tareq Malas

Molecular dynamics (MD) is an important computational methodology that can provide insight into the structure and function of sub-cellular assemblies of biomolecules at atomic level detail, accessing spatial and temporal resolutions that are not available to purely experimental approaches. As computational power has increased over the past several years, MD techniques have become an invaluable tool for tackling biomedically relevant challenges, such as improving our understanding of the molecular structure of the SARS-CoV-2 virus that causes COVID-19 and guiding computational approaches for screening candidate compounds as potential anti-viral drugs. NAMD is a parallel MD code designed for high-performance simulation of large biomolecular systems. It offers scalable performance on petascale class computers and has now for over ten years been a major application for the NSF supercomputing centers and for various DOE labs. One of the first exascale class computers will be the upcoming Aurora supercomputer based on the latest Intel GPU technology, making the adoption of oneAPI into NAMD critical for performance. In this presentation, we will discuss the ongoing NAMD development efforts of porting its many CUDA kernels to DPC++ with the Intel DPC++ Compatibility Tool and using the Intel® VTune™ Profiler to improve GPU utilization and overall performance of the new DPC++ kernels.

About the Speakers
Dr. David J. Hardy is a Senior Research Programmer at the University of Illinois at Urbana-Champaign. He leads the development of NAMD, an award-winning parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems that, through the use of Charm++ parallel objects, is able to scale to hundreds of thousands of CPU cores and tens of thousands of GPUs. He obtained his PhD in Computer Science in 2006 from the University of Illinois at Urbana-Champaign. His research interests include fast methods for calculating electrostatics, numerical integration methods suitable for Hamiltonian systems, and GPU computing.

Tareq Malas joined Intel after his postdoctoral fellowship at Lawrence Berkeley National Laboratory under the NERSC Exascale Science Applications Program. He obtained his MS and Ph.D. degrees from King Abdullah University of Science and Technology (KAUST), in Saudi Arabia, advised by Prof. David Keyes in the Extreme Computing Research Center. His main areas of research are high-performance computing in stencil computations and molecular dynamics simulations. He is interested in developing efficient high-performance computing algorithms on contemporary and future architectures for the most demanding applications. He likes to work close to the CPU, as in his stencil code generation project, which performed efficient vectorization on the PowerPC 450 processor of the Blue Gene/P supercomputer. He worked on developing novel cache blocking techniques for reducing data movement in the processor’s memory hierarchy, allowing the use of cache blocks that can efficiently span multiple cache domains. This work was developed in his Girih project for Intel® CPU. He is currently working on the performance optimizations of molecular dynamics simulations.

DPC++ and C/C++/Fortran OpenMP Compilers for CPUs and Xe Accelerators

Xinmin Tian

In the high-performance computing realm, the emerging DPC++/SYCL programming model is getting attention while OpenMP* continues to be the popular parallel programming model for a wide range of HPC/AI accelerators such as Xe GPUs and FPGAs. This presentation includes:

  • A brief overview of the DPC++ and OpenMP offloading models
  • An overview of Intel’s LLVM compiler technology for DPC++ and OpenMP offloading
  • A deep dive on performance tuning for one AI workload on Intel CPU and Xe GPU

About the Speaker
Xinmin Tian is a Senior Principal Engineer and Compiler Architect at Intel Corporation. He serves as Intel’s representative on the OpenMP Architecture Review Board (ARB) and as chair of the OpenMP C/C++ subcommittee. He drives OpenMP offloading, vectorization and parallelization compiler technologies for current and future Intel architectures. His current focus is on DPC++ compiler optimizations for oneAPI Toolkits, LLVM-based OpenMP offloading, and tuning HPC/AI application performance on Intel® CPUs and Xe accelerators. He has a Ph.D. in Computer Science, holds 27 U.S. patents, has published over 60 technical papers with over 1200 citations of his work, and has co-authored three books that span his expertise.

Ginkgo - an Open Source Math Library for the DPC++ Ecosystem

Hartwig Anzt

In this talk, we will present the Ginkgo open source math library and its capabilities on Intel® GPU architectures. We will start with reporting our experiences in porting an NVIDIA-focused software stack to Intel’s DPC++ environment and the obstacles we encountered when using automated code conversion. We will then present the functionality Ginkgo currently provides for Intel GPUs, and present initial performance results.

About the Speaker
Hartwig Anzt is a Helmholtz-Young-Investigator Group leader at the Steinbuch Centre for Computing at the Karlsruhe Institute of Technology (KIT). He obtained his PhD in Mathematics at the Karlsruhe Institute of Technology. Afterwards, he joined Jack Dongarra’s Innovative Computing Lab at the University of Tennessee in 2013 until he started his own research group in 2017. He still contributed to the Innovative Computing Lab as a Research Consultant. Hartwig Anzt has a strong background in numerical mathematics and specializes in iterative methods and preconditioning techniques for next-generation hardware architectures. His Helmholtz group on Fixed-point methods for numerics at Exascale (FiNE) is granted funding until 2022. Hartwig Anzt has a long track record of high-quality software development. He is the author of the MAGMA-sparse open source software package and managing lead of the Ginkgo numerical linear algebra library. Hartwig Anzt is a co-PI of the PEEKS project and the xSDK project inside the software technology effort of the US Exascale Computing Project (ECP). He is also the technical PI of the multiprecision effort in the xSDK project, a coordinated effort aiming at integrating low-precision functionality into high-accuracy simulation codes.

Integrating and Benefits of oneAPI Rendering Toolkit in NCAR VAPOR Climate Visualization App

John Clyne

VAPOR is the Visualization and Analysis Platform for Ocean, Atmosphere, and Solar Researchers. VAPOR provides an interactive 3D visualization environment that can also produce animations and still frame images. This talk will discuss the recent successful integration of Intel® oneAPI Rendering Toolkit and the benefits in performance, scalability, and high-fidelity Ray Tracing Visualization it provides.

About the Speaker
John Clyne manages the Visualization and Analysis Systems Technologies (VAST) section at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. VAST is involved in numerous activities related to the visualization and analysis of Earth System Science (ESS) data, including development of open source community software, research, E&O, and production visualization services. John is the chief architect of the widely used VAPOR package. His research interests include volume rendering, flow visualization, and strategies for visualizing large, time-varying data. He holds an M.S. in computer science from the University of Colorado.

Rooflining Bioinformatics: Boosting Epistasis Detection with Cache-aware Roofline Model

Aleksandar Ilic

In the first part of this talk, we will introduce the Cache-aware Roofline Model (CARM) and expose its basic principles when modelling the performance upper bounds of a processor. We will also discuss our recent research contributions in extending the model's insightfulness with application-driven CARM, as well as applying the CARM principles to model power consumption and energy-efficiency upper bounds. In the second part of this talk, we will rely on the CARM implementation in Intel® Advisor to showcase its ability to drive the optimization of epistasis detection, an important application in bioinformatics. For both Intel CPU and GPU devices, we will demonstrate how CARM can be used to detect execution bottlenecks and provide useful hints on which types of optimizations to apply in order to fully exploit device capabilities. The guidelines provided by CARM were fundamental to achieving speedups of more than 20x on an Intel® six-core CPU and Gen 9.5 GPU.
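As a rough illustration of the roofline principle behind CARM, attainable performance is commonly bounded by the smaller of peak compute throughput and memory bandwidth multiplied by arithmetic intensity, with one such roof per memory level in the cache-aware variant. The sketch below assumes made-up peak and bandwidth figures and hypothetical function names; it is not taken from the talk or from Intel® Advisor.

```python
# Illustrative only: the device figures below are placeholders, not measurements.
PEAK_GFLOPS = 400.0
LEVEL_BANDWIDTH_GBS = {"L1": 1000.0, "L2": 500.0, "DRAM": 40.0}


def roofline_bound(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Upper bound on performance (GFLOP/s) for a kernel with the given
    arithmetic intensity (FLOPs per byte moved)."""
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def carm_roofs(arithmetic_intensity):
    """Return one performance bound per memory level (hypothetical helper)."""
    return {
        level: roofline_bound(arithmetic_intensity, PEAK_GFLOPS, bandwidth)
        for level, bandwidth in LEVEL_BANDWIDTH_GBS.items()
    }


if __name__ == "__main__":
    # A kernel performing 0.25 FLOPs per byte is memory-bound at every level here.
    print(carm_roofs(0.25))
```

A kernel whose measured performance sits well below the roof of the memory level it mostly hits is a candidate for optimization; one sitting on the compute roof is not limited by memory traffic.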

About the Speaker
Aleksandar Ilic (PhD’14) is an Assistant Professor at the Instituto Superior Técnico (IST), Universidade de Lisboa, and a Senior Researcher of INESC-ID, Portugal. He has contributed to more than 50 international journal and conference publications and received several awards for his scientific and teaching achievements, including the HiPEAC 2017 Tech Transfer award for integration of Cache-aware Roofline Model in Intel Advisor. His research interests include high-performance and energy-efficient computing and modeling of parallel heterogeneous systems.

OneOligo: Using oneAPI to Accelerate DNA Data Storage

Raja Appuswamy

In the European Commission-funded Future and Emerging Technologies initiative OligoArchive, we are working on transforming DNA, the biological building block of life, into a digital building block for long-term data archival. One of the key steps in retrieving digital data stored in DNA involves clustering billions of strings with respect to edit distance. The computationally intensive nature of edit distance computation has made this step a critical bottleneck in the DNA data retrieval pipeline. In this talk, we will present project OneOligo, our scalable, hardware-accelerated solution for DNA read clustering. In doing so, we will first provide an overview of the DNA data storage pipeline. Then, we will present OneJoin, a string-similarity join algorithm that synergistically combines algorithmic advances in low-distortion embedding with the cross-architectural programming ability offered by DPC++ to scale up clustering across CPUs and GPUs.
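To see why edit distance is costly, consider the textbook dynamic-programming computation below, which takes time proportional to the product of the two string lengths for every pair compared. This is a generic sketch for illustration, not the OneJoin algorithm, which the abstract describes as using low-distortion embedding.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                      # delete ca
                curr[j - 1] + 1,                  # insert cb
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
```

Applying this pairwise across billions of reads is infeasible, which is the bottleneck the abstract refers to.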

About the Speaker
Raja Appuswamy is an Assistant Professor in the Data Science department at EURECOM, a French Grande École located in the Sophia Antipolis tech valley of southern France. Previously, he was a Researcher and Visiting Professor at EPFL, Switzerland, a Visiting Researcher in the Systems and Networking group at Microsoft Research, Cambridge, and a Software Development Engineer in the Windows 7 team at Microsoft, Redmond. He received his Ph.D. in Computer Science from the Vrije Universiteit, Amsterdam, where he worked under the guidance of Prof. Andrew S. Tanenbaum on designing and implementing a new storage stack for the MINIX 3 microkernel operating system. He also holds dual master’s degrees in Computer Science and Agricultural Engineering from the University of Florida.

HPC Visual Computing and Analysis SW Development at ExaScale

Paul Navratil

Dr. Navratil will discuss the “state of the art” and near-term trends driving HPC application development efforts in preparation for the massive scale of data and computation in the ExaScale era. He will touch upon the SOLAR Ray Tracing Consortium’s ongoing efforts to utilize Ray Tracing for Compute and the role of the emerging ANARI Khronos API; the necessary merging of Compute and Visualization workflows, aka In Situ; and advances in color theory enabling scientists to extract more detail during visual analysis.

About the Speaker
Paul A. Navrátil is an expert in high-performance visualization technologies, accelerator-based computing and advanced rendering techniques at the Texas Advanced Computing Center (TACC) at The University of Texas at Austin. His research interests include efficient algorithms for large-scale parallel visualization and data analysis (VDA) and innovative design for large-scale VDA systems. Dr. Navrátil’s recent work includes algorithms for large-scale distributed-memory ray tracing. This work enables photo-realistic rendering of the largest datasets produced on supercomputers today, such as cosmologic simulations of the Universe and computational fluid dynamics simulations at unprecedented levels of detail. He directs the Visualization area at TACC, which includes the Scalable Visualization Technologies (SVT) and Visualization Interfaces and Applications (VIA) groups. Dr. Navrátil’s work has been featured in numerous venues, both nationally and internationally, including the New York Times, Discover, and PBS News Hour. He holds BS, MS and Ph.D. degrees in Computer Science and a BA in Plan II Honors from The University of Texas at Austin.

Extending Rice University’s HPCToolkit to Measure and Analyze the Performance of Applications Accelerated with Intel GPUs

Xiaozhu Meng & Aaron Cherian

HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to the nation’s largest supercomputers. By using statistical sampling of timers and hardware performance counters, HPCToolkit collects accurate measurements of a program’s work, resource consumption, and inefficiency and attributes them to the full calling context in which they occur. HPCToolkit supports measurement and analysis of serial codes, threaded codes (e.g. pthreads, OpenMP), MPI, and hybrid (MPI+threads) parallel codes. With the prevalence of GPUs as accelerators for scientific computation, we are extending HPCToolkit to support applications accelerated with GPUs from several vendors. In this presentation, we will discuss our recent development work on supporting Intel GPUs and our experience profiling several applications ported to Intel GPUs using Data Parallel C++.

About the Speakers
Xiaozhu is an active developer for the Rice HPCToolkit. He is presently working on an implementation of the HPCToolkit on top of the Level 0 in oneAPI and has designed and implemented a parallel binary analysis for analyzing program control flow using OpenMP task parallelism and Intel® oneAPI Threading Building Blocks (TBB).

Aaron Thomas Cherian is a doctoral student in Computer Science working under Dr. John Mellor-Crummey at Rice University. He is an active developer for the HPCToolkit project, an integrated suite of tools for measurement and analysis of application performance on computers ranging from desktops to supercomputers. He obtained his bachelor’s degree in Computer Science in 2017 from the University of Mumbai, India. His research focuses on the study and instrumentation of binary code and its applications in HPC. He is presently working on an implementation of the HPCToolkit on top of the OpenCL API to provide coarse-grained and fine-grained profile metrics for Intel OpenCL and DPC++ applications.

Accelerating Oil and Gas Applications with SYCL for FPGAs

Ricardo Menotti

FPGAs have enormous potential to parallelize many workloads, offering better performance than other architectures for a fraction of the energy cost. However, designing custom architectures requires specific knowledge and is time consuming. Seismic applications for oil and gas prospecting, which work based on the time it takes reflected sound waves to travel through materials of varying densities, are known for their complexity and high computational costs. In this lecture, we will report our experience in accelerating two of these applications using SYCL with the oneAPI framework for FPGAs, as well as the results obtained so far.

About the Speaker
Ricardo Menotti holds a doctorate in Computer Science and Computational Mathematics from the University of São Paulo (2010), a master’s in Computer Science and Computational Mathematics from the University of São Paulo (2005), and a bachelor’s in Computer Science from the University of Oeste Paulista (2002). He is currently a professor at the Federal University of São Carlos. He has experience in computer science, with emphasis on computer architecture, reconfigurable computing, and compilers.

Introduction/Closing

Sujata Tibrewala

Sujata Tibrewala is an Intel community development manager and technology evangelist who defines programs to enable ecosystem developers on heterogeneous computing and oneAPI. She is a co-chair for IEEE Edge Automation Platform Roadmap, for Beyond 5G Technology Roadmap. Under her leadership, the Open Source Intel Network Developer Evangelism program was nominated for the Network Transformation Awards 2018 and received the Edison award and the Network Developer Dynamo award at Intel. She is a frequent presenter at various IEEE and industry conferences and has held the positions of Director at the Silicon Valley Engineering Council and TSC chair for the Akraino Documentation Sub-committee. She has a Master’s from IISc Bangalore and a Bachelor’s from IIT Kharagpur and has completed an Executive Women Leadership Program at Stanford.

oneAPI spec & Industry Panel

Gergana Slavova

Gergana has 14 years’ experience in High Performance Computing (HPC). For the majority of her career, she has focused on customer enabling, training, and consulting for distributed applications using the Message Passing Interface (MPI) and Intel’s development tools. Most recently, she’s participating in the oneAPI industry initiative that works on defining an open cross-vendor, cross-platform programming model for accelerators.

oneAPI spec & Industry Panel

Ronan Keryell

Ronan Keryell is a principal software engineer at Xilinx Research Labs. He works on SYCL C++-based programming models for heterogeneous systems such as FPGAs and CGRAs. He is the specification editor of the SYCL standard and a member of the SYCL, SPIR & OpenCL standard committees from Khronos Group & the ISO C++ committee. Ronan Keryell received his MSc in Electrical Engineering and PhD in Computer Science in 1992 from École Normale Supérieure of Paris & University of Paris Sud (France), on the design of a massively parallel RISC-based VLIW-SIMD graphics computer and its programming environment. He was co-founder of 3 start-ups, mainly in high-performance computing, and was the technical lead of the Par4All automatic parallelizer at SILKAN, targeting OpenMP, CUDA & OpenCL from sequential C & Fortran. Before joining Xilinx, he worked at AMD on programming models for GPUs.

oneAPI spec & Industry Panel

Nevin Liber

Nevin Liber is a computer scientist at Argonne National Laboratory, working on the oneAPI/SYCL backend for Kokkos (C++ Performance Portability Programming EcoSystem). He is also the Vice Chair for the Library Evolution Working Group Incubator (SG18) of the C++ Committee, and the Argonne representative for the SYCL Standardization effort. He first discovered C++ over three decades ago while at Bell Labs when a friend called and asked, “What do you know about C++? You folks invented it!” His professional career has taken him across various industries and platforms: big data, low-latency, operating systems, embedded systems, telephony and now exascale computing, just to name a few. He spends much of his time pushing his peers, colleagues and friends to use modern C++ constructs along the way.

oneAPI spec & Industry Panel

Penporn Koanantakool

Penporn Koanantakool is a senior software engineer at Google. She works on accelerating machine learning applications, primarily through tuning TensorFlow, Google’s open-source machine learning platform. She leads TensorFlow’s performance optimization collaboration with Intel. Penporn holds a Ph.D. in computer science from the University of California, Berkeley, specializing in high-performance computing, and a B.Eng. in computer engineering from Kasetsart University, Thailand.

oneAPI spec & Industry Panel

Andrew Lumsdaine

As Chief Scientist at the Northwest Institute for Advanced Computing (NIAC), Andrew wears at least two hats: Laboratory Fellow at Pacific Northwest National Laboratory and Affiliate Professor in the Paul G. Allen School of Computer Science and Engineering. As a dual appointee between UW and PNNL, Andrew also has the title of “UW-PNNL Distinguished Faculty Fellow.” By spanning a university and a national laboratory, he has the opportunity to work on basic research questions and then reduce those results to practice. His primary research interest is High Performance Computing, interpreted broadly. Of particular interest throughout most of his career has been scalable graph algorithms. He has also had a side interest in computational photography, which turned out to be surprisingly fruitful.

oneAPI Tools Panel

Henry Gabb

Henry A. Gabb is a Senior Principal Engineer in Intel’s Architecture, Graphics, and Software group. Much of his career has been spent promoting the value of parallel computing, now focusing on oneAPI for heterogeneous parallelism. Henry holds a bachelor’s degree in biochemistry from Louisiana State University, a master’s degree in medical informatics from the Northwestern Feinberg School of Medicine, a doctorate in molecular genetics from the University of Alabama at Birmingham School of Medicine, and a doctorate in information science from the University of Illinois at Urbana-Champaign. Prior to joining Intel, he was Director of Scientific Computing at the US Army Engineer Research and Development Center MSRC, a Department of Defense high-performance computing facility.

oneAPI Tools Panel

Paul Petersen

Paul Petersen is a Sr. Principal Engineer in IAGS (Intel Architecture, Graphics & Software), and oneAPI Tools Architect. He received a Ph.D. in Computer Science from the University of Illinois in 1993. Starting at Kuck and Associates, Inc. (KAI), his projects included contributions to the auto-parallelizing compiler (KAP), and he was involved in the early definition and implementations of OpenMP. While at KAI, he developed the Assure line of parallelization/correctness products for Fortran, C++ and Java. In 2000, Intel Corporation acquired KAI, and he joined the software tools group creating the Thread Checker products, which evolved into the Inspector and Advisor components of the Intel® Parallel Studio. Inspector uses dynamic binary instrumentation to detect memory and concurrency bugs, and Advisor uses similar techniques along with performance measurement and modeling to assist developers in transforming existing serial applications to be ready for parallel execution. His focus on product architecture in Parallel Studio XE and its components transitioned to creating and leading a pathfinding team. That work, defining next-generation features for parallel runtimes and software analysis tools to better enable Intel platforms, more recently transitioned to his current role leading the oneAPI Tools Architecture team.

oneAPI Tools Panel

Ruyman Reyes

Ruyman is a software engineer with a background in High Performance Computing and extensive experience in programming models and heterogeneous platforms. Ruyman holds a PhD from the University of La Laguna (Spain). He completed his dissertation, titled “Directive based approach to Heterogeneous Computing,” in December 2012, while working as an Application Developer at the Edinburgh Parallel Computing Center (EPCC). He moved in December 2013 to Codeplay Software, where he has helped to define the SYCL open standard for heterogeneous programming, led development of ComputeCpp, and more recently contributed to the DPC++ compiler project.

oneAPI Tools Panel

Ramesh Peri

Ramesh is a senior Principal Engineer in IAGS and is the performance architect of oneAPI. His areas of expertise include programming languages, compilers, debuggers, and profilers for Intel datacenter/accelerator/HPC/mobile/IoT platforms. He has developed software development tools for a number of processors, including machine learning accelerators, DSPs, micro-controllers, GPUs, and a variety of application processors based on many different kinds of architectures such as x86 and ARM. He holds a Ph.D. in computer science from the University of Virginia (USA), an MTech from IIT Kanpur (India), and a BS from REC Warangal (India). Prior to Intel, Ramesh worked at Hewlett Packard, Lucent, and Panasonic.
