An Efficient Lossy Compression Framework for Reducing Memory Footprint for Extreme-Scale Deep Learning on GPU-Based HPC Systems

Dingwen Tao, Indiana University

0000-0001-5422-4497

ACCESS Allocation Request ASC200032

Abstract:	Deep learning (DL) has rapidly evolved to a state-of-the-art technique in many science and technology disciplines, such as scientific exploration, national security, smart environment, and healthcare. Many of these DL applications require using high-performance computing (HPC) resources to process large amounts of data. Researchers and scientists, for instance, are employing extreme-scale DL applications in HPC infrastructures to classify extreme weather patterns and high-energy particles. In recent years, using Graphics Processing Units (GPUs) to accelerate DL applications has attracted increasing attention. However, the ever-increasing scales of DL applications bring many challenges to today's GPU-based HPC infrastructures. The key challenge is the huge gap (e.g., one to two orders of magnitude) between the memory requirement and its availability on GPUs. This NSF project aims to fill this gap by developing a novel framework to reduce the memory demand effectively and efficiently via data compression technologies for extreme-scale DL applications. The proposed research will enhance the GPU-based HPC infrastructures in broad communities for many scientific disciplines that rely on DL technologies. The project will connect machine learning and HPC communities and increase interactions between them. Educational and engagement activities include developing new curriculum related to data compression, mentoring a selected group of high school students in a year-long research project for a regional Science Fair competition, and increasing the community's understanding of leveraging HPC infrastructures for DL technologies. The project will also encourage student interest in research related to DL technologies on HPC environment and promote research collaborations with multiple national laboratories. Existing state-of-the-art GPU memory saving methods for training extreme-scale deep neural networks (DNNs) suffer from high performance overhead and/or low memory footprint reduction. Error-bounded lossy compression is a promising approach to significantly reduce the memory footprint while still meeting the required analysis accuracy. This NSF project will explore how to leverage error-bounded lossy compression on DNN intermediate data to reduce the memory footprint for extreme-scale DNN training. The project has a three-stage research plan. First, the team will comprehensively investigate the impacts of applying error-bounded lossy compression to DNN intermediate data on both validation accuracy and training performance, using different error-bounded lossy compressors, compression modes, and error bounds on the targeted DNNs and datasets. Second, the team will optimize the compression quality of suitable error-bounded lossy compressors on different intermediate data based on the impact analysis outcome, and design an efficient scheme to adaptively apply a best-fit compression solution. Finally, the team will optimize the compression performance on the proposed lossy compression framework for state-of-the-art GPUs. The team will evaluate the proposed framework on high-resolution climate analytics and high-energy particle physics applications and compare it with existing state-of-the-art techniques based on both the memory footprint reduction ratio and training performance improvements (e.g., throughput, time, epoch number). The project will enable scientists and researchers to train extreme-scale DNNs with a given set of computing resources in a fast and efficient manner, opening opportunities for new discoveries.

Allocations:

2023	PSC Bridges-2 Extreme Memory (PSC Bridges-2 EM)	28.0 Core-hours
2023	PSC Bridges-2 GPU (PSC Bridges-2 GPU)	4,182.0 GPU Hours
2023	PSC Bridges-2 Regular Memory (PSC Bridges-2 RM)	108,917.0 Core-hours
2023	PSC Bridges-2 Storage (PSC Ocean)	4,279.0 GB
The estimated value of these awarded resources is $2,920.93. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.
2022	PSC Bridges-2 Extreme Memory (PSC Bridges-2 EM)	10,000.0 Core-hours
2022	PSC Bridges-2 GPU (PSC Bridges-2 GPU)	2,814.0 GPU Hours
2022	PSC Bridges-2 Regular Memory (PSC Bridges-2 RM)	30,000.0 Core-hours
2022	PSC Bridges-2 Storage (PSC Ocean)	2,500.0 GB
The estimated value of these awarded resources is $3,331.78. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.

Click to show/hide prior allocations »

2021	PSC Bridges-2 GPU (PSC Bridges-2 GPU)	2,762.0 GPU Hours
2021	PSC Bridges-2 Regular Memory (PSC Bridges-2 RM)	15,838.0 Core-hours
2021	PSC Bridges-2 Storage (PSC Ocean)	2,000.0 GB
The estimated value of these awarded resources is $1,229.33. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.
2020	PSC Bridges-2 GPU Artificial Intelligence (PSC Bridges-2 GPU-AI)	3,607.0 GPU Hours
2020	PSC Bridges-2 Storage (PSC Ocean)	1,524.0 GB
2020	PSC GPU-AI (Bridges GPU Artificial Intelligence)	893.34 GPU Hours
The estimated value of these awarded resources is $3,467.27. The allocation of these resources represents a considerable investment by the NSF in advanced computing infrastructure for the U.S. The dollar value of the allocation is estimated from the NSF awards supporting the allocated resources.

Other Titles:

There are no prior titles for this project.