HPCTESTS 2023
First International Workshop on HPC Testing and Evaluation of Systems, Tools, and Software
https://olcf.github.io/hpc-system-test-wg/hpctests/hpctests2023
Details
- When: Friday, November 17, 2023, 8:30am - 12:00pm MST, held in conjunction with SC23.
- Where: Room 503-504, Colorado Convention Center (Denver, Colorado, USA)
Description
This workshop brings together HPC researchers, practitioners, and vendors from around the globe to present and discuss state-of-the-art HPC system testing methodologies, tools, benchmarks, tests, procedures, and best practices. The increasing complexity of HPC architectures requires a growing number of tests to thoroughly evaluate the status of a system after its installation or before a software upgrade is applied to production systems. HPC centers and vendors therefore use a variety of methodologies to evaluate their systems throughout their lifetime, not only during installation and acceptance, but also regularly during maintenance windows. This workshop will provide a venue to present and discuss the latest HPC system test technologies and methodologies including, but not limited to, tools used for testing, new test suites and benchmarks developed to assist testing efforts, lessons learned from acceptance and regression testing experiences, and evaluations of new hardware and software that showcase testing best practices.
The event will include a keynote on current HPC system testing topics, followed by presentations of peer-reviewed accepted papers, and will conclude with a panel discussion.
Call for Papers
Workshop scope
The First International Workshop on HPC Testing and Evaluation of Systems, Tools, and Software (HPCTESTS 2023), held in conjunction with SC23 (Denver, CO, USA), will bring together experts from high performance computing (HPC) centers around the globe to present and discuss state-of-the-art HPC system testing methodologies, tools, benchmarks, and best practices. The workshop encourages submissions that highlight current benchmarks, tests, and procedures used to evaluate today’s HPC systems. It will provide an avenue to showcase newly developed tools and methodologies, as well as those still under active design, allowing authors to gather community feedback that can help guide their projects. As machine learning (ML) and deep learning (DL) become more prevalent workloads, HPC centers must provide a wider range of services and more robust and resilient resources to support both traditional HPC and ML/DL workloads. The workshop also invites submissions that look ahead to the post-exascale future of HPC system testing, helping the community develop alternative mechanisms for adapting to evolving and emerging workloads. The event welcomes international participation from HPC centers, academic institutions, and vendors in the supercomputing space.
In addition to describing the procedures and tools used, submissions may cover challenges, lessons learned, and best practices for regression testing, acceptance testing, and hardware evaluations. Furthermore, the workshop encourages submissions that explore testbed evaluations as a means to gather preliminary results on system readiness and to assist system design and deployment efforts.
HPCTESTS 2023 will provide the first technical venue for HPC researchers and practitioners to submit their findings, early work, results, new tools, and more. The workshop encourages innovative work in HPC system testing that improves and rethinks the mechanisms used for acceptance and regression testing of HPC systems, their software stack, and their user environment, helping the community prepare for the post-exascale era.
Workshop topics
Topics of interest include, but are not limited to:
- Testing methodologies and procedures
- Tools and frameworks for regression testing
- Automated testing, continuous regression testing, and performance monitoring
- Development and utilization of new proxy applications, benchmarks, and real applications to evaluate a system’s reliability and usability
- Efforts to improve reproducibility, sustainability, and availability of tests that can be leveraged by the community
- Hardware- and component-focused testing including, but not limited to, CPUs (x86, Arm), GPUs, AI-specialized hardware, SmartNICs (DPUs), memory, network, and storage at all scales (single server to supercomputers and cloud environments)
- System software, programming languages, and library testing
- Monitoring and analysis of test results for decision making
- Best practices and lessons learned from acceptance and/or regression testing
- Early detection of failures and anomalies using ML approaches
- Evaluation of early hardware testbeds and strategies to develop tests on early hardware
- Specification-driven strategies and formalisms for application testing (e.g., interoperability, composition)
- Application-driven testing strategies
- AI-assisted test generation for HPC systems and/or applications
Paper Submissions
The workshop will publish its proceedings in the SC23 Workshops Proceedings following these guidelines:
- Submissions should adhere to the new ACM proceedings template available at https://www.acm.org/publications/proceedings-template.
- Authors should upload papers in PDF format only.
- Submissions are limited to 8 pages (not counting references) plus an optional 2-page Artifact Description appendix (10 pages total).
- For LaTeX users, version 1.90 (last updated on April 4, 2023) is the latest template. Please use the “sigconf” option; a minimal skeleton is sketched after this list.
- Papers should be submitted via the SC23 Submissions system. Look for the HPCTESTS 2023 submission.
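For reference, a minimal sketch of what a submission skeleton using the acmart class with the “sigconf” option might look like is shown below. The title, author metadata, and bibliography file name are placeholders; authors should consult the template documentation linked above for the full set of required metadata commands.

% Minimal acmart skeleton for an HPCTESTS 2023 submission (illustrative only).
% Requires the acmart class from the ACM proceedings template linked above.
\documentclass[sigconf]{acmart}

\begin{document}

% Placeholder title and author metadata; replace with your own.
\title{Your HPCTESTS 2023 Paper Title}
\author{First Author}
\affiliation{%
  \institution{Your Institution}
  \city{Your City}
  \country{Your Country}}
\email{first.author@example.org}

% In acmart, the abstract environment must appear before \maketitle.
\begin{abstract}
A short abstract describing the submission.
\end{abstract}

\maketitle

\section{Introduction}
Body text goes here.

\bibliographystyle{ACM-Reference-Format}
% \bibliography{references} % uncomment once a references.bib file exists

\end{document}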
Workshop Deadlines
- Paper Submission Deadline: August 17, 2023 AoE (extended from August 10, 2023)
- Author Notification: September 8, 2023 AoE
- Camera-ready: September 29, 2023 AoE
Organizing Committees
HPCTESTS 2023 General Chairs
- Verónica G. Melesse Vergara (ORNL, USA)
- Bilel Hadri (KAUST, Saudi Arabia)
- Vasileios Karakasis (NVIDIA, Switzerland)
HPCTESTS 2023 Steering Committee
- Jennifer Green (Los Alamos National Laboratory)
- Keita Teranishi (Oak Ridge National Laboratory)
- Olga Pearce (Lawrence Livermore National Laboratory)
- Oscar Hernandez (Oak Ridge National Laboratory)
HPCTESTS 2023 Program Committee
- Maciej Cytowski (Pawsey Supercomputing Centre)
- Tina Declerck (Lawrence Berkeley National Laboratory)
- Dan Dietz (Oak Ridge National Laboratory)
- Jens Domke (RIKEN Center for Computational Science)
- Pascal Elahi (Pawsey Supercomputing Centre)
- Ann Gentile (Sandia National Laboratories)
- Anjus George (Oak Ridge National Laboratory)
- Lisa Gerhardt (Lawrence Berkeley National Laboratory)
- Bilel Hadri (King Abdullah University of Science and Technology)
- Nick Hagerty (Oak Ridge National Laboratory)
- Victor Holanda Rusu (Swiss National Supercomputing Centre)
- John Holmen (Oak Ridge National Laboratory)
- Adrian Jackson (University of Edinburgh)
- Vasileios Karakasis (NVIDIA)
- Eirini Koutsaniti (Swiss National Supercomputing Centre, ETH Zurich)
- James Lin (Shanghai Jiao Tong University)
- Amiya K. Maji (Purdue University)
- Verónica G. Melesse Vergara (Oak Ridge National Laboratory)
- Mark O’Shea (Pawsey Supercomputing Centre)
- Guilherme Peretti-Pezzi (Swiss National Supercomputing Centre)
- Maria del Carmen Ruiz Varela (Advanced Micro Devices)
- Shahzeb Siddiqui (Lawrence Berkeley National Laboratory)
- Zachary Tschirhart (Hewlett Packard Enterprise)
- Le Mai Weakley (Indiana University)
Agenda
- 8:30am - 8:40am MST Welcome and Introduction by Verónica G. Melesse Vergara
- 8:40am - 9:10am MST Testing the space between: Extending HPC testing for a complex HPC workflow environment by Hai Ah Nam
- 9:10am - 9:35am MST Experiences Detecting Defective Hardware in Exascale Supercomputers by Nick Hagerty, Jordan Webb, Verónica Melesse Vergara, and Matt Ezell
- 9:35am - 10:00am MST Principles for Automated and Reproducible Benchmarking by Tuomas Koskela, Ilektra Christidi, Mosè Giordano, Emily Dubrovska, Jamie Quinn, Christopher Maynard, Dave Case, Kaan Olgu, and Tom Deakin
- 10:00am - 10:30am MST Morning Break
- 10:30am - 10:55am MST Ramble: A Flexible, Extensible, and Composable Experimentation Framework by Doug Jacobsen and Bob Bird
- 10:55am - 11:20am MST Toward Collaborative Continuous Benchmarking for HPC by Olga Pearce, Alec Scott, Gregory Becker, Riyaz Haque, Nathan Hanford, Stephanie Brink, Doug Jacobsen, Heidi Poxon, Jens Domke, and Todd Gamblin
- 11:20am - 11:55am MST Perspectives and Discussion Panel with Doug Jacobsen, Olga Pearce, Nick Hagerty, and Tuomas Koskela
- 11:55am - 12:00pm MST Closing Remarks by Bilel Hadri, Vasileios Karakasis, and Verónica Melesse Vergara