1st Workshop on AI for Systems

(AI4Sys 2023)

In conjunction with HPDC 2023

Orlando, Florida, USA

June 30, 2023



Call for Papers

AI/ML techniques are being incorporated into all aspects of the scientific and engineering process. One early area of effort has been replacing existing autonomic system components with AI models that can offer more nuanced and continuously updated automation behavior. Work has been done to predict I/O behavior to enable more efficient machine throughput, to monitor logs for patterns that may reveal either security concerns or faulty components that fail in consistent but unusual ways, and to manage applications and caches so that they serve the system as a whole rather than individual components. All of these, and many more system-related tasks, address a complex, sometimes intractable problem and seek to use AI tools to offer better solutions than the heuristics or scope-limited solutions that existed previously.

This workshop solicits novel work that explores how to effectively incorporate AI into system management and monitoring, particularly for complex systems that support scientific and engineering workloads (i.e., cloud and HPC).

Areas of interest and domains of work include, but are not limited to:
  • Tools and runtimes for incorporating AI into systems
  • Privacy and security concerns for managing system data used for model creation
  • Continuous model evolution and the impacts of chasing current workloads on a dynamic system
  • AI algorithms for systems problems
  • Subsystem related optimizations including operating systems, data migration, storage, job management, resource allocation, and related topics
  • Position and experience papers on using AI in systems

Submission

Submitted papers must be formatted in the ACM conference format and must not exceed 5 pages, including everything except references. Accepted papers will be published in the ACM proceedings. Submissions will be peer-reviewed in a single-blind manner: author names and affiliations must appear in the submission, but reviewer names will remain anonymous.



Program

  • 9:00 - 9:05: Welcome/Opening Remarks
  • 9:05 - 10:05: Keynote talk by Professor Devesh Tiwari (Northeastern University)
  • 10:00 - 10:15: Break
  • 10:15 - 10:35: Streaming Machine Learning for Supporting Data Prefetching in Modern Data Storage Systems
  • 10:35 - 10:55: Towards Practical Machine Learning Frameworks for Performance Diagnostics in Supercomputers
  • 10:55 - 11:15: Anomaly Detection in Scientific Datasets using Sparse Representation
  • 11:15 - 12:00: Panel with Speakers

Link to Proceedings

Keynote Abstract: Traditionally, we have assumed that large-scale computing users are fairly boring and that their workloads often do similar things repetitively. Their “boring” nature has served us well so far: we could design “boring” systems and get away with it. But now things are changing, and changing fast. Our workloads and users are becoming interesting and are often surprising us with new trends and behavior. That means excitement is springing into our lives. We need to design interesting solutions and come out of our boredom. The race to apply AI/ML methods is faster than ever. In this talk, I’ll discuss a few lessons I learned as we applied AI/ML methods to computer systems resource management problems.

Speaker Bio: Professor Devesh Tiwari is an educator and researcher at Northeastern University, where he directs the Goodwill Labs. His group innovates new solutions to make large-scale classical HPC systems and quantum computing systems more efficient, reliable, and cost-effective. Before joining the Northeastern faculty, Devesh was a staff scientist at the United States Department of Energy (DOE) Oak Ridge National Laboratory. Devesh was recognized with multiple awards including the DSN Dependability Rising Star Award, the NSF CAREER Award, and the Facebook Faculty Research Award. Devesh’s research group has lowered the barrier to entry and accelerated R&D efforts in multiple emerging computer systems areas, including HPC, quantum system software, serverless computing, and AI-driven data center optimizations, by open-sourcing novel software artifacts and datasets. The research contributions from his group have been recognized with many best paper nominations and fellowships/awards. For his teaching and mentoring contributions, he was awarded Professor of the Year by the Northeastern University chapter of the IEEE Eta Kappa Nu honor society. Devesh has also introduced several novel peer-review elements in the computer systems community in his role as program co-chair/track co-chair for various conferences. Most recently, he was the Technical Program Committee Co-Chair for HPDC’22 and is the overall Technical Program Committee Co-Chair for IPDPS’23. He is an Associate Editor for IEEE Transactions on Parallel and Distributed Systems (TPDS), ACM Transactions on Storage (ToS), and the Journal of Parallel and Distributed Computing (JPDC). He was recognized with the TPDS Editorial Excellence Award for his exceptional contributions to the TPDS journal as an editor.


Committees

Organizing Committee

  • Jay Lofstead (Sandia National Laboratories, USA)
  • Jai Dayal (Samsung Advanced Institute of Technology)