2nd Workshop on AI for Systems

(AI4Sys 2024)

In conjunction with HPDC 2024

Pisa, Italy

June 3rd, 2024



Call for Papers

AI/ML techniques are being incorporated into all aspects of the scientific and engineering process. One early area of effort has been replacing existing autonomic system components with AI models that can offer more nuanced and continuously updated automation behavior. Work has been done on predicting IO behavior to enable more efficient machine throughput, on monitoring logs to detect patterns that may reveal either security concerns or faulty components that fail in consistent but unusual ways, and on managing applications and caches to better address the system as a whole rather than as individual components. All of these, and many more system-related tasks, address complex, sometimes intractable problems and seek to use AI tools to offer better solutions than the heuristics or scope-limited solutions that existed previously.

This workshop solicits novel work that explores how to effectively incorporate AI into system management and monitoring, particularly for complex systems that support scientific and engineering workloads (i.e., cloud and HPC).

Areas of interest and domains of work include, but are not limited to:
  • Tools and runtimes for incorporating AI into systems
  • Privacy and security concerns for managing system data used for model creation
  • Continuous model evolution and the impacts of chasing current workloads on a dynamic system
  • AI algorithms for systems problems
  • Subsystem related optimizations including operating systems, data migration, storage, job management, resource allocation, and related topics
  • Position and experience papers on using AI in systems

Submission

Submitted papers must be formatted in the ACM conference format and must not exceed 5 pages, including everything except references. Accepted papers will be published in the ACM proceedings. Submissions will be peer-reviewed in a single-blind manner: author names and affiliations must appear in the submitted paper, but reviewer names will remain anonymous.


Submission Link: https://ai4sys24.hotcrp.com/


Important Dates

  • Paper submission deadline: March 30, 2024, AoE
  • Author notification: April 13, 2024
  • Workshop: June 3, 2024

Program

  • 9:00 - 9:05: Welcome/Opening Remarks
  • 9:05 - 10:05: Keynote: The Hitchhiker's Guide to Using Machine Learning in System-level Resource Management. Thaleia Doudali (IMDEA Software Institute)
  • 10:05 - 10:25: MPIrigen: MPI Code Generation through Domain-Specific Language Models
  • 10:25 - 11:00: Morning Coffee Break
  • 11:00 - 11:20: ECO-LLM: LLM-based Edge Cloud Optimization
  • 11:20 - 11:40: StreamingRAG: Real-time Contextual Retrieval and Generation Framework
  • 11:40 - 12:00: Toward Using Representation Learning for Cloud Resource Usage Forecasting

Link to Proceedings

Keynote Talk: The Hitchhiker's Guide to Using Machine Learning in System-level Resource Management.

Abstract: This talk will take you through a journey of best practices, things to avoid, and unconventional approaches for integrating machine learning methods into system-level resource management of cloud and high performance computing environments. These environments suffer from low resource utilization, due to the significant difference between the resources allocated to users and those actually used in practice. While the use of machine learning can lead to improved resource management and efficiency, its production-level use comes with significant overheads, engineering effort, and interpretability concerns. This talk will inspire you to think outside the box and lead you to an existential question of whether machine learning is even necessary in certain aspects of system-level resource management.

Speaker Bio: Thaleia Dimitra Doudali is an Assistant Research Professor at the IMDEA Software Institute in Madrid, Spain. She received her PhD from the Georgia Institute of Technology (Georgia Tech) in the United States, advised by Ada Gavrilovska. Prior to that, she earned an undergraduate diploma in Electrical and Computer Engineering at the National Technical University of Athens in Greece. Thaleia’s research lies at the intersection of Systems and Machine Learning, where she explores novel methodologies, such as machine learning and computer vision, to improve system-level resource management of emerging hardware technologies. In 2021, Thaleia received the Juan de la Cierva post-doctoral fellowship. In 2020, Thaleia was selected to attend the prestigious Rising Stars in EECS academic workshop. Aside from research, Thaleia actively strives to improve mental health awareness in academia and to foster diversity and inclusion.


Committees

Organizing Committee

  • Jay Lofstead (Sandia National Laboratories, USA)
  • Jai Dayal (Samsung Advanced Institute of Technology)

Past Editions