HPC-Colony: Services and Interfaces for Very Large Systems
OSR Special Issue on HEC OS/Runtimes 2006
Publication Type: Paper
Repository URL: OSR2006
Abstract
Traditional full-featured operating systems are known to have
properties that limit the scalability of distributed memory
parallel programs, the most common programming paradigm utilized
in high-end computing. Furthermore, as processor counts increase
in the most capable systems, the activity needed to manage the
system becomes more of a burden. To make a general-purpose
operating system scale to such levels, new technology is required
for parallel resource management and global system management
(including fault management). In this paper, we describe the
shortcomings of full-featured operating systems and runtime systems
and discuss an approach to scale such systems to one hundred
thousand processors with both scalable parallel application
performance and efficient system management.
TextRef
Sayantan Chakravorty, Celso L. Mendes, Laxmikant V. Kale, Terry Jones,
Andrew Tauferner, Todd Inglett, and Jose Moreira, "HPC-Colony: Services
and Interfaces for Very Large Systems", ACM SIGOPS Operating Systems Review:
Operating and Runtime Systems for High-end Computing Systems, vol. 40, April 2006.
People
- Sayantan Chakravorty
- Celso Mendes
- Laxmikant Kale
- Terry Jones
- Andrew Tauferner
- Todd Inglett
- Jose Moreira
Research Areas