Position Paper: Actionable Performance Modeling for Future Supercomputers
DOE Workshop on Modeling and Simulation of Exascale Systems and Applications (MODSIM) 2013
Publication Type: Paper
Repository URL: papers/201306_ActionModel
As computers scale beyond today's systems to peak capabilities above one ExaFLOP/s, it is becoming clear that an introspective and adaptive runtime system (RTS) will be essential for dealing with the complexities generated by sophisticated applications and complex machines. The applications will incorporate adaptive numerical algorithms, such as dynamic adaptive mesh refinement and multi-time-stepping. The machines will exhibit static and dynamic variability, including component failures and errors. The RTS will need to make quick decisions by adjusting machine configurations (e.g., the processors used or the power level of each component), runtime strategies (e.g., changing the scheduling strategy, selecting load balancers, or parameterizing strategies), and the application. To make such decisions quickly, it needs simple but effective models of the various subcomponents of the parallel machine and the application that predict how they will behave under a reconfiguration the RTS is considering. Such fast models, which may sacrifice some accuracy in return for extreme speed, are called Actionable Models in the rest of this paper.
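To make the idea concrete, the following minimal Python sketch shows how a runtime system might use a fast model to rank candidate reconfigurations. It is an illustration only, not the method of the paper: the names `Config`, `predicted_time`, and `choose_config`, the cube-root relation between power and speed, and the logarithmic communication term are all hypothetical assumptions chosen to keep the model cheap to evaluate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical machine configuration the RTS could switch to."""
    processors: int
    watts_per_processor: float


def predicted_time(cfg, work_units, comm_cost=0.01):
    """Cheap closed-form estimate of run time (assumed model, not measured)."""
    # Assumption: per-processor speed grows with the cube root of its power.
    speed = cfg.watts_per_processor ** (1.0 / 3.0)
    compute = work_units / (cfg.processors * speed)
    # Assumption: communication cost grows with log2 of the processor count.
    communicate = comm_cost * math.log2(cfg.processors)
    return compute + communicate


def choose_config(candidates, work_units, power_budget_watts):
    """Return the feasible candidate with the lowest predicted time."""
    feasible = [
        c for c in candidates
        if c.processors * c.watts_per_processor <= power_budget_watts
    ]
    if not feasible:
        raise ValueError("no candidate fits within the power budget")
    return min(feasible, key=lambda c: predicted_time(c, work_units))


candidates = [Config(64, 100.0), Config(128, 50.0), Config(256, 50.0)]
best = choose_config(candidates, work_units=1000.0, power_budget_watts=6400.0)
print(best)
```

Because each prediction is a handful of arithmetic operations, the RTS could evaluate many candidates per decision; the trade-off, as the abstract notes, is that such a model gives up some accuracy in return for speed.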