Esteban Meneses
PhD Student
esteban.meneses at acm.org
Profile

I am a Research Assistant Professor in the Center for Simulation and Modeling (SaM) at the University of Pittsburgh. My main role in SaM is to foster the development of accelerator-based scientific applications. To achieve this goal, we follow a strategy with both an educational and a research component: we train people to use accelerators, and we find ways to map scientific codes onto these architectures. This is my new webpage.

During my time in the PPL, I worked mostly on fault tolerance. I was involved in several projects related to resilience in HPC, ranging from efficient checkpoint/restart mechanisms to understanding how failures and energy consumption interact. My thesis focused on scalable message-logging techniques; in particular, I developed a collection of strategies to reduce the memory overhead of the message log.

Papers
14-02 (2014) [Paper] Energy Profile of Rollback-Recovery Strategies in High Performance Computing [ParCo 2014]
13-25 (2013) [Paper] A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]
13-24 (2013) [Paper] ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection [SC 2013]
13-22 (2013) [Paper] Position Paper: A Multi-resolution Emulation + Simulation Methodology [MODSIM 2013]
13-21 (2013) [Paper] Position Paper: Actionable Performance Modeling for Future Supercomputers [MODSIM 2013]
13-17 (2013) [PhD Thesis] Scalable Message-Logging Techniques for Effective Fault Tolerance in HPC Applications [Thesis 2013]
12-37 (2012) [Paper] Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]
12-32 (2012) [Paper] Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm [Cluster 2012]
12-14 (2012) [Paper] A Message-Logging Protocol for Multicore Systems [FTXS 2012]
12-04 (2012) [Paper] A Scalable Double In-memory Checkpoint and Restart Scheme towards Exascale [PPL Technical Report 2012]
11-30 (2011) [Paper] Design and Analysis of a Message Logging Protocol for Fault Tolerant Multicore Systems [PPL Technical Report 2011]
11-26 (2011) [Paper] Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications [Cluster 2011]
    Authors: Esteban Meneses, Greg Bronevetsky, Laxmikant Kale
11-04 (2011) [Paper] Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems [DPDNS 2011]
    Authors: Esteban Meneses, Greg Bronevetsky, Laxmikant Kale
10-20 (2010) [Paper] Periodic Hierarchical Load Balancing for Large Supercomputers [IJHPCA 2010]
10-08 (2010) [Paper] Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers [P2S2 2010]
10-02 (2010) [Paper] Team-based Message Logging: Preliminary Results [Resilience 2010]
Talks/Posters
13-50 (2013) [Talk] A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]
12-43 (2012) [Talk] Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]
12-30 (2012) [Talk] A Message-Logging Protocol for Multicore Systems [FTXS 2012]
11-38 (2011) [Talk] Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications [Cluster 2011]
    Authors: Esteban Meneses, Greg Bronevetsky, Laxmikant Kale
10-44 (2010) [Talk] Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers [P2S2 2010]
10-37 (2010) [Talk] Clustering Parallel Applications to Enhance Message Logging Protocols [PPL Talk 2010]
10-29 (2010) [Talk] Team-based Message Logging: Preliminary Results [Resilience 2010]
09-27 (2009) [Talk] Adaptive Runtime Support for Fault Tolerance [PPL Talk 2009]