Esteban Meneses
PhD Student
emenese2 at illinois.edu
Profile

I am an Assistant Professor in the School of Computing at the Costa Rica Institute of Technology. This is my new webpage.

During my time in the PPL, I worked mostly on fault tolerance. I was involved in several projects related to resilience in HPC, ranging from efficient checkpoint/restart mechanisms to understanding the interplay between failures and energy. My thesis focused on scalable message-logging techniques; in particular, I developed a collection of strategies to reduce the memory overhead of the message log.

Research Areas
Papers
15-16
2015
[Paper]
A Fault-Tolerance Protocol for Parallel Applications with Communication Imbalance [SBAC-PAD 2015]
15-10
2015
[Paper]
CAMEL: Collective-Aware Message Logging [TJS 2015]
14-21
2014
[Paper]
Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance [Cluster 2014]
14-20
2014
[Paper]
Using Migratable Objects to Enhance Fault Tolerance Schemes in Supercomputers [IEEE Transactions on Parallel and Distributed Systems 2014]
14-02
2014
[Paper]
Energy Profile of Rollback-Recovery Strategies in High Performance Computing [ParCo 2014]
13-61
2013
[Paper]
Communication and Topology-aware Load Balancing in Charm++ with TreeMatch [Cluster 2013]
Authors: Emmanuel Jeannot, Esteban Meneses, Guillaume Mercier, François Tessier, Gengbin Zheng
13-60
2013
[Paper]
Position Paper: Actionable Performance Modeling for Future Supercomputers [MODSIM 2013]
13-25
2013
[Paper]
A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]
13-24
2013
[Paper]
ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection [SC 2013]
13-22
2013
[Paper]
Position Paper: A Multi-resolution Emulation + Simulation Methodology [MODSIM 2013]
13-17
2013
[PhD Thesis]
Scalable Message-Logging Techniques for Effective Fault Tolerance in HPC Applications [Thesis 2013]
12-37
2012
[Paper]
Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]
12-32
2012
[Paper]
Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm [Cluster 2012]
12-14
2012
[Paper]
A Message-Logging Protocol for Multicore Systems [FTXS 2012]
12-04
2012
[Paper]
A Scalable Double In-memory Checkpoint and Restart Scheme towards Exascale [PPL Technical Report 2012]
11-30
2011
[Paper]
Design and Analysis of a Message Logging Protocol for Fault Tolerant Multicore Systems [PPL Technical Report 2011]
11-26
2011
[Paper]
Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications [Cluster 2011]
11-04
2011
[Paper]
Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems [DPDNS 2011]
10-20
2010
[Paper]
Periodic Hierarchical Load Balancing for Large Supercomputers [IJHPCA 2010]
10-08
2010
[Paper]
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers [P2S2 2010]
10-02
2010
[Paper]
Team-based Message Logging: Preliminary Results [Resilience 2010]
Talks/Posters
14-31
2014
[Talk]
Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance [Cluster 2014]
13-50
2013
[Talk]
A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]
12-43
2012
[Talk]
Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]
12-30
2012
[Talk]
A Message-Logging Protocol for Multicore Systems [FTXS 2012]
11-38
2011
[Talk]
Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications [Cluster 2011]
10-44
2010
[Talk]
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers [P2S2 2010]
10-37
2010
[Talk]
Clustering Parallel Applications to Enhance Message Logging Protocols [PPL Talk 2010]
10-29
2010
[Talk]
Team-based Message Logging: Preliminary Results [Resilience 2010]
09-27
2009
[Talk]
Adaptive Runtime Support for Fault Tolerance [PPL Talk 2009]