Parallel Programming Laboratory

Esteban Meneses

PhD Students

emenese2 at illinois.edu

Profile

I am an Assistant Professor in the School of Computing at the Costa Rica Institute of Technology. This is my new webpage.

During my time at the PPL, I worked mostly on fault tolerance. I was involved in several projects related to resilience in HPC, ranging from efficient checkpoint/restart mechanisms to understanding the interplay between failures and energy. My thesis focused on scalable message-logging techniques; in particular, I developed a collection of strategies to reduce the memory overhead of the message log.

Papers

16-12

2016
[Paper]

Power, Reliability, Performance: One System to Rule Them All [Computer 2016]

15-16

2015
[Paper]

A Fault-Tolerance Protocol for Parallel Applications with Communication Imbalance [SBAC-PAD 2015]

| Esteban Meneses | Laxmikant Kale

15-10

2015
[Paper]

CAMEL: Collective-Aware Message Logging [TJS 2015]

| Esteban Meneses | Laxmikant Kale

14-21

2014
[Paper]

Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance [Cluster 2014]

14-20

2014
[Paper]

Using Migratable Objects to Enhance Fault Tolerance Schemes in Supercomputers [IEEE Transactions on Parallel and Distributed Systems 2014]

14-02

2014
[Paper]

Energy Profile of Rollback-Recovery Strategies in High Performance Computing [ParCo 2014]

| Esteban Meneses | Osman Sarood | Laxmikant Kale

13-61

2013
[Paper]

Communication and Topology-aware Load Balancing in Charm++ with TreeMatch [Cluster 2013]

13-60

2013
[Paper]

Position Paper: Actionable Performance Modeling for Future Supercomputers [MODSIM 2013]

13-25

2013
[Paper]

A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]

| Osman Sarood | Esteban Meneses | Laxmikant Kale

13-24

2013
[Paper]

ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection [SC 2013]

| Xiang Ni | Esteban Meneses | Nikhil Jain | Laxmikant Kale

13-22

2013
[Paper]

Position Paper: A Multi-resolution Emulation + Simulation Methodology [MODSIM 2013]

13-17

2013
[PhD Thesis]

Scalable Message-Logging Techniques for Effective Fault Tolerance in HPC Applications [Thesis 2013]

| Esteban Meneses

12-37

2012
[Paper]

Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]

| Esteban Meneses | Osman Sarood | Laxmikant Kale

12-32

2012
[Paper]

Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm [Cluster 2012]

| Xiang Ni | Esteban Meneses | Laxmikant Kale

12-14

2012
[Paper]

A Message-Logging Protocol for Multicore Systems [FTXS 2012]

| Esteban Meneses | Xiang Ni | Laxmikant Kale

12-04

2012
[Paper]

A Scalable Double In-memory Checkpoint and Restart Scheme towards Exascale [PPL Technical Report 2012]

| Gengbin Zheng | Xiang Ni | Esteban Meneses | Laxmikant Kale

11-30

2011
[Paper]

Design and Analysis of a Message Logging Protocol for Fault Tolerant Multicore Systems [PPL Technical Report 2011]

| Esteban Meneses | Xiang Ni | Laxmikant Kale

11-26

2011
[Paper]

Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications [Cluster 2011]

| Esteban Meneses | Greg Bronevetsky | Laxmikant Kale

11-04

2011
[Paper]

Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems [DPDNS 2011]

| Esteban Meneses | Greg Bronevetsky | Laxmikant Kale

10-20

2010
[Paper]

Periodic Hierarchical Load Balancing for Large Supercomputers [IJHPCA 2010]

| Gengbin Zheng | Abhinav Bhatele | Esteban Meneses | Laxmikant Kale

10-08

2010
[Paper]

Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers [P2S2 2010]

| Gengbin Zheng | Esteban Meneses | Abhinav Bhatele | Laxmikant Kale

10-02

2010
[Paper]

Team-based Message Logging: Preliminary Results [Resilience 2010]

| Esteban Meneses | Celso Mendes | Laxmikant Kale

Talks/Posters

14-31

2014
[Talk]

Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance [Cluster 2014]

13-50

2013
[Talk]

A ‘Cool’ Way of Improving the Reliability of HPC Machines [SC 2013]

| Osman Sarood | Esteban Meneses | Laxmikant Kale

12-43

2012
[Talk]

Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems [SBAC-PAD 2012]

| Esteban Meneses | Osman Sarood | Laxmikant Kale

12-30

2012
[Talk]

A Message-Logging Protocol for Multicore Systems [FTXS 2012]

| Esteban Meneses | Xiang Ni | Laxmikant Kale

11-38

2011
[Talk]

Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications [Cluster 2011]

| Esteban Meneses | Greg Bronevetsky | Laxmikant Kale

10-46

2010
[Talk]

Clustering Message Passing Applications to Enhance Fault Tolerance Protocols [JLPC 2010]

| Esteban Meneses

10-44

2010
[Talk]

Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers [P2S2 2010]

| Gengbin Zheng | Esteban Meneses | Abhinav Bhatele | Laxmikant Kale

10-37

2010
[Talk]

Clustering Parallel Applications to Enhance Message Logging Protocols [PPL Talk 2010]

| Esteban Meneses

10-29

2010
[Talk]

Team-based Message Logging: Preliminary Results [Resilience 2010]

| Esteban Meneses

09-27

2009
[Talk]

Adaptive Runtime Support for Fault Tolerance [PPL Talk 2009]

| Celso Mendes | Esteban Meneses
