Background
Checkpoint-based methods
- Coordinated – Blocking [Tamir84], Non-blocking [Chandy85]
- CoCheck, Starfish, CLIP – fault-tolerant MPI
- Uncoordinated – suffers from rollback propagation
- Communication-induced – [Briatico84]; doesn’t scale well
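Rollback propagation (the domino effect) in uncoordinated checkpointing can be illustrated with a small recovery-line computation. This is a minimal sketch under stated assumptions, not taken from any of the cited systems: the names `recovery_line` and `Message` are hypothetical, each process numbers its events locally, a checkpoint at position `c` records events `0..c-1`, and every process has an initial checkpoint at `0`. A message is an orphan when its receive is recorded but its send is not, and the receiver must then roll back further, which may in turn create new orphans.

```python
from typing import Dict, List, Tuple

# Hypothetical message record: (sender, send_event, receiver, recv_event).
# Event indices are local per-process counters; a checkpoint at position c
# records events 0..c-1. Assumes every process has a checkpoint at 0.
Message = Tuple[str, int, str, int]


def recovery_line(checkpoints: Dict[str, List[int]],
                  messages: List[Message]) -> Dict[str, int]:
    """Return the latest consistent set of checkpoints, one per process."""
    # Start from each process's most recent checkpoint.
    line = {p: max(cs) for p, cs in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, s_ev, receiver, r_ev in messages:
            received = r_ev < line[receiver]  # receive is in the saved state
            sent = s_ev < line[sender]        # send is in the saved state
            if received and not sent:
                # Orphan message: roll the receiver back to a checkpoint taken
                # no later than the receive event (rollback propagation).
                line[receiver] = max(c for c in checkpoints[receiver] if c <= r_ev)
                changed = True
    return line
```

For example, with checkpoints `{"P": [0, 3], "Q": [0, 3]}` and messages `[("P", 4, "Q", 2), ("Q", 1, "P", 2)]`, the first orphan forces Q back to `0`, which orphans the second message and forces P back to `0` as well: both processes end up at their initial state even though each had a later checkpoint.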
Log-based
- Pessimistic – MPICH-V1 and V2, SBML (sender-based message logging) [Johnson87]
- Optimistic – [Strom85] unbounded rollback, complicated recovery
- Causal logging – Manetho [Elnozahy93]; complicated causality tracking and recovery
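The core idea of pessimistic logging can be shown with a toy process. This is a minimal sketch assuming a piecewise-deterministic process whose state is just a running sum; the class `Process` and its methods are hypothetical, and the in-memory `log` list stands in for stable storage. Each message is logged before it is delivered, so after a crash the process restores its checkpoint and replays the log in delivery order, reaching the same state without forcing any other process to roll back.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """Toy piecewise-deterministic process whose state is a running sum."""
    state: int = 0
    checkpoint: int = 0
    # Stand-in for stable storage: messages delivered since the last checkpoint.
    log: List[int] = field(default_factory=list)

    def take_checkpoint(self) -> None:
        self.checkpoint = self.state
        self.log.clear()  # messages before the checkpoint are no longer needed

    def deliver(self, msg: int) -> None:
        # Pessimistic: the message is logged *before* it can affect the state.
        self.log.append(msg)
        self.state += msg

    def crash_and_recover(self) -> None:
        # Restore the checkpoint, then replay logged messages in delivery order.
        self.state = self.checkpoint
        for msg in self.log:
            self.state += msg
```

A usage example: deliver `5`, checkpoint, deliver `3` and `4`, then overwrite `state` to simulate losing volatile memory; `crash_and_recover()` brings it back to `12`. The price, as the slide's comparison implies, is a synchronous log write on every delivery, which optimistic and causal logging try to avoid.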