Charm++ Support for Out-of-core Computation

- Memory-to-flop ratio keeps decreasing.
- SSDs are becoming more feasible.
- Leverage the runtime system to prefetch data from SSD.
- Overlap I/O with computation.
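
The prefetch-and-overlap idea above can be illustrated with a minimal double-buffering sketch in plain C++. This is not the Charm++ implementation: `load_block`, `compute`, and `process_overlapped` are hypothetical names, and the SSD read is simulated by filling a vector in memory. The sketch only shows the pattern of requesting block `i + 1` while block `i` is being processed.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

using Block = std::vector<double>;

// Hypothetical stand-in for an SSD read: block i is filled with the value i.
Block load_block(std::size_t i, std::size_t block_len) {
    return Block(block_len, static_cast<double>(i));
}

// Hypothetical compute kernel: sums the elements of one block.
double compute(const Block& b) {
    return std::accumulate(b.begin(), b.end(), 0.0);
}

// Process blocks in order while the next block is loaded on another thread,
// so the (simulated) I/O overlaps with computation on the current block.
double process_overlapped(std::size_t num_blocks, std::size_t block_len) {
    assert(block_len > 0);  // a block must hold at least one element
    if (num_blocks == 0) return 0.0;

    double total = 0.0;
    std::future<Block> next =
        std::async(std::launch::async, load_block, std::size_t{0}, block_len);
    for (std::size_t i = 0; i < num_blocks; ++i) {
        Block current = next.get();  // wait for the prefetched block
        if (i + 1 < num_blocks) {
            // Issue the prefetch for block i + 1 before computing on block i.
            next = std::async(std::launch::async, load_block, i + 1, block_len);
        }
        total += compute(current);
    }
    return total;
}
```

In a real runtime the prefetch would be issued by the scheduler rather than by the application loop; the sketch keeps it in user code only to make the overlap visible.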

System Design

- Schedule tasks based on data availability
- Leverage data effects
- Mitigate the load imbalance caused by data availability
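
As a rough illustration of availability-based scheduling, the sketch below runs whichever queued task already has its data resident in DRAM and only stalls for a fetch when no such task exists. Everything here is an assumption made for illustration: `Task`, `schedule_by_availability`, and the integer block IDs are hypothetical, the fetch is simulated as completing instantly, and the actual Charm++ scheduler may work quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical task record: an ID plus the one data block it needs.
struct Task {
    int id;
    int block;
};

// Returns the order in which tasks run. At each step the first queued task
// whose block is resident in DRAM runs. If no task is runnable, the block
// needed by the head of the queue is "fetched" (simulated as instantaneous).
std::vector<int> schedule_by_availability(std::deque<Task> queue,
                                          std::set<int> resident) {
    const std::size_t total = queue.size();
    std::vector<int> order;
    while (!queue.empty()) {
        auto it = std::find_if(queue.begin(), queue.end(), [&](const Task& t) {
            return resident.count(t.block) > 0;
        });
        if (it == queue.end()) {
            // Nothing is runnable: bring in the data for the oldest task.
            resident.insert(queue.front().block);
            continue;
        }
        order.push_back(it->id);
        queue.erase(it);
    }
    assert(order.size() == total);  // every task ran exactly once
    return order;
}
```

With tasks 0 and 2 needing block 5, tasks 1 and 3 needing block 2, and only block 2 resident, the tasks on block 2 run first and the tasks on block 5 run after the simulated fetch.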

User API

- Declare data that can reside in either DRAM or SSD
- Express the dependences between data and tasks
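
The slide does not show the actual API, so the following is only a hypothetical sketch of what the two steps above could look like in C++. `Placement`, `DataRegistry`, `TaskDecl`, and `blocks_to_prefetch` are invented names, not Charm++ interfaces. The point is that once data placement and task dependences are declared, a runtime can work out which blocks to prefetch before a task runs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Where a declared data block currently lives (hypothetical).
enum class Placement { DRAM, SSD };

// Hypothetical registry of user-declared data blocks.
class DataRegistry {
public:
    // Declare a named block and its current placement.
    void declare(const std::string& name, Placement where) {
        assert(!name.empty());  // blocks must be named
        placement_[name] = where;
    }
    // Look up a declared block; throws std::out_of_range if undeclared.
    Placement where(const std::string& name) const {
        return placement_.at(name);
    }
    // Record that a declared block has moved (e.g. after a prefetch).
    void move_to(const std::string& name, Placement where) {
        placement_.at(name) = where;
    }

private:
    std::unordered_map<std::string, Placement> placement_;
};

// Hypothetical task declaration: a name plus the data it depends on.
struct TaskDecl {
    std::string name;
    std::vector<std::string> reads;
};

// Blocks that must be brought into DRAM before `task` can run.
std::vector<std::string> blocks_to_prefetch(const DataRegistry& reg,
                                            const TaskDecl& task) {
    std::vector<std::string> out;
    for (const auto& d : task.reads) {
        if (reg.where(d) == Placement::SSD) out.push_back(d);
    }
    return out;
}
```

A task that reads blocks `A` (in DRAM) and `B` (on SSD) would report `B` as the only block to prefetch, and nothing once `B` has been moved to DRAM.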

Performance

[Figure: performance graph. Normalized time (y-axis) versus available memory in % (x-axis) for Charm-HMC and mmap.]

Future Support

- Up to 72 new Intel® Architecture cores
- 36MB shared L2 cache
- Full Intel® Xeon™ processor ISA compatibility through Intel® Advanced Vector Extensions 2
- Extending Intel® Advanced Vector Extensions architecture to 512b (AVX-512)
- Based on Silvermont microarchitecture:
  - 4 threads/core
  - Dual 512b Vector units/core
- 6 channels of DDR4 2400 up to 384GB
- 36 lanes PCI Express® (PCIe®) Gen 3
- 8GB/16GB of extremely high bandwidth on package memory
- Up to 3x single-thread performance improvement over prior gen (notes 1, 2)
- Up to 3x more power efficient than prior gen (notes 1, 2)

1. As projected based on early product definition and as compared to prior generation Intel® Xeon Phi™ Coprocessors.
2. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.