Feature #975

OFI Layer

Added by Bilge Acun over 3 years ago. Updated over 1 year ago.

Status: Merged
Priority: Normal
Assignee:
Category: Machine Layers
Target version:
Start date: 09/11/2017
Due date:
% Done: 100%


Description

Integration of OFI layer implemented by Intel. Test and make it work on the Cisco cluster.

The initial patch from the integration is here: https://charm.cs.illinois.edu/gerrit/#/c/1047/

It currently does not work on the Pacini cluster.


Subtasks

Feature #1673: Basic OFI (libfabric) network layer using PSM transport (Merged, Nitin Bhat)


Related issues

Related to Charm++ - Bug #1675: OFI replica crashes (Merged, 09/13/2017)

History

#1 Updated by Bilge Acun over 3 years ago

  • Status changed from New to In Progress

#2 Updated by Sam White over 2 years ago

  • Target version set to 6.9.0
  • Category set to Machine Layers

#4 Updated by Bilge Acun almost 2 years ago

  • Assignee changed from Bilge Acun to Nitin Bhat

The newest version of the patch from Intel is here: https://charm.cs.illinois.edu/gerrit/#/c/2759/
Previous patches are abandoned.

Also, I am assigning the feature to Nitin, who is testing the new patch.

#5 Updated by Nitin Bhat almost 2 years ago

  • Status changed from In Progress to Implemented
  • Target version changed from 6.9.0 to 6.8.1

The current patch shows decent performance improvements over the MPI build on both Stampede2 and Bridges.

This patch includes a cache for the OFI machine-specific metadata (OFIRequest) and a memory pool similar to the one used in the GNI layer. However, since neither the cache nor the mempool improved performance significantly, I've disabled both for now.

Results from the OFI layer patch:
1. Initial results with cache enabled and no mempool: https://docs.google.com/spreadsheets/d/1T7yNr_cObWqR7dgiN14hXyvv1dnwG-79quF4wGUIObg/edit?usp=sharing
2. Results comparing cache and no-cache implementations: https://docs.google.com/spreadsheets/d/1AqursBrFjjIPJb7I1AYB8rBhWtevHR5CcA4ku4--Uo8/edit?usp=sharing
3. Results comparing mempool and no-mempool implementations: https://docs.google.com/spreadsheets/d/1wC089wF4ZWoAJINU9DFOQUhTGr2FXzdBaJbatE4ToAY/edit?usp=sharing
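
For readers unfamiliar with the idea, a request cache like the one mentioned above is often just a bounded free list of metadata objects. The sketch below is not taken from the patch: `Request` and `RequestCache` are hypothetical names, and the real OFIRequest has different fields. It only illustrates the general technique of recycling per-message metadata instead of allocating it on every send or receive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-message metadata; NOT the patch's OFIRequest.
struct Request {
    void*       buffer = nullptr;
    std::size_t length = 0;
};

// Bounded free list: release() keeps up to `capacity` objects for reuse,
// acquire() returns a recycled object when one is available and otherwise
// falls back to the heap.
class RequestCache {
public:
    explicit RequestCache(std::size_t capacity) : capacity_(capacity) {
        free_.reserve(capacity);
    }
    ~RequestCache() {
        for (Request* r : free_) delete r;
    }
    RequestCache(const RequestCache&) = delete;
    RequestCache& operator=(const RequestCache&) = delete;

    Request* acquire() {
        if (free_.empty()) {
            ++misses_;              // nothing cached: allocate a fresh object
            return new Request();
        }
        ++hits_;                    // reuse the most recently released object
        Request* r = free_.back();
        free_.pop_back();
        *r = Request();             // reset fields so no stale state leaks out
        return r;
    }

    void release(Request* r) {
        assert(r != nullptr);
        if (free_.size() < capacity_) {
            free_.push_back(r);     // keep it for a later acquire()
        } else {
            delete r;               // cache is full: return memory to the heap
        }
    }

    std::size_t hits() const   { return hits_; }
    std::size_t misses() const { return misses_; }
    std::size_t cached() const { return free_.size(); }

private:
    std::size_t           capacity_;
    std::vector<Request*> free_;
    std::size_t           hits_   = 0;
    std::size_t           misses_ = 0;
};
```

In this sketch a caller would `acquire()` a request when posting a message and `release()` it from the completion handler. The hit and miss counters are there only to make it easy to check whether the cache is doing any work in a given benchmark.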

#6 Updated by Jim Phillips almost 2 years ago

Do you have any synthetic performance tests besides ping-pong?
If you can make this compatible with the 6.8.0 head I can test it with NAMD.

#7 Updated by Nitin Bhat almost 2 years ago

Other than ping-pong, I haven't run any other synthetic performance tests.
I have tested it with NAMD, ChaNGa and OpenAtom (on a few smaller node counts). The tabs at the bottom of the spreadsheets have the results from these.

#8 Updated by Phil Miller almost 2 years ago

  • Target version changed from 6.8.1 to 6.9.0

#9 Updated by Phil Miller almost 2 years ago

  • Related to Bug #1675: OFI replica crashes added

#10 Updated by Sam White over 1 year ago

  • Status changed from Implemented to Merged
