protect load balancer from variable cpu clock
When running netlrts-smp on Linux with two processes on a single node for GPU-accelerated NAMD, the OS sees some cores (generally those associated with one process or the other) as less busy and slows their cpu clocks, which exaggerates the measured load imbalance and results in even worse overload of the other process after load balancing. Setting the cpu frequency scaling governors to "performance" makes the issue go away, but we need to understand why the OS isn't reading the CPUs as fully loaded, or find a way for the load balancer to cope with time-varying cpu speeds.
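The governor workaround mentioned above can be applied through the cpufreq sysfs interface; this is a sketch assuming a Linux system that exposes these paths (they can differ by distribution and kernel config):

```shell
# Check the current governor on each core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Force the "performance" governor so the kernel does not downclock
# cores it judges to be idle (requires root)
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$g" >/dev/null
done
```

Note this is a per-boot setting; it does not persist across reboots unless applied by a service or boot script.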
#1 Updated by Jim Phillips over 4 years ago
The OS was reading the CPU as not loaded due to frequent calls to CmiMachineProgressImpl() in NAMD entry methods. These calls were spending a lot of time waiting on something in the machine layer. When the CmiMachineProgressImpl() calls are removed the OS keeps the cores at full or near-full speed. It is possible that it is a bug for user code to call CmiMachineProgressImpl() in non-smp builds, or at least it shouldn't serve any purpose when there is a communication thread. These calls were originally added to ensure that high-priority incoming messages were received promptly.
#2 Updated by Jim Phillips over 4 years ago
That should be "It is possible that it is a bug for user code to call CmiMachineProgressImpl() in smp builds". In any case, I assume this is related to the issue of the comm thread holding the comm lock while sending and receiving messages, rather than only while manipulating queues.