Bug #1390

AMPI_Alltoall crashes for short messages

Added by Karthik Senthil over 2 years ago. Updated over 2 years ago.

Status: Merged
Priority: Normal
Category: AMPI
Target version:
Start date: 01/30/2017
Due date:
% Done: 0%


Description

Running the megampi test with more than 4 ranks crashes with memory corruption. Further investigation points to the MPI_Alltoall test.

History

#1 Updated by Sam White over 2 years ago

  • Target version changed from 6.8.1 to 6.8.0

We'll want to have MPI_Alltoall working for all message sizes in our 6.8.0 release. In the worst case, we could always fall back to the medium or long message algorithms for short messages as well.

#2 Updated by Sam White over 2 years ago

  • Status changed from New to In Progress

What's the status of this?

#3 Updated by Karthik Senthil over 2 years ago

I think there are multiple bugs associated with this issue.

1. When I run the test as ./pgm +vp 5, the program crashes with

pgm: malloc.c:3695: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed

GDB traces this to a call within the MPI_COMM_SELF tests performed in megampi.

2. When I run the test as ./pgm +vp 7, the Alltoall test reports a value that does not match the expected one. I am tracing the current recursive doubling algorithm to fix this.

As a temporary fix, we can use the medium-message algorithm for short messages as well. Interestingly, this resolves both of the bugs above.
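
For readers following along, the kind of expected-value check that fails in item 2 can be pictured with a small in-process model. This is only a sketch under stated assumptions: it is not megampi's test code and not AMPI's recursive doubling or medium-message algorithm. Ranks are modelled as Python lists, the value pattern `src * num_ranks + dst` is made up for illustration, and the exchange is a generic pairwise shift scheme that works for any rank count, including non-powers of two such as 5 and 7.

```python
def make_send_buffers(num_ranks):
    """Rank src's send buffer: slot dst holds the value destined for rank dst.

    The value pattern is an illustrative assumption, not megampi's.
    """
    return [[src * num_ranks + dst for dst in range(num_ranks)]
            for src in range(num_ranks)]


def pairwise_alltoall(send):
    """Model an all-to-all as num_ranks rounds of shifted exchanges.

    In round `step`, rank src delivers its block for rank (src + step) % P.
    This is a generic textbook scheme, not AMPI's implementation.
    """
    num_ranks = len(send)
    recv = [[None] * num_ranks for _ in range(num_ranks)]
    for step in range(num_ranks):
        for src in range(num_ranks):
            dst = (src + step) % num_ranks
            recv[dst][src] = send[src][dst]
    return recv


def find_mismatches(recv):
    """Return (dst, src, got) for every slot that differs from the expected value."""
    num_ranks = len(recv)
    return [(dst, src, recv[dst][src])
            for dst in range(num_ranks)
            for src in range(num_ranks)
            if recv[dst][src] != src * num_ranks + dst]


if __name__ == "__main__":
    # 5 and 7 mirror the +vp values from this report; 4 is the usual rank count.
    for num_ranks in (4, 5, 7):
        recv = pairwise_alltoall(make_send_buffers(num_ranks))
        print(num_ranks, "ranks:", len(find_mismatches(recv)), "mismatches")
```

A correct exchange leaves the mismatch list empty for every rank count. An algorithm that silently assumes a power-of-two number of ranks would typically show up here as non-empty lists for 5 and 7 only.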

#4 Updated by Sam White over 2 years ago

I think we can safely abandon the short message protocol (and use the medium message protocol for those).
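
As a rough illustration of what abandoning the short message protocol means for algorithm selection, the sketch below routes short messages to the medium path. Everything in it is hypothetical: the byte thresholds, the function name `select_alltoall_protocol`, and the `short_enabled` switch are assumptions for illustration and do not come from AMPI's source.

```python
# Hypothetical cutoffs in bytes; AMPI's real thresholds are not given in this thread.
SHORT_MSG_LIMIT = 32
MEDIUM_MSG_LIMIT = 131072


def select_alltoall_protocol(msg_bytes, short_enabled=False):
    """Pick a protocol name from the per-rank message size.

    With short_enabled=False (the fallback discussed here), messages that
    would have used the short protocol take the medium one instead.
    """
    if msg_bytes < 0:
        raise ValueError("message size must be non-negative")
    if msg_bytes <= SHORT_MSG_LIMIT:
        return "short" if short_enabled else "medium"
    if msg_bytes <= MEDIUM_MSG_LIMIT:
        return "medium"
    return "long"


if __name__ == "__main__":
    for size in (8, 1024, 1 << 20):
        print(size, "bytes ->", select_alltoall_protocol(size))
```

Keeping the switch as a parameter rather than deleting the branch is one possible design: the short path can be re-enabled later for testing once it handles non-power-of-two rank counts.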

#5 Updated by Karthik Senthil over 2 years ago

  • Status changed from In Progress to Merged
  • Closed date set to 2017-02-16 15:40:27.311347
