Bug #668

Bug #259: Bugs exposed by use of randomized Q

ampi/megampi test fails with randomized queues

Added by Eric Mikida almost 4 years ago. Updated 5 months ago.

Status: In Progress
Priority: Low
Assignee:
Category: AMPI
Target version: -
Start date: 02/11/2015
Due date:
% Done: 0%


Description

See parent task for full details.

History

#1 Updated by Eric Bohm almost 4 years ago

  • Assignee changed from PPL to Phil Miller

#2 Updated by Phil Miller over 3 years ago

Reproduced and captured record/replay logs. Will attempt to run under charmdebug to understand what goes wrong.

#3 Updated by Phil Miller over 2 years ago

  • Assignee changed from Phil Miller to Sam White

Passing off an AMPI bug

#4 Updated by Phil Miller over 2 years ago

Per the parent task,

./build AMPI net-linux-x86_64 --with-prio-type=int --enable-randomized-msgq -j16 --suffix randq-debug -O3 -g

ampi/megampi: Crashes rarely due to a failed assertion: Broadcast integer from master> expected 123, got 4!

It might be worthwhile to run the various mpich-test, IMB, and other conformance tests under randomized queues. Further failures there would indicate substantial robustness issues that we will either have to fix or else leave users exposed to unpredictable failures and wrong results.

#5 Updated by Sam White over 2 years ago

I built as above and ran megampi for 1000 iterations 10 times (>1 hour), and got no failures. None from mpich-tests/coll that I tried either. I can try IMB.
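
Because the crash is rare, repeated runs like the ones described above are easier to track with a small harness that records which runs failed. The following is only a minimal sketch, not the procedure actually used in this ticket: the function name `count_failures` is hypothetical, and the launcher and test command line are left to the caller because they are not given here.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run `cmd` (a list of argv strings) `runs` times.

    Returns the 1-based indices of the runs that exited non-zero.
    A run killed by a signal (for example an abort after a failed
    assertion) has a negative return code, so it is counted too.
    """
    failed = []
    for i in range(1, runs + 1):
        # Capture output so a long campaign does not flood the terminal.
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        if result.returncode != 0:
            failed.append(i)
    return failed


# Self-contained demonstration using the Python interpreter as the
# "test binary"; substitute the real launcher and test command line.
always_ok = [sys.executable, "-c", "raise SystemExit(0)"]
always_bad = [sys.executable, "-c", "raise SystemExit(1)"]
print(count_failures(always_ok, 3))
print(count_failures(always_bad, 2))
```

A clean result from such a loop only shows that the failure did not occur in those runs. With randomized queues the message ordering differs from run to run, so it does not rule the bug out.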

#6 Updated by Sam White over 2 years ago

  • Status changed from New to In Progress

#7 Updated by Sam White almost 2 years ago

  • Target version set to 6.8.1

#8 Updated by Sam White over 1 year ago

  • Target version changed from 6.8.1 to 6.9.0

#9 Updated by Sam White about 1 year ago

  • Target version deleted (6.9.0)

#10 Updated by Sam White 5 months ago

  • Priority changed from Normal to Low
