Bug #234

net-linux-x86_64-*-smp-pgcc crashes in megatest

Added by Phil Miller about 6 years ago. Updated over 2 years ago.

Status: Rejected
Priority: Low
Assignee:
Category: -
Target version:
Start date: 06/17/2013
Due date:
% Done: 0%


Description

http://charm.cs.illinois.edu/autobuild/old.2013_06_05__03_33/net-linux-x86_64-ibverbs-smp-pgcc.txt
The PGI compiler isn't happy with something we're doing. Our problem, or its?

Schedule-wise, 6.5.1 or 6.6, or never?

History

#1 Updated by Phil Miller about 6 years ago

  • Assignee set to Eric Bohm

#2 Updated by Phil Miller about 6 years ago

  • Priority changed from Normal to Urgent

#3 Updated by Eric Bohm about 6 years ago

I don't have an account on Trestles (the machine has always been too small to matter for projects I work on), and neither Stampede nor Taub has PGI as far as I can tell. So I don't have access to an InfiniBand machine with PGI on it to work on this bug.

However, based on the rather dismal autobuild config log for trestles pgcc verbs smp:
checking "whether asm eieio assembly works"... "no"
checking "whether __thread (Thread Local Storage) is supported"... "no"
checking "whether synchronization primitives (__sync_add_and_fetch) works in C"... "no"
checking "whether synchronization primitives (__sync_synchronize) works in C"... "no"
checking "whether fence intrinsic primitives (__builtin_Xfence_ia32) works in C"... "no"
checking "whether switching TLS register (64-bit) is supported"... "yes"

SMP performance is probably going to be underwhelming even if we fixed it to work correctly under those conditions.
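For reference, a minimal sketch of the kind of code two of those checks exercise, assuming a GCC-compatible compiler. This is not the actual configure test source; the function names probe_tls and probe_sync_synchronize are made up for illustration. A compiler that rejects either construct would answer "no" to the corresponding check.

```c
/* Probe 1: thread-local storage via the GCC-style __thread keyword.
   Each thread gets its own copy of tls_value. */
static __thread int tls_value = 0;

static int probe_tls(void)
{
    tls_value += 1;          /* touches only this thread's copy */
    return tls_value;
}

/* Probe 2: the full memory barrier builtin. The barrier orders the store
   to flag before anything that follows it. */
static int probe_sync_synchronize(void)
{
    volatile int flag = 0;
    flag = 1;
    __sync_synchronize();
    return flag;
}
```

Compiling a file like this with the compiler under test is a quick way to reproduce a single "no" without rerunning the whole configure step.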

#4 Updated by Phil Miller about 6 years ago

You skipped the couple of lines before those, which use inline assembly:

checking "whether GCC x86 assembly works"... "yes" 
checking "whether GCC x86 assembly for atomic increment works"... "yes" 

The performance of those should be just about the same as the compiler intrinsics.
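To make the comparison concrete, here is a hedged sketch of an atomic increment written both ways. It is not the Charm++ implementation; atomic_inc_asm and atomic_inc_builtin are hypothetical names, and the assembly path assumes an x86 target with GCC-style extended asm (other targets fall back to the builtin). Both paths boil down to a single locked read-modify-write instruction on x86, which is why similar performance is plausible.

```c
/* Atomic increment via GCC-style x86 inline assembly. Returns the new value. */
static inline int atomic_inc_asm(volatile int *p)
{
#if defined(__x86_64__) || defined(__i386__)
    int one = 1;
    /* lock xadd: atomically adds 'one' to *p and leaves the OLD value
       of *p in 'one'. */
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(one), "+m"(*p)
                         :
                         : "memory", "cc");
    return one + 1;          /* old value + 1 == new value */
#else
    return __sync_add_and_fetch(p, 1);
#endif
}

/* The same operation via the compiler intrinsic that the configure
   check reports as unavailable under pgcc. */
static inline int atomic_inc_builtin(volatile int *p)
{
    return __sync_add_and_fetch(p, 1);
}
```

Whether a given PGI release compiles the assembly variant correctly is exactly the open question here, so a multi-threaded stress test of atomic_inc_asm would be the natural next step.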

#5 Updated by Eric Bohm about 6 years ago

Yes, but I am suspicious of the use of those being correct in PGI.

#6 Updated by Phil Miller about 6 years ago

How about TACC's Lonestar?

#7 Updated by Eric Bohm about 6 years ago

  • Subject changed from net-linux-x86_64-ibverbs-smp-pgcc crashes in megatest to net-linux-x86_64-*-smp-pgcc crashes in megatest

(I don't have a Lonestar account, but I was able to investigate Trestles via buildcharm.)

The bug appears to be unrelated to ibverbs. (subject line changed accordingly) I can produce the same problem in net-linux-x86_64-smp-pgcc. It persists across -memory [os|ptmalloc|gnu] and -thread [generic|context|uJcontext]. GCC is not afflicted with this problem.

On a related note, Trestles has a rather antiquated (May 2010) version of pgcc, 10.5-0. However, trying a newer PGI installation elsewhere (such as on Blue Waters) just moves the problem from runtime to compile time: PGI 13.3-0 crashes during compilation of ckarray.C with:

PGCC-S-0000-Internal compiler error. union_find_last_lp_per_handler:empty throw_bih 0 (ckarray.C: 635)

This exhausts my patience with PGI at this time.

Recommended workaround: compile using ICC, or GCC. Unless you're a masochist, in which case feel free to try compiling with PGI while standing on broken glass.

#8 Updated by Phil Miller about 6 years ago

An important note on that finding: the ICE occurs only when optimization at -O2 or -O3 is enabled; compilation works with -O1. I've also seen it occur with both 13.3 and 13.4.

I was about to test 12.x, when the login nodes suddenly hung. I'll take another look later.

#9 Updated by Phil Miller about 6 years ago

I just tried 12.8 and 12.10 - ICE is not present.
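The observations so far could be recorded in a small version check, sketched below. The function pgi_ice_observed and the macro PGI_ICE_OBSERVED are hypothetical names, and the table holds only what has been reported in this thread: 13.3 and 13.4 hit the ICE at -O2/-O3, while 12.8, 12.10, and 10.5 compile. It assumes the PGI compiler's predefined __PGIC__ and __PGIC_MINOR__ version macros.

```c
/* Returns 1 if the given PGI release was reported in this thread to hit the
   internal compiler error in ckarray.C at -O2/-O3, 0 if it was reported to
   compile, and -1 if the thread has no data for that release. */
static int pgi_ice_observed(int major, int minor)
{
    if (major == 13 && (minor == 3 || minor == 4))
        return 1;            /* ICE seen at -O2/-O3, not at -O1 */
    if (major == 12 && (minor == 8 || minor == 10))
        return 0;            /* ICE not present */
    if (major == 10 && minor == 5)
        return 0;            /* compiles, but crashes at runtime */
    return -1;               /* untested */
}

/* Evaluate the check for the compiler building this file, if it is PGI. */
#if defined(__PGIC__) && defined(__PGIC_MINOR__)
#  define PGI_ICE_OBSERVED pgi_ice_observed(__PGIC__, __PGIC_MINOR__)
#else
#  define PGI_ICE_OBSERVED (-1)
#endif
```

A build script could use a check like this to drop to -O1 for the affected releases, though that only avoids the compile-time failure and says nothing about the runtime crash.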

#10 Updated by Eric Bohm about 6 years ago

  • Target version changed from 6.5.1 to 6.6.0

Reported to NCSA Jira. They noted the similarity with: http://www.pgroup.com/userforum/viewtopic.php?t=3885

So, it might be fixed in 13.6.0, which might see the light of day on a system we have access to eventually.

Shifting target to 6.6.0, since we're unlikely to have a solution to the original issue on Trestles any time soon (perhaps never if they don't upgrade PGI).

#11 Updated by Phil Miller about 6 years ago

  • Priority changed from Urgent to Normal

#12 Updated by Eric Bohm almost 6 years ago

  • Status changed from New to In Progress

PGCC should not be tested on the old version available on Trestles. Target will shift to Hopper when we have smp debugged.

#13 Updated by Eric Bohm almost 6 years ago

  • Target version changed from 6.6.0 to Unscheduled

Given that the releases over the past few years either fail to compile, fail to link, or generate code that segfaults, I feel that we shouldn't regard PGI as a production compiler for Charm++. It is not worth our time unless one of our collaborators really needs PGI for some reason.

#14 Updated by Eric Bohm over 4 years ago

  • Priority changed from Normal to Low

#15 Updated by Sam White over 2 years ago

  • Status changed from In Progress to Rejected
  • Closed date set to 2017-02-01 12:40:31.381713

Closing due to net- being deprecated, pgcc not generally being able to compile Charm, and the lack of requests from users for PGI support.
