Add 'ofi' target to autobuild
#7 Updated by Jaemin Choi over 1 year ago
I've been working on getting it running on golub, but there is an issue with using ++mpiexec: if you set ppn in qsub larger than the total number of PEs used in a test program, it uses only a single physical node. This is problematic with SMP builds, and it affects the verbs autobuilds as well. I've found a workaround: set ppn to 1 in qsub, create a nodelist out of PBS_NODEFILE, and use that, so I'll go forward with it.
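For reference, the nodelist part of that workaround could look roughly like the sketch below. It is not the script actually used in the autobuild; the function name build_nodelist is made up for illustration, and it assumes the usual charmrun nodelist layout (a "group main" line followed by one "host" line per node) and that PBS_NODEFILE may list the same host more than once.

```python
import os
import sys


def build_nodelist(nodefile_lines):
    """Turn PBS_NODEFILE-style lines into charmrun nodelist text.

    Hypothetical helper: hosts are de-duplicated while keeping their
    first-seen order, so each physical node appears exactly once.
    """
    hosts = []
    seen = set()
    for line in nodefile_lines:
        host = line.strip()
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    # Assumed nodelist layout: one group header, then one host per line.
    return "group main\n" + "".join(f"host {h}\n" for h in hosts)


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        sys.exit("PBS_NODEFILE is not set; run this inside a PBS job")
    with open(path) as nodefile:
        sys.stdout.write(build_nodelist(nodefile))
```

The output would be redirected to a file and handed to charmrun through its ++nodelist option.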
#14 Updated by Sam White over 1 year ago
- Status changed from Resolved to In Progress
Autobuild for OFI is not passing yet. I don't even think it has actually gotten to building charm yet. Here's last night's run's failure:
./instead_test.sh: line 15: cd: charm/ofi-linux-x86_64/tmp: No such file or directory
#16 Updated by Sam White over 1 year ago
The build works, but then the jobs are pretty consistently timing out for whatever reason now:
In testdir charm/ofi-linux-x86_64/tmp
Submitting batch job for> make test OPTS= using the command> sbatch /home/skk3/autobuild/ofi/charmrun_script.31865.sh
Job enqueued under job ID 2272723
Job in state
Job in state RUNNING
Job in state RUNNING
Job in state RUNNING
...
...
Job in state TIMEOUT
Job in state TIMEOUT
Job in state TIMEOUT
...
...
#18 Updated by Jaemin Choi over 1 year ago
The issue of "There seems to be an issue with the OFI build that +p1 passed to an application is regarded as argv, and the pingpong benchmark (./pgm +p1) hangs as it tries to use +p1 as the payload which is ultimately set to 0.", which I thought was resolved by https://charm.cs.illinois.edu/gerrit/#/c/3452/, seems to have resurfaced.
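To illustrate why a leaked +p1 would end up as a payload of 0, here is a small sketch. It assumes the benchmark converts its argument with a C-style atoi, which the thread does not confirm; c_style_atoi is a hypothetical stand-in written for this example, not code from charm.

```python
def c_style_atoi(text):
    """Mimic C's atoi: skip whitespace, read an optional sign, then digits.

    Parsing stops at the first non-digit, and the result is 0 when no
    digits were read. Overflow behaviour is not modelled.
    """
    i, n = 0, len(text)
    while i < n and text[i].isspace():
        i += 1
    sign = 1
    if i < n and text[i] in "+-":
        if text[i] == "-":
            sign = -1
        i += 1
    value = 0
    while i < n and "0" <= text[i] <= "9":
        value = value * 10 + (ord(text[i]) - ord("0"))
        i += 1
    return sign * value


# "+p1": the '+' is taken as a sign, 'p' stops the parse, no digits were read.
payload = c_style_atoi("+p1")
```

Under that assumption the runtime flag is silently read as a zero-byte payload instead of being rejected.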
#19 Updated by Jaemin Choi over 1 year ago
Actually the problem this time doesn't seem to be caused by +p1; the command that causes the hang is ../../../bin/testrun ./pgm +p1 ++timeout 180 +isomalloc_sync, and the ++timeout 180 part is the culprit (removing it works). But I think this problem happens whenever something not parsable is passed, because even +timeout 180 causes the same hang. And the same thing happens if I use
charmrun instead of