Bug #1782

charmrun ++local dies without error message on non-existent program

Added by Jim Phillips over 1 year ago. Updated over 1 year ago.

Status: Merged
Priority: Normal
Assignee:
Category: Machine Layers
Target version:
Start date: 01/25/2018
Due date:
% Done: 100%


Description

Calling charmrun with a non-existent program produces a useful error message:

jim@belfast$/Projects/namd2/charm-6.9.0/netlrts-linux-x86_64-smp-iccstatic/bin/charmrun /foo
Charmrun> scalable start enabled. 
Charmrun remote shell(abijan.0)> Cannot locate this node-program: /foo
Charmrun remote shell(abijan.0)> Exiting with error code 1
Charmrun> Error 1 returned from remote shell (abijan:0)

Except with the ++local option:

jim@belfast$/Projects/namd2/charm-6.9.0/netlrts-linux-x86_64-smp-iccstatic/bin/charmrun ++local /foo
Charmrun> scalable start enabled. 
Killed

History

#1 Updated by Sam White over 1 year ago

  • Assignee set to Dong Hun Lee

#2 Updated by Dong Hun Lee over 1 year ago

  • % Done changed from 0 to 100
  • Status changed from New to Implemented

https://charm.cs.illinois.edu/gerrit/#/c/3575/

The problem is that when execve (line 5368) returns, indicating that the exec was unsuccessful, the output of the fprintf on the following line is never displayed.
I tried fflush, but it does not seem to help.
Instead, before spawning processes on the PEs, a quick check is now done to see whether the program path is valid.
The original error messages are in the ssh_script function (line 4507), but that function isn't called when the ++local option is used.
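
For illustration only, a pre-spawn check of this kind could look roughly like the sketch below. It is not the code from the Gerrit change: the helper name check_node_program and the wording of the second and third messages are assumptions. Only the "Cannot locate this node-program" text is borrowed from the existing non-local output. The sketch also rejects files that exist but are not executable.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not taken from charmrun: returns 0 if `path` names a
 * regular file the caller may execute, otherwise -1 with a human-readable
 * reason written to `msg`. */
static int check_node_program(const char *path, char *msg, size_t msglen)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        /* Path does not exist or cannot be reached. */
        snprintf(msg, msglen, "Cannot locate this node-program: %s (%s)",
                 path, strerror(errno));
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {
        /* Directories, devices, etc. cannot be exec'ed. */
        snprintf(msg, msglen, "Not a regular file: %s", path);
        return -1;
    }
    if (access(path, X_OK) != 0) {
        /* File exists but the caller lacks execute permission. */
        snprintf(msg, msglen, "Not executable: %s (%s)",
                 path, strerror(errno));
        return -1;
    }
    if (msglen > 0)
        msg[0] = '\0';
    return 0;
}
```

A check like this is inherently racy (the file can change between the check and the exec), so it complements rather than replaces reporting a failed execve.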

#3 Updated by Jim Phillips over 1 year ago

Passing a file that is not executable is also handled badly:

jim@sunnyvale$/Projects/namd2/charm-6.9.0/netlrts-linux-x86_64-smp-iccstatic/bin/charmrun ++local Make.config  
Charmrun> scalable start enabled. 
Killed

We need to figure out why an error message is not displayed.

#4 Updated by Jim Phillips over 1 year ago

Dong Hun Lee wrote:

https://charm.cs.illinois.edu/gerrit/#/c/3575/

The problem is that when execve (line 5368) returns, indicating that the exec was unsuccessful, the output of the fprintf on the following line is never displayed.
I tried fflush, but it does not seem to help.
Instead, before spawning processes on the PEs, a quick check is now done to see whether the program path is valid.
The original error messages are in the ssh_script function (line 4507), but that function isn't called when the ++local option is used.

See my comments in code review. The code dups stdout (1) instead of stderr (2).
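
To make the stdout/stderr point concrete, here is a minimal sketch, assuming the child redirects its standard streams before calling execve. It is not the charmrun code: spawn_local, report_fd, the exit code 127 and the message text are all hypothetical. The child keeps a private copy of the descriptor on which errors should be reported (for a launcher this would be descriptor 2, not 1), so that a failed execve can still produce a visible message.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run `path` in a child whose stdout and stderr are
 * sent to /dev/null, but report an exec failure on a saved copy of
 * `report_fd`. Returns the child's exit status, or -1 on error. */
static int spawn_local(const char *path, int report_fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Save the descriptor we want to report on BEFORE redirecting.
         * Close-on-exec keeps it from leaking into a successful exec. */
        int saved = dup(report_fd);
        if (saved >= 0)
            fcntl(saved, F_SETFD, FD_CLOEXEC);

        int devnull = open("/dev/null", O_WRONLY);
        if (devnull >= 0) {
            dup2(devnull, STDOUT_FILENO);
            dup2(devnull, STDERR_FILENO);
            if (devnull > STDERR_FILENO)
                close(devnull);
        }

        char *const argv[] = { (char *)path, NULL };
        char *const envp[] = { NULL };
        execve(path, argv, envp);

        /* Only reached if execve failed. */
        int err = errno;
        char line[512];
        int n = snprintf(line, sizeof line, "Cannot execute %s: %s\n",
                         path, strerror(err));
        if (saved >= 0 && n > 0) {
            size_t len = (n < (int)sizeof line) ? (size_t)n : sizeof line - 1;
            if (write(saved, line, len) < 0) {
                /* Nothing more can be done in the child. */
            }
        }
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a launcher, the call would be spawn_local(path, STDERR_FILENO). Passing STDOUT_FILENO instead would send the failure message to wherever stdout points, which may not be where the user is looking.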

#5 Updated by Eric Mikida over 1 year ago

  • Status changed from Implemented to Merged
