Bug #756

CUDA build does not correctly find cuda location

Added by Michael Robson about 4 years ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Category:
GPU Support
Target version:
-
Start date:
05/29/2015
Due date:
% Done:
60%
Estimated time:
6.00 h

Description

tl;dr we need to change how the build script detects cuda's location OR tell people to ensure that CUDATOOLKIT_HOME is set correctly (which doesn't seem to be a default env var for cuda)

Several builds on the campus cluster were failing when building with the cuda option. I had loaded cuda using:

module load cuda

but the build still failed. I noticed the following lines at the top of the build output:
[mprobson@taubh2 charm]$ ./build charm++ netlrts-linux-x86_64 cuda smp
checking for CUDA toolkit directory
CUDA_DIR=/usr/local/cuda/

With some grep handiwork:
[mprobson@taubh2 charm]$ grep -rn "checking for CUDA toolkit directory" *
grep: VERSION: No such file or directory
build:451:  echo "checking for CUDA toolkit directory" 
grep: include: No such file or directory

And on line 451 of build:
451   echo "checking for CUDA toolkit directory" 
452   CUDA_CANDIDATE_DIRS="$CUDATOOLKIT_HOME /usr/local/cuda /usr/lib/nvidia-cuda-toolkit" 

Each of those directories is checked for existence, and CUDA_DIR is set to one that exists. The problem on the campus cluster is that each version of CUDA has its own subdirectory inside /usr/local/cuda/, e.g. /usr/local/cuda/6.5. Because /usr/local/cuda itself exists, the build script picks it even though it is not a toolkit root, and the build breaks. The current workaround is to set the non-standard CUDATOOLKIT_HOME env var and then build. I'm ultimately not sure whether we need to change the line in build or whether we should just make sure users/vendors set that variable.
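
One possible direction for the detection change, as a minimal sketch only: accept a candidate directory only if it actually contains an executable bin/nvcc, and also consider the directory implied by any nvcc found on PATH. The function name find_cuda_dir is hypothetical and this is not the code in build. It assumes that "module load cuda" puts nvcc on PATH, which may not hold on every cluster, and, like the existing candidate list, it does not handle paths containing spaces.

```shell
#!/bin/sh
# Hypothetical helper, not the code in Charm++'s build script.
# Prints the first candidate directory that contains an executable bin/nvcc.
find_cuda_dir() {
  candidates="$CUDATOOLKIT_HOME"

  # Assumption: the loaded CUDA module has put nvcc on PATH.
  nvcc_path=$(command -v nvcc 2>/dev/null)
  if [ -n "$nvcc_path" ]; then
    # nvcc lives in <toolkit>/bin/nvcc, so strip the last two path components.
    candidates="$candidates $(dirname "$(dirname "$nvcc_path")")"
  fi

  # Same fallback locations the build script already lists.
  candidates="$candidates /usr/local/cuda /usr/lib/nvidia-cuda-toolkit"

  for dir in $candidates; do
    # Existence alone is not enough: require the compiler to be present.
    if [ -x "$dir/bin/nvcc" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}
```

With a check like this, a parent directory such as /usr/local/cuda that only holds per-version subdirectories would be skipped rather than accepted. The workaround from the description is unaffected: setting CUDATOOLKIT_HOME to the versioned directory (e.g. /usr/local/cuda/6.5) before running ./build would still take priority.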


Related issues

Related to Charm++ - Bug #881: Automatically determine location of nvcc when compiling programs using charmc in accel (Status: New, 11/09/2015)

History

#1 Updated by Michael Robson almost 4 years ago

Add the bit about CUDATOOLKIT_HOME to the manual, either in the debugging section or in a (newly created) troubleshooting section.

#2 Updated by Michael Robson almost 4 years ago

  • Target version changed from 6.7.0 to Unscheduled

#3 Updated by Michael Robson almost 4 years ago

  • Target version deleted (Unscheduled)

#4 Updated by Eric Bohm almost 3 years ago

Is this still a problem?

#5 Updated by Sam White over 2 years ago

  • Category set to GPU Support
  • Target version set to 6.8.1

Any update?

#6 Updated by Eric Bohm almost 2 years ago

  • Target version changed from 6.8.1 to 6.9.0

#8 Updated by Sam White over 1 year ago

What does that documentation patch have to do with this issue?

#9 Updated by Sam White over 1 year ago

  • Target version deleted (6.9.0)

Not a release blocker

#10 Updated by Michael Robson over 1 year ago

Sam White wrote:

What does that documentation patch have to do with this issue?

From the Description:

tl;dr we need to change how the build script detects cuda's location OR tell people to ensure that CUDATOOLKIT_HOME is set correctly

So this is my first-cut fix, i.e. documenting the problem in the manual. I plan to do the other half of that "OR" as well. One suggestion that I'm currently working on is making the build fail more obviously when it doesn't find CUDA.
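
A rough sketch of what "fail more obviously" could look like, under the assumption that a usable toolkit root is one containing an executable bin/nvcc. The function name require_cuda_dir and the message wording are hypothetical, not taken from the build script.

```shell
#!/bin/sh
# Hypothetical check, not taken from the Charm++ build script.
# Returns 0 if the directory looks like a CUDA toolkit root; otherwise
# prints a diagnostic to stderr and returns 1.
require_cuda_dir() {
  dir="$1"
  if [ -n "$dir" ] && [ -x "$dir/bin/nvcc" ]; then
    return 0
  fi
  echo "error: no usable CUDA toolkit at '$dir' (expected an executable bin/nvcc there)." >&2
  echo "hint: set CUDATOOLKIT_HOME to the CUDA toolkit root and rerun the build." >&2
  return 1
}
```

In a build script this might be called right after CUDA_DIR is chosen, e.g. require_cuda_dir "$CUDA_DIR" || exit 1, so that a bad location stops the build immediately instead of surfacing later as a compile error.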
