[NLPL Task Force (A)] [uninett.no #196965] Tensorflow issues, pt. 2
Vinit Ravishankar
vinitr at ifi.uio.no
Fri Oct 25 11:06:21 UTC 2019
Incidentally, I’m still having issues loading TensorFlow on its own when it’s v1.13.1.
Current setup: conda environment, Python 3.6, TensorFlow installed with pip (not conda), i.e.:
pip install tensorflow-gpu==1.13.1
Modules:
module load CUDA/10.0.130 cuDNN/7.4.2.24-CUDA-10.0.130
module --ignore-cache load NCCL/2.4.8-CUDA-10.0
Error:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
I’ll try again with TensorFlow installed with conda, but seeing as I had the same error with that version yesterday, I’m not expecting it to change. This is a fresh conda environment, so it ought not to have any CUDA paths that’d override the module ones, right?
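A rough way to narrow this down, offered as a sketch rather than a confirmed diagnosis: libcuda.so.1 ships with the NVIDIA driver rather than with the CUDA toolkit, so the CUDA, cuDNN and NCCL modules would not normally provide it. The snippet below lists which LD_LIBRARY_PATH directories contain a given library and checks whether the dynamic loader can open it. The helper names are made up for illustration, and it assumes Linux and that it is run in the same shell, on the same node, after the module load commands above.

```python
import ctypes
import os


def find_on_ld_library_path(libname, env=None):
    """Return the LD_LIBRARY_PATH directories that contain libname.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    return [d for d in dirs if os.path.exists(os.path.join(d, libname))]


def can_load(libname):
    """Return True if the dynamic loader can open libname in this process.

    The loader also consults the system cache and default directories,
    so this can be True even when LD_LIBRARY_PATH has no match.
    """
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # libcuda.so.1 comes from the driver; libcudart.so.10.0 from CUDA 10.0.
    for lib in ("libcuda.so.1", "libcudart.so.10.0"):
        print(lib, "loadable:", can_load(lib),
              "on LD_LIBRARY_PATH in:", find_on_ld_library_path(lib))
```

If libcudart.so.10.0 loads but libcuda.so.1 does not, that would suggest the modules are fine and the driver library is what is missing on that node (for instance, a node without a GPU driver installed), which a fresh conda environment would not change.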
– Vinit
> On 25 Oct 2019, at 13:01, oe at ifi.uio.no via RT <support at metacenter.no> wrote:
>
>> i am not sure that NCCL actually is dependent on a specific CUDA
>> version, but as henrik points out its module wrapper does load CUDA
>> versions that you probably do not want. i wonder whether it might
>> work to do 'surgical' replacement of modules:
>
> looking a little more at NCCL, it does sound as if it may be dependent
> on one specific CUDA version, at least there are different download
> packages for NCCL against CUDA 10.0 vs. 10.1. so maybe we need to ask
> for an additional module to be installed, something like
> NCCL/2.4.8-CUDA-10.0?
>
> henrik or thomas, if you agree that bifurcating NCCL according to CUDA
> versions will be required, could you see to the creation of such a
> module?
>
> cheers, oe
>
>