[NLPL Task Force (A)] [uninett.no #196965] Tensorflow issues, pt. 2

Stephan Oepen oe at ifi.uio.no
Fri Oct 25 11:01:02 UTC 2019


> i am not sure that NCCL actually is dependent on a specific CUDA
> version, but as henrik points out its module wrapper does load CUDA
> versions that you probably do not want.  i wonder whether it might
> work to do 'surgical' replacement of modules:

looking a little more at NCCL, it does sound as if it may be dependent
on one specific CUDA version; at least there are separate download
packages for NCCL built against CUDA 10.0 vs. 10.1.  so maybe we need to
ask for an additional module to be installed, something like
NCCL/2.4.8-CUDA-10.0?

henrik or thomas, if you agree that bifurcating NCCL according to CUDA
versions will be required, could you see to the creation of such a
module?

cheers, oe


