[NLPL Task Force (A)] [uninett.no #211080] NCCL on Saga (for use with CUDA 10.2)

Vegard Eide via RT metacenter-software at metacenter.no
Wed May 13 06:32:17 UTC 2020


On Tue, 12 May 2020 14:31:07, oe at ifi.uio.no wrote:

    > nccl_2.5.6-2+cuda10.2_x86_64.txz
    > nccl_2.5.6-1+cuda10.1_x86_64.txz

    i see.  i guess no matter what they say in the release notes, that
    should be sufficient reason to install separate modules for each
    download that is actually needed.  in fact, it appears these are
    indeed distinct builds:

    $ diff -iwr nccl_2.6.4-1+cuda10.*
    Binary files nccl_2.6.4-1+cuda10.0_x86_64/lib/libnccl.so and
    nccl_2.6.4-1+cuda10.2_x86_64/lib/libnccl.so differ
    Binary files nccl_2.6.4-1+cuda10.0_x86_64/lib/libnccl.so.2 and
    nccl_2.6.4-1+cuda10.2_x86_64/lib/libnccl.so.2 differ
    Binary files nccl_2.6.4-1+cuda10.0_x86_64/lib/libnccl.so.2.6.4 and
    nccl_2.6.4-1+cuda10.2_x86_64/lib/libnccl.so.2.6.4 differ
    Binary files nccl_2.6.4-1+cuda10.0_x86_64/lib/libnccl_static.a and
    nccl_2.6.4-1+cuda10.2_x86_64/lib/libnccl_static.a differ

    for me, just now, i believe i would like to try NCCL 2.6.4 (the
    current version) on CUDA 10.2 (default requirement for current
    PyTorch) and 10.1 (for current TensorFlow), so hopefully i can make do
    (for the time being :-) with just two modules!



Hi,

We have installed the following modules:

NCCL/2.6.4-CUDA-10.1
NCCL/2.6.4-CUDA-10.2


Note that loading these modules will not automatically load a CUDA module,
since they can be used with any CUDA 10.1.x or 10.2.x module, respectively.
You therefore need to load a matching CUDA module yourself.
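Because the NCCL modules leave the choice of CUDA module to the user, it can help to check the pairing before a job starts. Below is a minimal Bash sketch, not something provided on Saga. The function name nccl_cuda_compatible is hypothetical, and the sketch assumes CUDA module names of the form CUDA/<major>.<minor>.<patch>[-suffix] (e.g. CUDA/10.2.89), which may not match the actual module names. It only compares the CUDA major.minor version in the NCCL module name with that of the CUDA module.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Saga's module system): succeeds only if an
# NCCL module built for CUDA X.Y is paired with a CUDA X.Y.z module.
# Assumes names like NCCL/2.6.4-CUDA-10.2 and CUDA/10.2.89[-suffix].
nccl_cuda_compatible() {
  local nccl_mod="$1" cuda_mod="$2"

  # "NCCL/2.6.4-CUDA-10.2" -> "10.2"
  local nccl_cuda="${nccl_mod##*-CUDA-}"

  # "CUDA/10.2.89" or "CUDA/10.1.243-GCC-8.3.0" -> major and minor parts
  local cuda_ver="${cuda_mod#CUDA/}"
  local major="${cuda_ver%%.*}"
  local rest="${cuda_ver#*.}"
  local minor="${rest%%.*}"

  # Exit status 0 only if the major.minor versions agree
  [[ "$nccl_cuda" == "$major.$minor" ]]
}

# Example usage (the CUDA module name here is an assumption):
if nccl_cuda_compatible NCCL/2.6.4-CUDA-10.2 CUDA/10.2.89; then
  echo "NCCL/2.6.4-CUDA-10.2 matches CUDA/10.2.89"
  # module load NCCL/2.6.4-CUDA-10.2 CUDA/10.2.89
fi
```

The check deliberately ignores the CUDA patch level, in line with the note above that each NCCL module can be used with any CUDA 10.1.x or 10.2.x module.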

Regards
Vegard
