[NLPL Task Force (A)] [uninett.no #211080] NCCL on Saga (for use with CUDA 10.2)

oe@ifi.uio.no via RT metacenter-software at metacenter.no
Tue May 12 12:31:08 UTC 2020


> nccl_2.5.6-2+cuda10.2_x86_64.txz
> nccl_2.5.6-1+cuda10.1_x86_64.txz

i see.  i guess no matter what they say in the release notes, that
should be sufficient reason to install separate modules for each
download that is actually needed.  in fact, it appears these are
indeed distinct builds:

$ diff -iwr nccl_2.6.4-1+cuda10.*
Binary files nccl_2.6.4-1+cuda10.0_x86_64/lib/libnccl.so and nccl_2.6.4-1+cuda10.2_x86_64/lib/libnccl.so differ
Binary files nccl_2.6.4-1+cuda10.0_x86_64/lib/libnccl.so.2 and nccl_2.6.4-1+cuda10.2_x86_64/lib/libnccl.so.2 differ
Binary files nccl_2.6.4-1+cuda10.0_x86_64/lib/libnccl.so.2.6.4 and nccl_2.6.4-1+cuda10.2_x86_64/lib/libnccl.so.2.6.4 differ
Binary files nccl_2.6.4-1+cuda10.0_x86_64/lib/libnccl_static.a and nccl_2.6.4-1+cuda10.2_x86_64/lib/libnccl_static.a differ
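
the same check can be scripted per file, which may be handy when deciding whether two downloads can share one module. this is only a minimal sketch: compare_builds is a hypothetical helper (not part of NCCL or of the module system), and it assumes both archives have already been unpacked side by side. like the diff above, it only tells whether the bytes differ, not why.

```shell
# compare_builds is a hypothetical helper, not part of NCCL or any module tooling.
# usage: compare_builds DIR_A DIR_B RELATIVE_PATH
compare_builds() {
    dir_a=$1
    dir_b=$2
    rel=$3
    if [ ! -f "$dir_a/$rel" ] || [ ! -f "$dir_b/$rel" ]; then
        # one of the two trees does not contain the file at all
        echo "missing: $rel"
    elif cmp -s "$dir_a/$rel" "$dir_b/$rel"; then
        # cmp -s exits 0 only when the two files are byte-for-byte equal
        echo "identical: $rel"
    else
        echo "distinct: $rel"
    fi
}

# example call, assuming both archives were unpacked in the current directory:
# compare_builds nccl_2.6.4-1+cuda10.0_x86_64 nccl_2.6.4-1+cuda10.2_x86_64 lib/libnccl.so.2.6.4
```

a "distinct" result for the shared library is the case seen in the diff output above; "identical" would suggest that one module could serve both CUDA versions.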

for me, just now, i believe i would like to try NCCL 2.6.4 (the
current version) on CUDA 10.2 (default requirement for current
PyTorch) and 10.1 (for current TensorFlow), so hopefully i can make do
(for the time being :-) with just two modules!

best wishes, oe




