[NLPL Task Force (A)] [uninett.no #196625] Issues with TensorFlow on Saga

Thomas Röblitz via RT support at metacenter.no
Sun Oct 20 18:45:46 UTC 2019


Hi Vinit,

I don't have enough detail here to reproduce the behaviour you are seeing. For example, when I do

# module load TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6 
# which python
/cluster/software/Python/3.6.6-fosscuda-2018b/bin/python
# salloc --account=nn9999k --time=02:00:00 --nodes=1 --partition=accel --gres=gpu:1 --ntasks-per-node=1 --mem=32G

I get an interactive job on one of the GPU nodes. Then, when I start python via

# srun --pty python
Python 3.6.6 (default, Aug  9 2019, 16:46:08) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> 

it seems to work (at least, there are no error messages).

So you are probably doing something differently. If you provide more details, e.g., the exact sequence of commands you run up to the point where the error messages appear, I can look into the problem.
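
To help gather those details, something like the following could be run inside the job on the GPU node. It is only a sketch: it reports the system glibc version (relevant to the GLIBC_2.23 error) and whether the dynamic loader can find libcuda.so.1 (relevant to the second error). The function names and the choice of environment variables to print are my own assumptions about what is useful here, not an official diagnostic tool.

```python
import ctypes
import os
import platform


def glibc_version():
    """Return the glibc version string of the running system, or None."""
    try:
        # Only defined on glibc-based systems; returns e.g. "glibc 2.17".
        return os.confstr("CS_GNU_LIBC_VERSION")
    except (ValueError, OSError, AttributeError):
        return None


def can_load(library_name):
    """Return (True, "") if the dynamic loader finds the library, else (False, reason)."""
    try:
        ctypes.CDLL(library_name)
        return True, ""
    except OSError as exc:
        return False, str(exc)


def collect_report():
    """Collect environment details that are useful in a support request."""
    ok, reason = can_load("libcuda.so.1")
    return {
        "hostname": platform.node(),
        "python": platform.python_version(),
        "glibc": glibc_version(),
        "libcuda_loadable": ok,
        "libcuda_error": reason,
        # Assumption: these two variables are the most relevant ones to show.
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", ""),
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

Saved under a name of your choice (say, check_env.py, a hypothetical file name) and started with srun in the same way as the python prompt above, its output together with the list of loaded modules would make the problem much easier to narrow down.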

Best regards

Thomas

On Sun 20 Oct 13:45:02 2019, vinitr at ifi.uio.no wrote:
> Hi all,
> 
> I’ve been having some issues getting (other people’s) projects in
> TensorFlow to run on GPU. There are two scenarios here:
> 
> 1. My own anaconda environment with TensorFlow installed manually
> (this works fine for PyTorch, and, indeed, is my normal workflow):
> ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found
> 
> 2. Using the tensorflow module: ImportError: libcuda.so.1: cannot open
> shared object file: No such file or directory
> 
> (2) is despite CUDA being loaded by the module (as far as I can tell,
> anyway).
> 
> How do I solve this? Additionally, it would also be cool to get multi-
> GPU support with Horovod (https://github.com/horovod/horovod),
> something I don’t believe works at the moment.
> 
> Thanks!
> 
> – Vinit
> 





