[NLPL Task Force (A)] [uninett.no #196625] Issues with TensorFlow on Saga
Thomas Röblitz via RT
support at metacenter.no
Sun Oct 20 19:24:38 UTC 2019
No problem. Does this solve your problems 1) and 2)?
Concerning Horovod, could you try to install it yourself? According to https://github.com/horovod/horovod#id7 it may be doable. I haven't tried it myself yet.
Enjoy your evening
Thomas
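
For anyone hitting the libcuda.so.1 error quoted below: that library is installed with the NVIDIA driver rather than with the CUDA toolkit, so it is typically only present on nodes that actually have GPUs. A minimal sketch of a pre-flight check, assuming Linux and Python's standard ctypes module; the helper name can_load_shared_library is hypothetical and does not come from this thread:

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can find and open `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the library named in the ImportError quoted in the thread.
    if can_load_shared_library("libcuda.so.1"):
        print("libcuda.so.1 found: the NVIDIA driver is visible on this node")
    else:
        print("libcuda.so.1 not found: probably not running on a GPU node")
```

Running this inside the job (for example via srun, as in the quoted session) checks the compute node rather than the login shell.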
On Sun 20 Oct 20:56:58 2019, vinitr at ifi.uio.no wrote:
> Right, turns out this was my fault: I was being extremely stupid and
> running this in a shell, which meant it was running on CPU :-) I gave
> it a shot and it seems to work, thanks!
>
> – Vinit
>
> > On 20 Oct 2019, at 20:45, Thomas Röblitz via RT
> > <support at metacenter.no> wrote:
> >
> > Hi Vinit,
> >
> > I'm lacking some detail here to reproduce the behaviour you
> > experience. For example, when I do
> >
> > # module load TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6
> > # which python
> > /cluster/software/Python/3.6.6-fosscuda-2018b/bin/python
> > # salloc --account=nn9999k --time=02:00:00 --nodes=1 --partition=accel --gres=gpu:1 --ntasks-per-node=1 --mem=32G
> >
> > I get an interactive job on one of the GPU nodes. Then, when I start
> > python via
> >
> > # srun --pty python
> > Python 3.6.6 (default, Aug 9 2019, 16:46:08)
> > [GCC 7.3.0] on linux
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import tensorflow as tf
> > >>>
> >
> > it seems to work (at least no error messages).
> >
> > So you are likely doing something different. If you provide more
> > details, e.g. the sequence of commands leading up to the error
> > messages, I can look into the problem.
> >
> > Best regards
> >
> > Thomas
> >
> > On Sun 20 Oct 13:45:02 2019, vinitr at ifi.uio.no wrote:
> >> Hi all,
> >>
> >> I’ve been having some issues getting (other people’s) projects in
> >> TensorFlow to run on GPU. There are two scenarios here:
> >>
> >> 1. My own anaconda environment with TensorFlow installed manually
> >> (this works fine for PyTorch, and, indeed, is my normal workflow):
> >> ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found
> >>
> >> 2. Using the tensorflow module:
> >> ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
> >>
> >> (2) is despite CUDA being loaded by the module (as far as I can tell,
> >> anyway).
> >>
> >> How do I solve this? Additionally, it would also be cool to get
> >> multi-GPU support with Horovod (https://github.com/horovod/horovod),
> >> something I don’t believe works at the moment.
> >>
> >> Thanks!
> >>
> >> – Vinit