[NLPL Task Force (A)] [uninett.no #196625] Issues with TensorFlow on Saga
Thomas Röblitz via RT
support at metacenter.no
Sun Oct 20 20:35:40 UTC 2019
Great! Just for the record, could you let me know what additional flags you needed?
Thomas
On Sun 20 Oct 22:34:16 2019, vinitr at ifi.uio.no wrote:
> Yeah, I did try installing horovod myself and while it needed
> additional flags, it looks like it’s working now. Thanks!
>
> – Vinit
>
> > On 20 Oct 2019, at 21:24, Thomas Röblitz via RT
> > <support at metacenter.no> wrote:
> >
> > No problem. Does this solve your problems 1) and 2)?
> >
> > Concerning Horovod, could you try to install it yourself? According
> > to https://github.com/horovod/horovod#id7 it may be doable. I haven't
> > tried it myself yet.
> >
> > Enjoy your evening
> >
> > Thomas
> >
> > On Sun 20 Oct 20:56:58 2019, vinitr at ifi.uio.no wrote:
> >> Right, turns out this was my fault: I was being extremely stupid and
> >> running this in a shell, which meant it was running on CPU :-) I gave
> >> it a shot and it seems to work, thanks!
> >>
> >> – Vinit
> >>
> >>> On 20 Oct 2019, at 20:45, Thomas Röblitz via RT
> >>> <support at metacenter.no> wrote:
> >>>
> >>> Hi Vinit,
> >>>
> >>> I'm lacking a bit of detail here to reproduce the behaviour you
> >>> experience. For example, when I do
> >>>
> >>> # module load TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6
> >>> # which python
> >>> /cluster/software/Python/3.6.6-fosscuda-2018b/bin/python
> >>> # salloc --account=nn9999k --time=02:00:00 --nodes=1 \
> >>>     --partition=accel --gres=gpu:1 --ntasks-per-node=1 --mem=32G
> >>>
> >>> I get an interactive job on one of the GPU nodes. Then, when I
> >>> start python via
> >>>
> >>> # srun --pty python
> >>> Python 3.6.6 (default, Aug 9 2019, 16:46:08)
> >>> [GCC 7.3.0] on linux
> >>> Type "help", "copyright", "credits" or "license" for more information.
> >>> >>> import tensorflow as tf
> >>> >>>
> >>>
> >>> it seems to work (at least no error messages).
> >>>
> >>> So you are probably doing something different. If you provide more
> >>> details, e.g. the sequence of commands up to the error messages, I
> >>> can look into the problem.
> >>>
> >>> Best regards
> >>>
> >>> Thomas
> >>>
> >>> On Sun 20 Oct 13:45:02 2019, vinitr at ifi.uio.no wrote:
> >>>> Hi all,
> >>>>
> >>>> I’ve been having some issues getting (other people’s) projects in
> >>>> TensorFlow to run on GPU. There are two scenarios here:
> >>>>
> >>>> 1. My own anaconda environment with TensorFlow installed manually
> >>>> (this works fine for PyTorch, and, indeed, is my normal workflow):
> >>>> ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found
> >>>>
> >>>> 2. Using the tensorflow module: ImportError: libcuda.so.1: cannot
> >>>> open shared object file: No such file or directory
> >>>>
> >>>> (2) is despite CUDA being loaded by the module (as far as I can
> >>>> tell, anyway).
> >>>>
> >>>> How do I solve this? Additionally, it would also be cool to get
> >>>> multi-GPU support with Horovod (https://github.com/horovod/horovod),
> >>>> something I don’t believe works at the moment.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> – Vinit
> >>>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
>
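The libcuda.so.1 import error from scenario (2) can be checked without
TensorFlow. The sketch below is not from the thread: it assumes that
libcuda.so.1 is provided by the NVIDIA driver and is therefore only
loadable on nodes that have a GPU, and the helper name can_load is made
up for illustration.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # Assumption: libcuda.so.1 comes with the NVIDIA driver, so it is
    # normally absent on nodes without a GPU.
    if can_load("libcuda.so.1"):
        print("libcuda.so.1 found: this looks like a GPU node")
    else:
        print("libcuda.so.1 not found: probably not running on a GPU node")
```

Running it through `srun --pty python` inside an allocation like the one
shown above should report the library as found, whereas running it
directly in a login shell would likely reproduce the missing-library
case.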