[NLPL Task Force (A)] [uninett.no #196625] Issues with TensorFlow on Saga
Thomas Röblitz via RT
support at metacenter.no
Mon Oct 21 07:27:50 UTC 2019
Thanks!
Closing case
Thomas
On Sun 20 Oct 22:36:30 2019, vinitr at ifi.uio.no wrote:
> Did it with HOROVOD_WITH_TENSORFLOW=1
>
> – Vinit
>
> > On 20 Oct 2019, at 22:35, Thomas Röblitz via RT
> > <support at metacenter.no> wrote:
> >
> > Great! Just for the record, could you let me know what additional
> > flags you needed?
> >
> > Thomas
> >
> > On Sun 20 Oct 22:34:16 2019, vinitr at ifi.uio.no wrote:
> >> Yeah, I did try installing horovod myself and while it needed
> >> additional flags, it looks like it’s working now. Thanks!
> >>
> >> – Vinit
> >>
> >>> On 20 Oct 2019, at 21:24, Thomas Röblitz via RT
> >>> <support at metacenter.no> wrote:
> >>>
> >>> No problem. Does this solve your problems 1) and 2)?
> >>>
> >>> Concerning Horovod, could you try to install it yourself? According
> >>> to https://github.com/horovod/horovod#id7 it may be doable. I haven't
> >>> tried it myself yet.
> >>>
> >>> Enjoy your evening
> >>>
> >>> Thomas
> >>>
> >>> On Sun 20 Oct 20:56:58 2019, vinitr at ifi.uio.no wrote:
> >>>> Right, turns out this was my fault: I was being extremely stupid
> >>>> and running this in a shell, which meant it was running on CPU :-)
> >>>> I gave it a shot and it seems to work, thanks!
> >>>>
> >>>> – Vinit
> >>>>
> >>>>> On 20 Oct 2019, at 20:45, Thomas Röblitz via RT
> >>>>> <support at metacenter.no> wrote:
> >>>>>
> >>>>> Hi Vinit,
> >>>>>
> >>>>> I'm lacking a bit of detail here to reproduce the behaviour you
> >>>>> experience. For example, when I do
> >>>>>
> >>>>> # module load TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6
> >>>>> # which python
> >>>>> /cluster/software/Python/3.6.6-fosscuda-2018b/bin/python
> >>>>> # salloc --account=nn9999k --time=02:00:00 --nodes=1 \
> >>>>>     --partition=accel --gres=gpu:1 --ntasks-per-node=1 --mem=32G
> >>>>>
> >>>>> I get an interactive job on one of the GPU nodes. Then, when I
> >>>>> start python via
> >>>>>
> >>>>> # srun --pty python
> >>>>> Python 3.6.6 (default, Aug 9 2019, 16:46:08)
> >>>>> [GCC 7.3.0] on linux
> >>>>> Type "help", "copyright", "credits" or "license" for more information.
> >>>>> >>> import tensorflow as tf
> >>>>> >>>
> >>>>>
> >>>>> it seems to work (at least no error messages).
> >>>>>
> >>>>> So you are likely doing something different. If you provide more
> >>>>> details, e.g., the sequence of commands up to the error messages,
> >>>>> I can look into the problem.
> >>>>>
> >>>>> Best regards
> >>>>>
> >>>>> Thomas
> >>>>>
> >>>>> On Sun 20 Oct 13:45:02 2019, vinitr at ifi.uio.no wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I’ve been having some issues getting (other people’s) projects in
> >>>>>> TensorFlow to run on GPU. There are two scenarios here:
> >>>>>>
> >>>>>> 1. My own anaconda environment with TensorFlow installed manually
> >>>>>> (this works fine for PyTorch, and, indeed, is my normal workflow):
> >>>>>> ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found
> >>>>>>
> >>>>>> 2. Using the tensorflow module:
> >>>>>> ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
> >>>>>>
> >>>>>> (2) is despite CUDA being loaded by the module (as far as I can
> >>>>>> tell, anyway).
> >>>>>>
> >>>>>> How do I solve this? Additionally, it would also be cool to get
> >>>>>> multi-GPU support with Horovod (https://github.com/horovod/horovod),
> >>>>>> something I don’t believe works at the moment.
> >>>>>>
> >>>>>> Thanks!
> >>>>>>
> >>>>>> – Vinit
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
>
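
The libcuda.so.1 error in scenario 2 went away once python was started on a GPU node through srun rather than in a plain shell. A minimal sketch of a check for that situation follows, assuming that libcuda.so.1 is only installed on the GPU nodes and that Slurm sets SLURM_JOB_ID inside an allocation. The function names are hypothetical and not part of any Saga tooling; only the Python standard library is used.

```python
import ctypes
import os


def cuda_driver_loadable(libname="libcuda.so.1"):
    """Return True if the dynamic loader can open the CUDA driver library."""
    try:
        ctypes.CDLL(libname)
    except OSError:
        # Same failure mode as the ImportError quoted in the thread.
        return False
    return True


def inside_slurm_job(environ=None):
    """Return True if SLURM_JOB_ID is present in the given environment."""
    environ = os.environ if environ is None else environ
    return "SLURM_JOB_ID" in environ


def diagnose(driver_ok, in_job):
    """Turn the two checks into a short hint (wording is illustrative)."""
    if driver_ok:
        return "libcuda.so.1 can be loaded on this host."
    if in_job:
        return ("Inside a Slurm allocation, but libcuda.so.1 is not visible; "
                "try starting python via 'srun --pty python'.")
    return ("Not inside a Slurm job; request a GPU node with salloc and "
            "start python via srun.")


if __name__ == "__main__":
    print(diagnose(cuda_driver_loadable(), inside_slurm_job()))
```

Running the script once in the login shell and once via srun inside the allocation should show whether the driver library is visible in each place.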