[NLPL Task Force (A)] [uninett.no #196625] Issues with TensorFlow on Saga
Vinit Ravishankar via RT
support at metacenter.no
Sun Oct 20 20:34:18 UTC 2019
Yeah, I did try installing horovod myself and while it needed additional flags, it looks like it’s working now. Thanks!
– Vinit
> On 20 Oct 2019, at 21:24, Thomas Röblitz via RT <support at metacenter.no> wrote:
>
> No problem. Does this solve your problems 1) and 2)?
>
> Concerning horovod, could you try to install this yourself? According to https://github.com/horovod/horovod#id7 it may be doable. I haven't tried it myself yet.
>
> Enjoy your evening
>
> Thomas
>
> On Sun 20 Oct 20:56:58 2019, vinitr at ifi.uio.no wrote:
>> Right, turns out this was my fault — I was being extremely stupid and
>> running this in a shell, which meant it was running on CPU :-) gave it
>> a shot and it seems to work, thanks!
>>
>> – Vinit
>>
>>> On 20 Oct 2019, at 20:45, Thomas Röblitz via RT
>>> <support at metacenter.no> wrote:
>>>
>>> Hei Vinit,
>>>
>>> I'm lacking a bit of detail here to reproduce the behaviour you're
>>> experiencing. For example, when I do
>>>
>>> # module load TensorFlow/1.13.1-fosscuda-2018b-Python-3.6.6
>>> # which python
>>> /cluster/software/Python/3.6.6-fosscuda-2018b/bin/python
>>> # salloc --account=nn9999k --time=02:00:00 --nodes=1 --partition=accel --gres=gpu:1 --ntasks-per-node=1 --mem=32G
>>>
>>> I get an interactive job on one of the GPU nodes. Then, when I start
>>> python via
>>>
>>> # srun --pty python
>>> Python 3.6.6 (default, Aug 9 2019, 16:46:08)
>>> [GCC 7.3.0] on linux
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import tensorflow as tf
>>> >>>
>>>
>>> it seems to work (at least no error messages).
>>>
>>> So you are probably doing something different. If you provide more
>>> details, e.g., the sequence of commands until you get to the error
>>> messages, I can have a look into the problem.
>>>
>>> Best regards
>>>
>>> Thomas
>>>
>>> On Sun 20 Oct 13:45:02 2019, vinitr at ifi.uio.no wrote:
>>>> Hi all,
>>>>
>>>> I’ve been having some issues getting (other people’s) projects in
>>>> TensorFlow to run on GPU. There are two scenarios here:
>>>>
>>>> 1. My own anaconda environment with TensorFlow installed manually
>>>> (this works fine for PyTorch, and, indeed, is my normal workflow):
>>>> ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found
>>>>
>>>> 2. Using the tensorflow module:
>>>> ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
>>>>
>>>> (2) is despite CUDA being loaded by the module (as far as I can tell,
>>>> anyway).
>>>>
>>>> How do I solve this? Additionally, it would also be cool to get
>>>> multi-GPU support with Horovod (https://github.com/horovod/horovod),
>>>> something I don’t believe works at the moment.
>>>>
>>>> Thanks!
>>>>
>>>> – Vinit
>>>>
>>>
>>>
>>>
>>
>
>
>
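
For anyone hitting the same two ImportErrors, the thread suggests the process was running in a plain shell rather than inside a GPU job. The following is a minimal diagnostic sketch, not something posted by either participant. It uses only the Python standard library and reports whether it is running inside a Slurm job, whether a GPU appears to have been allocated, whether libcuda.so.1 (which ships with the NVIDIA driver, not the CUDA toolkit) can be loaded, and which C library version Python detects. The function name `diagnose` is hypothetical, and relying on SLURM_JOB_ID and CUDA_VISIBLE_DEVICES assumes the usual Slurm behaviour for jobs requested with --gres=gpu:N.

```python
import ctypes
import os
import platform


def diagnose(env=None):
    """Collect a few facts relevant to the two ImportErrors in this thread.

    `diagnose` is a hypothetical helper name, not part of any library.
    """
    env = os.environ if env is None else env
    report = {}

    # Slurm exports SLURM_JOB_ID inside a job; a plain shell outside a job
    # normally does not have it (assumption: default Slurm configuration).
    report["in_slurm_job"] = "SLURM_JOB_ID" in env

    # Jobs requested with --gres=gpu:N usually get CUDA_VISIBLE_DEVICES set.
    report["cuda_visible_devices"] = env.get("CUDA_VISIBLE_DEVICES")

    # libcuda.so.1 comes with the NVIDIA driver, so it is typically present
    # only on nodes that actually have GPUs.
    try:
        ctypes.CDLL("libcuda.so.1")
        report["libcuda_loadable"] = True
    except OSError:
        report["libcuda_loadable"] = False

    # Name and version of the C library, for comparison with the version
    # named in a "GLIBC_x.y not found" error.
    report["libc"] = platform.libc_ver()
    return report


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

Running it once in the shell where the import fails and once via `srun --pty python` inside the allocation from the salloc command above should show whether the two environments differ.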