[NLPL Task Force (A)] NLPL modules python issue

Andrei Kutuzov andreku at ifi.uio.no
Thu Jan 23 17:10:17 UTC 2020


Hi Stephan,

In the end, it was too early to celebrate. Indeed, I can run python3
from the nlpl-tensorflow/1.15.0/3.7 module now.

But it seems that TF in this module was compiled against a different
CuDNN version than the one currently used on the Saga GPU nodes.

The result is that it is impossible to run GPU jobs with this module. TF
first produces the following warnings:

2020-01-23 17:55:36.543187: E
tensorflow/stream_executor/cuda/cuda_dnn.cc:319] Loaded runtime CuDNN
library: 7.4.2 but source was compiled with: 7.6.0.  CuDNN library major
and minor version needs to match or have higher minor version in case of
CuDNN 7.0 or later version. If using a binary install, upgrade your
CuDNN library.  If building from sources, make sure the library loaded
at runtime is compatible with the version specified during compile
configuration.

...and then it fails like this:

tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably
because cuDNN failed to initialize, so try looking to see if a warning
log message was printed above.


I guess TF should be compiled against CuDNN 7.4.2 in order to work
properly on Saga.
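
To make the rule from the warning concrete, here is a minimal sketch of
the compatibility check as the log message words it. The function names
are my own and purely illustrative; this is not TF code, only my reading
of the message (same major version, and from CuDNN 7 on the runtime
minor version must be equal or higher).

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cudnn_runtime_ok(loaded, compiled):
    """Return True if the loaded CuDNN can serve a build made with `compiled`.

    Rule as worded in the warning: the major versions must match; for
    CuDNN 7.0 or later the loaded minor version may be equal or higher,
    for older versions the minor version must match exactly.
    """
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    if loaded_major != compiled_major:
        return False
    if loaded_major >= 7:
        return loaded_minor >= compiled_minor
    return loaded_minor == compiled_minor


# The Saga case from the log: runtime 7.4.2, build 7.6.0.
print(cudnn_runtime_ok("7.4.2", "7.6.0"))  # False
# A runtime that matches the build would pass.
print(cudnn_runtime_ok("7.6.0", "7.6.0"))  # True
```

By the same rule, a TF build made against 7.4.x would be accepted by
the 7.4.2 runtime.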


On 1/13/20 10:10 AM, Sara Stymne wrote:
> Hi Stephan,
> 
> Yes, it seems to work fine for me as well. I can load and run nlpl-uuparser/2.3.1, which also loads many of the other modules, and python seems to work fine.
> 
> Thanks for resolving this so quickly!
> 
> Best,
> Sara
> 
> 
> On 12 Jan 2020, at 20:34, Andrei Kutuzov <andreku at ifi.uio.no> wrote:
> 
>> Hi Stephan,
>>
>> Yes, I can confirm that at least for me this works. I can now run
>> python3 from the nlpl-tensorflow/1.15.0/3.7 module.
>>
>> Thanks for resolving this!
>>
>> 12.01.2020 4:26, Stephan Oepen wrote:
>>> hi again, sara, andrey, all,
>>>
>>> i believe i managed to track down this problem and was relieved to see
>>> it is a recently introduced issue: the NLPL binaries for these Python
>>> add-on modules had inadvertently had their set-group-id bit ('g+s')
>>> set, which i am pretty sure was the result of a major recursive
>>> adjustment of file permissions right before the holidays.  this bit
>>> (probably) should be set on directories, where it will cause the group
>>> owner to be inherited onto new sub-directories or files; but on
>>> executable files (run by anyone but me or root), it actually caused a
>>> loss of privileges that prevented the search for the base shared
>>> libraries.  note to self: this was tedious to debug, because the
>>> problem goes away when running in the scope of strace(1); it turns
>>> out, strace(1) prevents setuid(2) and setgid(2) execution ...
>>>
>>> sara and andrey, please try again.  i hope the NLPL add-on modules are
>>> back to normal now?
>>>
>>> all best, oe
>>>
>>> On Fri, Jan 10, 2020 at 11:52 AM Sara Stymne <sara.stymne at lingfil.uu.se> wrote:
>>>>
>>>> No, neither of us had tried it before. I think it might have worked for Ali, but I'm not sure.
>>>>
>>>>
>>>> Best,
>>>>
>>>> Sara
>>>>
>>>>
>>>> ________________________________
>>>> From: Stephan Oepen <oe at ifi.uio.no>
>>>> Sent: 10 January 2020 11:50:14
>>>> To: Sara Stymne
>>>> Cc: Martin Matthiesen; Ali Basirat; infrastructure
>>>> Subject: Re: [NLPL Task Force (A)] NLPL modules python issue
>>>>
>>>>> We tried it on both my and Artur's accounts here, and had the same issue.
>>>>
>>>>>> python: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
>>>>
>>>> had either of you tried before (in other words, is this a recent
>>>> problem)?  i installed most of these modules last november, but cannot
>>>> know how many people have tried using them (i believe i know for sure
>>>> several of them work for vinit and yves) ...
>>>>
>>>> oe
>>
>>
>> -- 
>> Andrei
>> PhD Candidate at Language Technology Group (LTG)
>> University of Oslo
> 
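
For reference, the set-group-id problem Stephan describes in the quoted
message could be looked for with a sketch like the one below. It walks a
directory tree and reports regular files that are executable and also
have 'g+s' set; directories are deliberately skipped, since the bit is
wanted there. The function name and the command-line usage are my own
assumptions, not an existing NLPL tool.

```python
import os
import stat
import sys


def setgid_executables(root):
    """Yield regular files under `root` that are executable and have g+s set."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if not stat.S_ISREG(mode):
                continue
            is_executable = mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            if is_executable and mode & stat.S_ISGID:
                yield path


if __name__ == "__main__":
    # Each argument is a directory tree to scan (hypothetical usage).
    for tree in sys.argv[1:]:
        for path in setgid_executables(tree):
            print(path)
```

Anything it prints would be a candidate for 'chmod g-s'.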


-- 
Andrei
PhD Candidate at Language Technology Group (LTG)
University of Oslo


