[NLPL Task Force (A)] NLPL modules python issue
Andrei Kutuzov
andreku at ifi.uio.no
Mon Jan 27 10:04:40 UTC 2020
Hi Stephan,
The correct version of CuDNN is now installed on Saga, and I can load
it, but there is another problem. It seems that all three of these modules
depend on GCCcore/8.2.0:
nlpl-python-candy/201912/3.7
nlpl-scipy/201910/3.7
nlpl-tensorflow/1.15.0/3.7
As far as I can tell, this makes them incompatible with the new CUDA
modules, which use GCCcore/8.3.0.
Is it possible to re-compile the NLPL modules with GCC 8.3?
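
To illustrate the mismatch described above, here is a minimal sketch that flags modules whose GCCcore dependency differs from the one the CUDA modules use. The function name toolchain_conflicts and the hand-written dependency lists are assumptions for illustration; on Saga the real lists would have to come from the module system itself.

```python
import re

# Matches a dependency string such as "GCCcore/8.2.0".
GCCCORE = re.compile(r"GCCcore/(\d+(?:\.\d+)*)")


def toolchain_conflicts(required_by_module, target):
    """Return {module: GCCcore version} for modules not built on `target`."""
    conflicts = {}
    for module, deps in required_by_module.items():
        for dep in deps:
            match = GCCCORE.fullmatch(dep)
            if match and match.group(1) != target:
                conflicts[module] = match.group(1)
    return conflicts


# Hypothetical, hand-written dependency lists based on the report above.
deps = {
    "nlpl-python-candy/201912/3.7": ["GCCcore/8.2.0"],
    "nlpl-scipy/201910/3.7": ["GCCcore/8.2.0"],
    "nlpl-tensorflow/1.15.0/3.7": ["GCCcore/8.2.0"],
}

# The new CUDA modules use GCCcore/8.3.0, so all three modules are reported.
print(toolchain_conflicts(deps, "8.3.0"))
```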
23.01.2020 18:10, Andrei Kutuzov wrote:
> Hi Stephan,
>
> In the end, it was too early to celebrate. Indeed, I can run python3
> from the nlpl-tensorflow/1.15.0/3.7 module now.
>
> But it seems that TF in this module was compiled against a different CuDNN
> version than the one currently used on the Saga GPU nodes.
>
> The result is that it is impossible to run GPU jobs with this module. TF
> first produces the following warnings:
>
> 2020-01-23 17:55:36.543187: E
> tensorflow/stream_executor/cuda/cuda_dnn.cc:319] Loaded runtime CuDNN
> library: 7.4.2 but source was compiled with: 7.6.0. CuDNN library major
> and minor version needs to match or have higher minor version in case of
> CuDNN 7.0 or later version. If using a binary install, upgrade your
> CuDNN library. If building from sources, make sure the library loaded
> at runtime is compatible with the version specified during compile
> configuration.
>
> ...and then it fails like this:
>
> tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
> (0) Unknown: Failed to get convolution algorithm. This is probably
> because cuDNN failed to initialize, so try looking to see if a warning
> log message was printed above.
>
>
> I guess TF should be compiled against CuDNN 7.4.2 in order to work
> properly on Saga.
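
The compatibility rule stated in the log message above can be sketched as follows. This is a paraphrase of the wording in the log, not TensorFlow's actual implementation; the names parse_version and cudnn_compatible are hypothetical.

```python
def parse_version(text):
    """Turn a version string such as '7.4.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cudnn_compatible(loaded, compiled):
    """Apply the rule from the log message to (major, minor, ...) tuples.

    The major versions must match. From CuDNN 7.0 on, the loaded minor
    version may be equal to or higher than the compiled one; before 7.0
    the minor versions must match exactly.
    """
    loaded_major, loaded_minor = loaded[:2]
    compiled_major, compiled_minor = compiled[:2]
    if loaded_major != compiled_major:
        return False
    if loaded_major >= 7:
        return loaded_minor >= compiled_minor
    return loaded_minor == compiled_minor


# The case from the log: runtime library 7.4.2, TF compiled with 7.6.0.
print(cudnn_compatible(parse_version("7.4.2"), parse_version("7.6.0")))  # False
```

Under this rule, either a runtime CuDNN of 7.6 or later or a TF build against 7.4 would pass the check.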
>
>
> On 1/13/20 10:10 AM, Sara Stymne wrote:
>> Hi Stephan,
>>
>> Yes, it seems to work fine for me as well. I can load and run nlpl-uuparser/2.3.1, which also loads many of the other modules, and python seems to work fine.
>>
>> Thanks for resolving this so quickly!
>>
>> Best,
>> Sara
>>
>>
>> On 12 Jan 2020, at 20:34, Andrei Kutuzov <andreku at ifi.uio.no> wrote:
>>
>>> Hi Stephan,
>>>
>>> Yes, I can confirm that at least for me this works. I can now run
>>> python3 from the nlpl-tensorflow/1.15.0/3.7 module.
>>>
>>> Thanks for resolving this!
>>>
>>> 12.01.2020 4:26, Stephan Oepen wrote:
>>>> hi again, sara, andrey, all,
>>>>
>>>> i believe i managed to track down this problem and was relieved to see
>>>> it is a recently introduced issue: the NLPL binaries for these Python
>>>> add-on modules had inadvertently had their set-group-id bit ('g+s')
>>>> set, which i am pretty sure was the result of a major recursive
>>>> adjustment of file permissions right before the holidays. this bit
>>>> (probably) should be set on directories, where it will cause the group
>>>> owner to be inherited onto new sub-directories or files; but on
>>>> executable files (run by anyone but me or root), it actually caused a
>>>> loss of privileges that prevented the search for the base shared
>>>> libraries. note to self: this was tedious to debug, because the
>>>> problem goes away when running in the scope of strace(1); it turns
>>>> out, strace(1) prevents setuid(2) and setgid(2) execution ...
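
A minimal sketch of how regular files carrying the set-group-id bit could be located and repaired after such a recursive permission change. The function names setgid_files and clear_setgid are hypothetical, and this is not necessarily how the NLPL installation was actually fixed. Directories are deliberately skipped, since the bit is wanted there.

```python
import os
import stat


def setgid_files(root):
    """Yield regular files under `root` that have the set-group-id bit set."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            # Symlinks and other non-regular files are ignored.
            if stat.S_ISREG(mode) and mode & stat.S_ISGID:
                yield path


def clear_setgid(path):
    """Remove the set-group-id bit from `path`, keeping all other bits."""
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    os.chmod(path, mode & ~stat.S_ISGID)


# Example use, with a hypothetical installation root:
# for path in setgid_files("/path/to/nlpl/software"):
#     clear_setgid(path)
```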
>>>>
>>>> sara and andrey, please try again. i hope the NLPL add-on modules are
>>>> back to normal now?
>>>>
>>>> all best, oe
>>>>
>>>> On Fri, Jan 10, 2020 at 11:52 AM Sara Stymne <sara.stymne at lingfil.uu.se> wrote:
>>>>>
>>>>> No, neither of us had tried it before. I think it might have worked for Ali, but I'm not sure.
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Sara
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Stephan Oepen <oe at ifi.uio.no>
>>>>> Sent: 10 January 2020 11:50:14
>>>>> To: Sara Stymne
>>>>> Cc: Martin Matthiesen; Ali Basirat; infrastructure
>>>>> Subject: Re: [NLPL Task Force (A)] NLPL modules python issue
>>>>>
>>>>>> We tried it on both my and Artur's accounts here, and had the same issue.
>>>>>
>>>>>>> python: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
>>>>>
>>>>> had either of you tried before (in other words, is this a recent
>>>>> problem)? i installed most of these modules last november, but cannot
>>>>> know how many people have tried using them (i believe i know for sure
>>>>> several of them work for vinit and yves) ...
>>>>>
>>>>> oe
>>>>>
>>>
>>>
>>> --
>>> Andrei
>>> PhD Candidate at Language Technology Group (LTG)
>>> University of Oslo
>>
>
>
--
Andrei
PhD Candidate at Language Technology Group (LTG)
University of Oslo