[NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)

Martin Matthiesen martin.matthiesen at csc.fi
Wed Sep 26 19:09:55 UTC 2018


Hi Stephan, 

Yes, taito-gpu is the login node and has no GPU hardware. Taito and Taito-gpu also share most software. PyTorch should actually work on a Taito login node the same way as on Taito-gpu, i.e. in CPU mode. It crashes, though, so there must be a subtle difference that I have not figured out yet. 

Martin 

-- 
Martin Matthiesen 
CSC - Tieteen tietotekniikan keskus 
CSC - IT Center for Science 
PL 405, 02101 Espoo, Finland 
+358 9 457 2376, martin.matthiesen at csc.fi 
Public key : https://pgp.mit.edu/pks/lookup?op=get&search=0x74B12876FD890704 
Fingerprint: AA25 6F56 5C9A 8B42 009F BA70 74B1 2876 FD89 0704 

> From: "Stephan Oepen" <oe at ifi.uio.no>
> To: "Yves Scherrer" <yves.scherrer at helsinki.fi>
> Cc: "Martin Matthiesen" <martin.matthiesen at csc.fi>, "infrastructure"
> <infrastructure at nlpl.eu>
> Sent: Wednesday, 26 September, 2018 21:57:03
> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)

> yes, i see.

> i actually had a go at my own glibc and PyTorch installations on Taito, but so
> far gpu support is elusive. is my impression correct that ‘taito-gpu’ does not
> have gpu hardware itself, i.e. one still needs to srun an interactive shell
> onto an actual gpu node?

> oe

> On Wed, 26 Sep 2018 at 20:41 Scherrer, Yves <yves.scherrer at helsinki.fi> wrote:

>> Further validating your installation: I am currently training a model, and now
>> that I have found out that I need to set $CUDA_VISIBLE_DEVICES, it also seems
>> to be training on GPU :)
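Yves does not say exactly how he passes the variable to OpenNMT-py. As background: CUDA reads `CUDA_VISIBLE_DEVICES` as a comma-separated list of device IDs; if it is unset every GPU is visible, if it is set but empty no GPU is visible, and the listed devices are renumbered from 0 inside the process. The helper below is a hypothetical illustration (`visible_gpus` is not part of OpenNMT-py or PyTorch) that only interprets the variable, e.g. to log which GPUs a batch job was given.

```python
import os
from typing import List, Mapping, Optional


def visible_gpus(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Interpret CUDA_VISIBLE_DEVICES (hypothetical helper, for logging only).

    Returns None if the variable is unset (CUDA then exposes every device),
    an empty list if it is set but empty (no device is exposed), and
    otherwise the listed device IDs in order.
    """
    if env is None:
        env = os.environ
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    ids = visible_gpus()
    if ids is None:
        print("CUDA_VISIBLE_DEVICES unset: all GPUs visible")
    else:
        # Inside this process the listed devices appear as cuda:0, cuda:1, ...
        for local, physical in enumerate(ids):
            print("cuda:{} -> device {}".format(local, physical))
```

Passing an explicit mapping, e.g. `visible_gpus({"CUDA_VISIBLE_DEVICES": "0,1"})`, makes the function easy to test without touching the real environment.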

>> I’ll see if I can easily modify my test to use data from the NLPL repository
>> (the data is certainly not the problem, but there might be some preprocessing
>> steps for which scripts are not (yet) available).

>> Regarding virtualenv on CSC, it’s hit or miss:
>> - python-env/intelpython3.6-2018.3, which Martin mentioned lately and which
>> contains PyTorch, doesn’t have virtualenv
>> - python-env/3.5.3 has virtualenv, as you correctly observed
>> - python-env/3.4.0, which is the default version on taito-shell, doesn’t have
>> virtualenv

>> I’ll have to test if it’s easier to build on the intelpython or the “normal” gnu
>> one…
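A survey like the one above can be repeated mechanically after each `module load python-env/...`. The sketch below is an illustration with made-up helper names: it reports the first `python`, `python3` and `virtualenv` on `PATH` (like `type`, not `type -a`), and whether the interpreter ships the standard-library `venv` module (present since Python 3.3). An importable `venv` module is not a guarantee that `python3 -m venv` will succeed on these particular CSC builds; that remains untested here.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Optional


def find_commands(names=("python", "python3", "virtualenv")) -> Dict[str, Optional[str]]:
    """Return the first match on PATH for each command, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def has_stdlib_venv() -> bool:
    """True if this interpreter can locate the standard-library `venv` module."""
    return importlib.util.find_spec("venv") is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, path in find_commands().items():
        print("{:12} {}".format(name, path or "not found"))
    print("stdlib venv importable:", has_stdlib_venv())
```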

>> Yves

>> On 26 Sep 2018, at 15:57, Stephan Oepen <oe at ifi.uio.no> wrote:

>> > many thanks for validating (to some degree at least :-) my OpenNMT-py
>> > installation on Abel. i have now added it to the software catalogue
>> > and created minimal documentation on the NLPL wiki:

>> > http://wiki.nlpl.eu/index.php/Infrastructure/software/catalogue
>> > http://wiki.nlpl.eu/index.php/Translation/opennmt-py

>> > —could you suggest a minimal example workflow, demonstrating how to
>> > train and decode with OpenNMT, ideally using files from our own
>> > ‘/proj/nlpl/data/translation/’? speaking of which, should i start
>> > replicating that directory from Taito to Abel, i.e. remove what you
>> > had installed manually on Abel and instead turn on automated
>> > replication once a day?

>> > in principle, we should now produce a parallel installation of
>> > OpenNMT-py on Taito, of course—which presupposes that we get something
>> > parallel worked out for PyTorch.

>> > yves, why do you say that CSC does not include ‘virtualenv’ in their
>> > python installation? is there something principled that i am missing?

>> > [oe at taito-login3 ~]$ module add python-env/3.5.3
>> > Loading application python-3.5.3 environment with needed modules
>> > Switching compiler gcc to gcc/5.4.0
>> > Switching MPI version intelmpi to intelmpi/5.1.3

>> > The following have been reloaded with a version change:
>> > 1) gcc/4.8.2 => gcc/5.4.0 2) intelmpi/4.1.3 => intelmpi/5.1.3 3)
>> > mkl/11.3.0 => mkl/11.3.2 4) python-env/3.4.0 => python-env/3.5.3 5)
>> > python/3.4.0 => python/3.5.3

>> > [oe at taito-login3 ~]$ type -all python
>> > python is /appl/opt/python/3.5.3-gnu540/bin/python
>> > [oe at taito-login3 ~]$ type -all virtualenv
>> > virtualenv is /appl/opt/python/3.5.3-gnu540/bin/virtualenv

>> > so, i am guessing we could presumably attempt an NLPL-maintained
>> > installation of PyTorch into a 3.5 virtual environment, which would
>> > likely require a custom glibc installation too (and the same kind of
>> > dynamic linking ‘gymnastics’).

>> > i feel i still need to learn more about the CSC environment. are the
>> > modules available on taito-gpu the same as on the cpu nodes? in other
>> > words, do both types of nodes see the same file system?

>> > cheers, oe


>> > On Wed, Sep 26, 2018 at 9:59 AM, Scherrer, Yves <yves.scherrer at helsinki.fi> wrote:
>> >> Hi,

>> >> I’ve had a quick look at Stephan’s OpenNMT-py on Abel. The onmt module seems
>> >> to work, but one generally uses the scripts “preprocess.py”, “train.py” and
>> >> “translate.py” (at the root directory of the GitHub repo), and these scripts
>> >> seem to be missing from the module. Would it be possible to copy these three
>> >> scripts (there is a fourth one, “server.py”, but this one might not be
>> >> relevant for common usage) somewhere inside the virtual environment, so that
>> >> they can be found and called easily?

>> >> I have to say that I find these stacked virtual environments quite elegant.
>> >> Too bad that CSC doesn’t even include the virtualenv command in their
>> >> python-env modules…

>> >> Best,
>> >> Yves

>> >> ________________________________
>> >> From: Stephan Oepen <oe at ifi.uio.no>
>> >> Sent: Thursday, September 20, 2018 12:31:58 AM
>> >> To: Scherrer, Yves
>> >> Cc: Martin Matthiesen; infrastructure
>> >> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)

>> >> dear all,

>> >> yes, chaining virtual environments appears to work as one would
>> >> expect. i might in fact have managed to install OpenNMT-py on Abel,
>> >> using my new PyTorch 0.4.1 virtual environment, essentially:

>> >> module load nlpl-pytorch
>> >> /projects/nlpl/software/opennmt-py/
>> >> virtualenv /projects/nlpl/software/opennmt-py/0.2.1

>> >> at this point, i had to manually change the ‘python’, ‘python3’, and
>> >> ‘python3.5’ files in the new ‘bin/’ directory, to avail themselves of
>> >> the custom glibc; see
>> >> ‘http://wiki.nlpl.eu/index.php/Infrastructure/software/glibc’.
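With chained environments it is easy to lose track of which layer a package actually comes from. A hedged sketch for checking, with the `nlpl-opennmt-py` module loaded, that `torch` resolves to the PyTorch environment while `onmt` resolves to the new 0.2.1 one; the helper names are illustrative, and the expected paths are whatever the two module files set up. `venv`-style environments make `sys.prefix` differ from `sys.base_prefix`, while older `virtualenv` releases set `sys.real_prefix` instead, so the check looks at both.

```python
import importlib.util
import sys
from typing import Dict, Optional


def in_virtualenv() -> bool:
    """True inside a venv (prefix differs from base_prefix) or a legacy virtualenv (sys.real_prefix)."""
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def module_origins(names) -> Dict[str, Optional[str]]:
    """Map each top-level module name to the file it would be loaded from, without importing it."""
    origins = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


if __name__ == "__main__":
    print("python:", sys.executable, "| virtual environment:", in_virtualenv())
    for name, origin in module_origins(["torch", "onmt"]).items():
        print("{:6} {}".format(name, origin or "NOT FOUND"))
```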

>> >> cd /projects/nlpl/software/modulefiles
>> >> mkdir nlpl-opennmt-py
>> >> cp nlpl-pytorch/0.4.1 nlpl-opennmt-py/0.2.1
>> >> vi nlpl-opennmt-py/0.2.1

>> >> cd ~/src/nlpl
>> >> module purge
>> >> module load nlpl-opennmt-py
>> >> wget https://github.com/OpenNMT/OpenNMT-py/archive/0.2.1.tar.gz
>> >> tar zpSxvf 0.2.1.tar.gz
>> >> cd OpenNMT-py-0.2.1
>> >> python setup.py install

>> >> so far, my testing is limited to

>> >> python -c "import torch; import onmt; print(onmt.__version__);"
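The one-liner above only shows that the packages import. A slightly stronger check, in the spirit of the earlier request for both cpu and gpu testing, also runs a tiny computation on each device. This is a sketch assuming the PyTorch 0.4-era API (`Tensor.to` and `Tensor.item` were both introduced in 0.4.0); the function name is illustrative. On a login node without GPUs it should simply report `cuda_available: False`, so the GPU half needs to run on an actual GPU node. A hard crash, like the one later reported on the Taito login node, would still kill the interpreter rather than reach an `except` clause.

```python
import importlib
from typing import Any, Dict


def torch_smoke_test() -> Dict[str, Any]:
    """Import torch and onmt, then multiply two small matrices on CPU and, if present, GPU."""
    report = {}  # type: Dict[str, Any]
    try:
        torch = importlib.import_module("torch")
    except ImportError as exc:
        report["torch"] = None
        report["error"] = str(exc)
        return report
    report["torch"] = torch.__version__
    try:
        onmt = importlib.import_module("onmt")
        report["onmt"] = getattr(onmt, "__version__", "unknown")
    except ImportError:
        report["onmt"] = None

    x = torch.ones(2, 2)
    report["cpu_sum"] = torch.mm(x, x).sum().item()  # every entry is 2.0, so the sum is 8.0
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu_count"] = torch.cuda.device_count()
        y = x.to("cuda")
        report["gpu_sum"] = torch.mm(y, y).sum().item()
    return report


if __name__ == "__main__":
    for key, value in sorted(torch_smoke_test().items()):
        print(key, value, sep=": ")
```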

>> >> yves, would you maybe have a chance next week to see whether this
>> >> installation appears healthy to you?

>> >> cheers, oe


>> >> On Wed, Sep 19, 2018 at 1:12 PM, Scherrer, Yves <yves.scherrer at helsinki.fi> wrote:
>> >>> Hi Stephan, Martin,

>> >>> I’m catching up on this thread… A few questions from my side:

>> >>> Regarding Martin’s latest suggestion: that does indeed seem to work fine,
>> >>> although with the exact same commands I get a different version of
>> >>> PyTorch:

>> >>>>>> import torch
>> >>>>>> torch.__file__
>> >>> '/appl/opt/python/intelpython36-2018.3/intelpython3/lib/python3.6/site-packages/torch/__init__.py'
>> >>>>>> torch.__version__
>> >>> '0.4.0a0+3749c58'

>> >>> In any case, if PyTorch is already installed in some Python distribution,
>> >>> that would make setting up a specific OpenNMT module rather easy. If not,
>> >>> virtual environments should work as well (the tricky thing is mainly to
>> >>> figure out which python versions play well with CUDA…)
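Since the same commands produced PyTorch `0.4.0a0+3749c58` for Yves and `0.4.1` for Martin, a wrapper or module file might want to refuse a too-old PyTorch up front. A minimal sketch, assuming 0.4.1 as the minimum (the version used for Stephan's NLPL installation; the thread does not state OpenNMT-py 0.2.1's exact requirement). It only understands the simple `X.Y[.Z][aN|bN|rcN][+local]` shapes seen in this thread and does not order pre-releases among themselves; where the `packaging` library is available, `packaging.version.Version` implements the full PEP 440 rules.

```python
import re
from typing import List, Tuple

# Accepts e.g. "0.4.1", "0.4.0a0+3749c58", "1.0rc1"; anything fancier raises ValueError.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)((?:a|b|rc)\d*)?(?:\+[0-9A-Za-z.]+)?$")


def _parse(version: str) -> Tuple[List[int], bool]:
    """Return (release numbers, is_final); the '+local' suffix is ignored."""
    match = _VERSION.match(version.strip())
    if match is None:
        raise ValueError("unsupported version string: {!r}".format(version))
    release = [int(part) for part in match.group(1).split(".")]
    return release, match.group(2) is None


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` >= `minimum`; a pre-release sorts below its final release."""
    rel_a, final_a = _parse(installed)
    rel_b, final_b = _parse(minimum)
    width = max(len(rel_a), len(rel_b))
    rel_a += [0] * (width - len(rel_a))  # treat "1.0" like "1.0.0"
    rel_b += [0] * (width - len(rel_b))
    return (rel_a, final_a) >= (rel_b, final_b)


if __name__ == "__main__":
    for seen in ("0.4.0a0+3749c58", "0.4.1"):
        print(seen, "ok" if meets_minimum(seen, "0.4.1") else "too old")
```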



>> >>> Regarding Stephan’s suggestion of virtual environments: do you know if
>> >>> virtual environments can be “stacked”, i.e. whether I could create an
>> >>> OpenNMT virtual environment that lies on top of your PyTorch environment?
>> >>> Or would I have to re-install another instance of PyTorch in the OpenNMT
>> >>> virtualenv?

>> >>> I’ll be travelling for the rest of the week, but will try to have a closer
>> >>> look at these options next week.

>> >>> Best,
>> >>> Yves

>> >>> ________________________________
>> >>> From: Martin Matthiesen <martin.matthiesen at csc.fi>
>> >>> Sent: Wednesday, September 19, 2018 1:29:35 PM
>> >>> To: Stephan Oepen
>> >>> Cc: infrastructure; Scherrer, Yves
>> >>> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)

>> >>> Hello Stephan,

>> >>> ----- Original Message -----
>> >>>> From: "Stephan Oepen" <oe at ifi.uio.no>
>> >>>> To: "Martin Matthiesen" <martin.matthiesen at csc.fi>
>> >>>> Cc: "infrastructure" <infrastructure at nlpl.eu>, "Yves Scherrer"
>> >>>> <yves.scherrer at helsinki.fi>
>> >>>> Sent: Tuesday, 18 September, 2018 14:13:53
>> >>>> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)

>> >>>> sorry, i was the one who had introduced the confusion about mailing
>> >>>> lists. there is no ‘translation at nlpl.eu’ currently, and upon
>> >>>> consultation with joerg there appears not to be a great need for it
>> >>>> either (once i get around to documenting the task force structure on
>> >>>> the project wiki, i might want to create that list nevertheless).

>> >>>> i am adding yves to the thread now, so he at least has a chance of knowing
>> >>>> what we are talking about :-).

>> >>> Ok!

>> >>>> martin, i doubt that an installation of OpenNMT that requires everyone
>> >>>> to ‘pip install --user’ into their home directory will be a good
>> >>>> solution. that way, the getting started instructions will be more
>> >>>> complex, and we lack control over which version of PyTorch gets
>> >>>> installed at the time the user actually runs the command. my
>> >>>> immediate reaction at least is that NLPL-supported software should be
>> >>>> ‘self-contained’, in the sense of not depending on software components
>> >>>> maintained by the user.
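One way to audit this "self-contained" requirement is to check whether a package was picked up from the per-user site-packages directory that `pip install --user` writes to. A minimal sketch using the standard `site` module; `from_user_site` is an illustrative name. Python also ignores that directory when started with `-s` or with `PYTHONNOUSERSITE` set, which a module file could use to keep stray `--user` installs out of an NLPL environment (an option, not something this thread settled on).

```python
import os
import site


def from_user_site(path: str) -> bool:
    """True if `path` lies inside this user's per-user site-packages directory."""
    user_site = os.path.realpath(site.getusersitepackages())
    target = os.path.realpath(path)
    return target == user_site or target.startswith(user_site + os.sep)


if __name__ == "__main__":
    import importlib.util

    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        print("torch not found")
    else:
        where = "user site-packages" if from_user_site(spec.origin) else "shared installation"
        print("torch from", where + ":", spec.origin)
```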

>> >>> Ok, I understand.

>> >>>> what i am doing increasingly on abel is deriving virtual environments;
>> >>>> e.g. my PyTorch installation (for NLPL) straightforwardly builds on
>> >>>> the USIT-maintained python 3.5. i suppose we should be able to do the
>> >>>> same thing on taito, i.e. create ‘nlpl-pytorch’ as a virtual
>> >>>> environment that includes the precompiled PyTorch wheel from your CSC
>> >>>> colleagues?

>> >>> Yes, I guess that is the only sensible solution to not lose track
>> >>> completely. In the meantime, how would this work for you all:

>> >>> [GPU-Env ~]$ module load python-env/intelpython3.6-2018.3
>> >>> Loading application Intel Distribution for Python 2018 update 3
>> >>> [GPU-Env ~]$ module list

>> >>> Currently Loaded Modules:
>> >>> 1) gcc/4.9.3 2) cuda/7.5 3) StdEnv 4) git/2.17.1 5)
>> >>> python-env/intelpython3.6-2018.3

>> >>> [GPU-Env ~]$ python3
>> >>> Python 3.6.3 |Intel Corporation| (default, May 4 2018, 04:22:28)
>> >>> [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
>> >>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> Intel(R) Distribution for Python is brought to you by Intel Corporation.
>> >>> Please check out: https://software.intel.com/en-us/python-distribution
>> >>>>>> import torch
>> >>>>>> torch.__version__
>> >>> '0.4.1'

>> >>> Kudos to my colleagues Markus and Jarmo here.

>> >>> Martin


>> >>>> oe




>> >>>> On Mon, Sep 17, 2018 at 5:06 PM, Martin Matthiesen <martin.matthiesen at csc.fi> wrote:
>> >>>>> Hello,

>> >>>>> We already have a way to use pytorch 0.4.1 on Taito-GPU:

>> >>>>> module load python-env/intelpython3.6-2018.3
>> >>>>> [GPU-Env ~]$ pip install -v --user
>> >>>>> /appl/opt/pytorch/0.4.1/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl

>> >>>>> One of my colleagues has compiled the module. Note that the module needs
>> >>>>> python 3.6 to work, the highest version available on Taito-GPU.
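The wheel's filename already records this constraint: under the wheel naming convention (PEP 427, with the compatibility tags of PEP 425), `cp36` and `cp36m` mean the build targets CPython 3.6, and `linux_x86_64` names the platform. A hedged sketch that reads those tags, e.g. to fail early with a clear message instead of a pip error; the helper names are illustrative, and it checks only the Python tag, not the ABI or platform tags, which pip verifies itself.

```python
import collections
import os
import sys

WheelTags = collections.namedtuple(
    "WheelTags", ["name", "version", "python_tag", "abi_tag", "platform_tag"])


def parse_wheel_filename(path):
    """Split '{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl'."""
    filename = os.path.basename(path)
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: {}".format(filename))
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:  # drop the optional build tag
        del parts[2]
    if len(parts) != 5:
        raise ValueError("unexpected wheel name: {}".format(filename))
    return WheelTags(*parts)


def cpython_tag_matches(python_tag, version_info=sys.version_info):
    """True if e.g. 'cp36' (or a dotted tag set like 'cp35.cp36') names this interpreter's X.Y."""
    wanted = "cp{}{}".format(version_info[0], version_info[1])
    return wanted in python_tag.split(".")


if __name__ == "__main__":
    wheel = "/appl/opt/pytorch/0.4.1/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl"
    tags = parse_wheel_filename(wheel)
    print(tags)
    print("installable here:", cpython_tag_matches(tags.python_tag))
```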

>> >>>>> Before I investigate CPU support or support for other compilers, would
>> >>>>> this pip approach work for you?

>> >>>>> Regards,
>> >>>>> Martin

>> >>>>> ----- Original Message -----
>> >>>>>> From: "Stephan Oepen" <oe at ifi.uio.no>
>> >>>>>> To: translation at nlpl.eu
>> >>>>>> Cc: "infrastructure" <infrastructure at nlpl.eu>
>> >>>>>> Sent: Saturday, 15 September, 2018 18:59:29
>> >>>>>> Subject: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)

>> >>>>>> colleagues,

>> >>>>>> joerg, martin, and i talked about getting the new release version of
>> >>>>>> OpenNMT installed for NLPL. it appears it requires the most recent
>> >>>>>> version of PyTorch, which currently is not available on Taito. martin
>> >>>>>> will ask for it to be installed by CSC.

>> >>>>>> in parallel, i believe i managed to put an NLPL-owned installation of
>> >>>>>> the right PyTorch version onto Abel, please see:

>> >>>>>> http://wiki.nlpl.eu/index.php/Infrastructure/software/pytorch

>> >>>>>> before announcing this more widely, i would be grateful for some
>> >>>>>> testing, in particular for both cpu and gpu usage. would anyone be
>> >>>>>> readily set up to give this a shot on Abel?

>> >>>>>> assuming our PyTorch is healthy, would someone from the helsinki team
>> >>>>>> have the time to try and install OpenNMT onto Abel, e.g. as

>> >>>>>> /projects/nlpl/software/opennmt-py/0.2.1

>> >>>>>> there have been two relatively recent requests for OpenNMT in oslo
>> >>>>>> (one of them for seq2seq dependency parsing :-), so i believe it would
>> >>>>>> now be warranted to provide it on both systems.

>> >>>>>> best wishes, oe