[NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
Martin Matthiesen
martin.matthiesen at csc.fi
Wed Sep 26 19:09:55 UTC 2018
Hi Stephan,
Yes, taito-gpu is the login node and has no GPU hardware. Taito and Taito-gpu also share most software. PyTorch should actually work on a Taito login node the same way as on Taito-gpu, i.e. in CPU mode. Since it crashes there instead, there must be a subtle difference that I have not yet figured out.
Martin
--
Martin Matthiesen
CSC - Tieteen tietotekniikan keskus
CSC - IT Center for Science
PL 405, 02101 Espoo, Finland
+358 9 457 2376, martin.matthiesen at csc.fi
Public key : https://pgp.mit.edu/pks/lookup?op=get&search=0x74B12876FD890704
Fingerprint: AA25 6F56 5C9A 8B42 009F BA70 74B1 2876 FD89 0704
> From: "Stephan Oepen" <oe at ifi.uio.no>
> To: "Yves Scherrer" <yves.scherrer at helsinki.fi>
> Cc: "Martin Matthiesen" <martin.matthiesen at csc.fi>, "infrastructure"
> <infrastructure at nlpl.eu>
> Sent: Wednesday, 26 September, 2018 21:57:03
> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
> yes, i see.
> i actually had a go at my own glibc and PyTorch installations on Taito, but so
> far gpu support is elusive. is my impression correct that ‘taito-gpu’ does not
> have gpu hardware itself, i.e. one still needs to srun an interactive shell
> onto an actual gpu node?
> oe
> On Wed, 26 Sep 2018 at 20:41 Scherrer, Yves <yves.scherrer at helsinki.fi> wrote:
>> As further validation of your installation, I am currently training a model,
>> and now that I have found that I need to use $CUDA_VISIBLE_DEVICES, it also
>> seems to be training on the GPU :)
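For illustration, here is a minimal Python sketch of what $CUDA_VISIBLE_DEVICES controls. It assumes a job in which Slurm (or the user) has set the variable. The sketch lists the physical GPUs the process may use, together with the device numbers CUDA assigns to them inside the process, which always start at 0. The helper names `visible_gpus` and `local_indices` are hypothetical, and nothing here calls OpenNMT-py or PyTorch.

```python
import os
from typing import Dict, List, Mapping, Optional


def visible_gpus(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    None means the variable is unset (no restriction: every GPU on the node
    is visible); an empty list means it is set but names no GPU.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [dev.strip() for dev in raw.split(",") if dev.strip()]


def local_indices(devices: List[str]) -> Dict[str, int]:
    """Map each visible physical GPU to the index CUDA uses inside the process.

    The first listed device becomes cuda:0, the second cuda:1, and so on.
    """
    return {physical: local for local, physical in enumerate(devices)}


if __name__ == "__main__":
    devices = visible_gpus()
    if devices is None:
        print("CUDA_VISIBLE_DEVICES is unset: all GPUs on this node are visible")
    elif not devices:
        print("CUDA_VISIBLE_DEVICES is empty: no GPU is visible to this process")
    else:
        for physical, local in local_indices(devices).items():
            print(f"physical GPU {physical} -> cuda:{local}")
```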
>> I’ll see if I can easily modify my test to use data from the NLPL repository
>> (the data is certainly not the problem, but there might be some preprocessing
>> steps for which scripts are not (yet) available).
>> Regarding virtualenv on CSC, it’s hit or miss:
>> - python-env/intelpython3.6-2018.3, which Martin mentioned recently and which
>> contains PyTorch, doesn’t have virtualenv
>> - python-env/3.5.3 has virtualenv, as you correctly observed
>> - python-env/3.4.0, which is the default version on taito-shell, doesn’t have
>> virtualenv
>> I’ll have to test whether it’s easier to build on the Intel Python or the
>> “normal” GNU one…
>> Yves
>> > On 26 Sep 2018, at 15:57, Stephan Oepen <oe at ifi.uio.no> wrote:
>> > many thanks for validating (to some degree at least :-) my OpenNMT-py
>> > installation on Abel. i have now added it to the software catalogue
>> > and created minimal documentation on the NLPL wiki:
>> > http://wiki.nlpl.eu/index.php/Infrastructure/software/catalogue
>> > http://wiki.nlpl.eu/index.php/Translation/opennmt-py
>> > —could you suggest a minimal example workflow, demonstrating how to
>> > train and decode with OpenNMT, ideally using files from our own
>> > ‘/proj/nlpl/data/translation/’? speaking of which, should i start
>> > replicating that directory from Taito to Abel, i.e. remove what you
>> > had installed manually on Abel and instead turn on automated
>> > replication once a day?
>> > in principle, we should now produce a parallel installation of
>> > OpenNMT-py on Taito, of course—which presupposes that we get something
>> > parallel worked out for PyTorch.
>> > yves, why do you say that CSC does not include ‘virtualenv’ in their
>> > python installation? is there something principled that i am missing?
>> > [oe at taito-login3 ~]$ module add python-env/3.5.3
>> > Loading application python-3.5.3 environment with needed modules
>> > Switching compiler gcc to gcc/5.4.0
>> > Switching MPI version intelmpi to intelmpi/5.1.3
>> > The following have been reloaded with a version change:
>> > 1) gcc/4.8.2 => gcc/5.4.0 2) intelmpi/4.1.3 => intelmpi/5.1.3 3)
>> > mkl/11.3.0 => mkl/11.3.2 4) python-env/3.4.0 => python-env/3.5.3 5)
>> > python/3.4.0 => python/3.5.3
>> > [oe at taito-login3 ~]$ type -all python
>> > python is /appl/opt/python/3.5.3-gnu540/bin/python
>> > [oe at taito-login3 ~]$ type -all virtualenv
>> > virtualenv is /appl/opt/python/3.5.3-gnu540/bin/virtualenv
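When scripting a setup check, the `type -all` lookups above can be reproduced from Python. This is a sketch only. `find_all` is a hypothetical helper that scans $PATH in order, as bash's `type -a` does for external commands. Unlike `type -a`, it ignores aliases, functions and builtins.

```python
import os
import sys
from typing import List, Optional


def find_all(command: str, path: Optional[str] = None) -> List[str]:
    """Return every executable named `command` on the search path, in order."""
    path = os.environ.get("PATH", "") if path is None else path
    hits: List[str] = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # this sketch skips empty PATH entries
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in hits:
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    print("running interpreter:", sys.executable)
    for tool in ("python", "virtualenv"):
        found = find_all(tool)
        print(tool, "->", found if found else "not found on PATH")
```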
>> > so, i am guessing we could attempt an NLPL-maintained installation of
>> > PyTorch into a 3.5 virtual environment, which would likely require a
>> > custom glibc installation too (and the same kind of dynamic linking
>> > ‘gymnastics’).
>> > i feel i still need to learn more about the CSC environment. are the
>> > modules available on taito-gpu the same as on the cpu nodes? in other
>> > words, do both types of nodes see the same file system?
>> > cheers, oe
>> > On Wed, Sep 26, 2018 at 9:59 AM, Scherrer, Yves
>> > <yves.scherrer at helsinki.fi> wrote:
>> >> Hi,
>> >> I’ve had a quick look at Stephan’s OpenNMT-py on Abel. The onmt module seems
>> >> to work, but one generally uses the scripts “preprocess.py”, “train.py” and
>> >> “translate.py” (at the root directory of the Github repo), and these scripts
>> >> seem to be missing from the module. Would it be possible to copy these three
>> >> scripts (there is a fourth one, “server.py”, but this one might not be
>> >> relevant for common usage) somewhere inside the virtual environment, so that
>> >> they can be found and called easily?
>> >> I have to say that I find these stacked virtual environments quite elegant.
>> >> Too bad that CSC doesn’t even include the virtualenv command in their
>> >> python-env modules…
>> >> Best,
>> >> Yves
>> >> ________________________________
>> >> From: Stephan Oepen <oe at ifi.uio.no>
>> >> Sent: Thursday, September 20, 2018 12:31:58 AM
>> >> To: Scherrer, Yves
>> >> Cc: Martin Matthiesen; infrastructure
>> >> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
>> >> dear all,
>> >> yes, chaining virtual environments appears to work as one would
>> >> expect. i might in fact have managed to install OpenNMT-py on Abel,
>> >> using my new PyTorch 0.4.1 virtual environment, essentially:
>> >> module load nlpl-pytorch
>> >> /projects/nlpl/software/opennmt-py/
>> >> virtualenv /projects/nlpl/software/opennmt-py/0.2.1
>> >> at this point, i had to manually change the ‘python’, ‘python3’, and
>> >> ‘python3.5’ files in the new ‘bin/’ directory, to avail themselves of
>> >> the custom glibc; see
>> >> ‘http://wiki.nlpl.eu/index.php/Infrastructure/software/glibc’.
>> >> cd /projects/nlpl/software/modulefiles
>> >> mkdir nlpl-opennmt-py
>> >> cp nlpl-pytorch/0.4.1 nlpl-opennmt-py/0.2.1
>> >> vi nlpl-opennmt-py/0.2.1
>> >> cd ~/src/nlpl
>> >> module purge
>> >> module load nlpl-opennmt-py
>> >> wget https://github.com/OpenNMT/OpenNMT-py/archive/0.2.1.tar.gz
>> >> tar zpSxvf 0.2.1.tar.gz
>> >> cd OpenNMT-py-0.2.1
>> >> python setup.py install
>> >> so far, my testing is limited to
>> >> python -c "import torch; import onmt; print(onmt.__version__);"
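The one-line import test above can be extended into a small report that also says whether PyTorch sees a GPU. That matters here because login nodes such as taito-gpu have none. The sketch makes three assumptions. First, `torch` and `onmt` expose `__version__`, as the command above already relies on. Second, `torch.cuda.is_available()` and `torch.cuda.device_count()` are present, as they are in PyTorch 0.4.x. Third, the function names are hypothetical.

```python
import importlib
import importlib.util
from typing import Dict, Iterable, Optional


def module_versions(names: Iterable[str] = ("torch", "onmt")) -> Dict[str, Optional[str]]:
    """Import each module that is installed and report its __version__.

    Missing modules map to None rather than raising, so the same check can
    run in any (virtual) environment and show what that environment sees.
    """
    report: Dict[str, Optional[str]] = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        module = importlib.import_module(name)
        report[name] = getattr(module, "__version__", "unknown")
    return report


def cuda_status() -> str:
    """Describe GPU visibility as PyTorch sees it, if PyTorch is installed."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also runs without PyTorch

    if torch.cuda.is_available():
        return f"CUDA available, {torch.cuda.device_count()} device(s)"
    return "CUDA not available (CPU mode)"


if __name__ == "__main__":
    for name, version in module_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print(cuda_status())
```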
>> >> yves, would you maybe have a chance next week to see whether this
>> >> installation appears healthy to you?
>> >> cheers, oe
>> >> On Wed, Sep 19, 2018 at 1:12 PM, Scherrer, Yves
>> >> <yves.scherrer at helsinki.fi> wrote:
>> >>> Hi Stephan, Martin,
>> >>> I’m catching up on this thread… A few questions from my side:
>> >>> Regarding Martin’s latest suggestion: that seems indeed to work fine,
>> >>> although with the exact same commands I get a different version of
>> >>> PyTorch:
>> >>>>>> import torch
>> >>>>>> torch.__file__
>> >>> '/appl/opt/python/intelpython36-2018.3/intelpython3/lib/python3.6/site-packages/torch/__init__.py'
>> >>>>>> torch.__version__
>> >>> '0.4.0a0+3749c58'
>> >>> In any case, if PyTorch is already installed in some Python distribution,
>> >>> that would make setting up a specific OpenNMT module rather easy. If not,
>> >>> virtual environments should work as well (the tricky thing is mainly to
>> >>> figure out which python versions play well with CUDA…)
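The two versions seen above, '0.4.0a0+3749c58' and '0.4.1', carry a pre-release marker and a local build label, so comparing them takes more than splitting on dots. Below is a simplified, hypothetical checker. It is not the `packaging` library, whose `packaging.version.Version` implements the full PEP 440 rules. It drops the local label after '+', ignores trailing zero components and sorts any suffix before the final release.

```python
import re
from typing import Tuple

# Simplified sketch, NOT full PEP 440: every suffix after the numeric release
# (a0, rc1, .dev0, but also .post1) is treated as a pre-release.
_RELEASE = re.compile(r"(\d+(?:\.\d+)*)(.*)")


def release_key(version: str) -> Tuple[Tuple[int, ...], int]:
    """Return (release numbers, 1 if final release else 0) for ordering."""
    public = version.split("+", 1)[0]  # drop local label such as '+3749c58'
    match = _RELEASE.fullmatch(public)
    if match is None:
        raise ValueError(f"unparsable version: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]  # so that '0.4' and '0.4.0' compare equal
    is_final = 0 if match.group(2) else 1
    return release, is_final


def at_least(installed: str, required: str) -> bool:
    """True if `installed` is the same as or newer than `required`."""
    return release_key(installed) >= release_key(required)


if __name__ == "__main__":
    print(at_least("0.4.0a0+3749c58", "0.4.1"))  # False
    print(at_least("0.4.1", "0.4.1"))  # True
```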
>> >>> Regarding Stephan’s suggestion of virtual environments: do you know if
>> >>> virtual environments can be “stacked”, i.e. whether I could create an
>> >>> OpenNMT virtual environment that lies on top of your PyTorch environment?
>> >>> Or would I have to re-install another instance of PyTorch in the OpenNMT
>> >>> virtualenv?
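The thread does not show how the NLPL modulefiles make the OpenNMT-py environment see the PyTorch one; setting PYTHONPATH in the modulefile would be one option. Another common mechanism is a `.pth` file, sketched here purely as an assumption. At start-up, Python's `site` module reads `*.pth` files in site-packages and appends each existing directory they list to `sys.path`, after the environment's own site-packages. `link_parent_env` and the file name `parent-env.pth` are hypothetical.

```python
import os
import sysconfig
import tempfile
from pathlib import Path


def link_parent_env(child_site_packages: str, parent_site_packages: str,
                    name: str = "parent-env.pth") -> Path:
    """Write a .pth file so a child environment also imports from a parent one.

    Because .pth entries are appended after the child's own site-packages,
    a package installed in the child (e.g. OpenNMT-py) takes precedence over
    a same-named package in the parent (e.g. the PyTorch environment).
    """
    pth = Path(child_site_packages) / name
    pth.write_text(os.path.abspath(parent_site_packages) + "\n")
    return pth


if __name__ == "__main__":
    # Demonstration only: a throwaway directory stands in for the child
    # environment, and this interpreter's site-packages for the parent.
    parent = sysconfig.get_paths()["purelib"]
    with tempfile.TemporaryDirectory() as fake_child:
        pth = link_parent_env(fake_child, parent)
        print(pth.read_text().strip() == os.path.abspath(parent))  # True
```

In a real setup, the child directory would be the site-packages of the new virtualenv, which that environment's own python reports as `sysconfig.get_paths()["purelib"]`.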
>> >>> I’ll be travelling for the rest of the week, but will try to have a closer
>> >>> look at these options next week.
>> >>> Best,
>> >>> Yves
>> >>> ________________________________
>> >>> From: Martin Matthiesen <martin.matthiesen at csc.fi>
>> >>> Sent: Wednesday, September 19, 2018 1:29:35 PM
>> >>> To: Stephan Oepen
>> >>> Cc: infrastructure; Scherrer, Yves
>> >>> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
>> >>> Hello Stephan,
>> >>> ----- Original Message -----
>> >>>> From: "Stephan Oepen" <oe at ifi.uio.no>
>> >>>> To: "Martin Matthiesen" <martin.matthiesen at csc.fi>
>> >>>> Cc: "infrastructure" <infrastructure at nlpl.eu>, "Yves Scherrer"
>> >>>> <yves.scherrer at helsinki.fi>
>> >>>> Sent: Tuesday, 18 September, 2018 14:13:53
>> >>>> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on
>> >>>> Abel)
>> >>>> sorry, i was the one who had introduced the confusion about mailing
>> >>>> lists. there is no ‘translation at nlpl.eu’ currently, and upon
>> >>>> consultation with joerg there appears not to be a great need for it
>> >>>> either (once i get around to documenting the task force structure on
>> >>>> the project wiki, i might want to create that list nevertheless).
>> >>>> i am adding yves to thread now, so he at least has a chance of knowing
>> >>>> what we are talking about :-).
>> >>> Ok!
>> >>>> martin, i doubt that an installation of OpenNMT that requires everyone
>> >>>> to ‘pip install --user’ into their home directory will be a good
>> >>>> solution. that way, the getting started instructions will be more
>> >>>> complex, and we lack control over which version of PyTorch gets
>> >>>> installed at the time the user actually runs the command. my
>> >>>> immediate reaction at least is that NLPL-supported software should be
>> >>>> ‘self-contained’, in the sense of not depending on software components
>> >>>> maintained by the user.
>> >>> Ok, I understand.
>> >>>> what i am doing increasingly on abel is deriving virtual environments;
>> >>>> e.g. my PyTorch installation (for NLPL) straightforwardly builds on
>> >>>> the USIT-maintained python 3.5. i suppose we should be able to do the
>> >>>> same thing on taito, i.e. create ‘nlpl-pytorch’ as a virtual
>> >>>> environment that includes the precompiled PyTorch wheel from your CSC
>> >>>> colleagues?
>> >>> Yes, I guess that is the only sensible solution to not lose track
>> >>> completely. In the meantime, how would this work for you all:
>> >>> [GPU-Env ~]$ module load python-env/intelpython3.6-2018.3
>> >>> Loading application Intel Distribution for Python 2018 update 3
>> >>> [GPU-Env ~]$ module list
>> >>> Currently Loaded Modules:
>> >>> 1) gcc/4.9.3 2) cuda/7.5 3) StdEnv 4) git/2.17.1 5)
>> >>> python-env/intelpython3.6-2018.3
>> >>> [GPU-Env ~]$ python3
>> >>> Python 3.6.3 |Intel Corporation| (default, May 4 2018, 04:22:28)
>> >>> [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
>> >>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> Intel(R) Distribution for Python is brought to you by Intel Corporation.
>> >>> Please check out: https://software.intel.com/en-us/python-distribution
>> >>>>>> import torch
>> >>>>>> torch.__version__
>> >>> '0.4.1'
>> >>> Kudos to my colleagues Markus and Jarmo here.
>> >>> Martin
>> >>>> oe
>> >>>> On Mon, Sep 17, 2018 at 5:06 PM, Martin Matthiesen
>> >>>> <martin.matthiesen at csc.fi> wrote:
>> >>>>> Hello,
>> >>>>> We already have a way to use pytorch 0.4.1 on Taito-GPU:
>> >>>>> module load python-env/intelpython3.6-2018.3
>> >>>>> [GPU-Env ~]$ pip install -v --user
>> >>>>> /appl/opt/pytorch/0.4.1/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl
>> >>>>> One of my colleagues has compiled the module. Note that the module needs
>> >>>>> Python 3.6 to work, the highest version available on Taito-GPU.
>> >>>>> Before I investigate CPU support or support for other compilers, would
>> >>>>> this pip approach work for you?
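The Python 3.6 requirement can be read off the wheel's file name. Under the PEP 427 naming convention, `cp36`, `cp36m` and `linux_x86_64` are the Python, ABI and platform tags, i.e. CPython 3.6 on 64-bit Linux. The rough sketch below checks this before running `pip install`. The helpers are hypothetical, and pip itself performs a more complete compatibility check.

```python
import os
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl."""
    base = os.path.basename(filename)
    if not base.endswith(".whl"):
        raise ValueError(f"not a wheel file: {base!r}")
    parts = base[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel name: {base!r}")
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


def matches_interpreter(tags: WheelTags) -> bool:
    """Rough check of a CPython tag such as 'cp36' against this interpreter.

    Only the Python tag is compared; ABI and platform tags are not checked.
    """
    wanted = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return sys.implementation.name == "cpython" and wanted in tags.python.split(".")


if __name__ == "__main__":
    wheel = "/appl/opt/pytorch/0.4.1/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl"
    tags = parse_wheel_filename(wheel)
    print(tags.python, tags.abi, tags.platform)  # cp36 cp36m linux_x86_64
    print("matches this interpreter:", matches_interpreter(tags))
```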
>> >>>>> Regards,
>> >>>>> Martin
>> >>>>> ----- Original Message -----
>> >>>>>> From: "Stephan Oepen" <oe at ifi.uio.no>
>> >>>>>> To: translation at nlpl.eu
>> >>>>>> Cc: "infrastructure" <infrastructure at nlpl.eu>
>> >>>>>> Sent: Saturday, 15 September, 2018 18:59:29
>> >>>>>> Subject: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
>> >>>>>> colleagues,
>> >>>>>> joerg, martin, and i talked about getting the new release version of
>> >>>>>> OpenNMT installed for NLPL. it appears it requires the most recent
>> >>>>>> version of PyTorch, which currently is not available on Taito. martin
>> >>>>>> will ask for it to be installed by CSC.
>> >>>>>> in parallel, i believe i managed to put an NLPL-owned installation of
>> >>>>>> the right PyTorch version onto Abel, please see:
>> >>>>>> http://wiki.nlpl.eu/index.php/Infrastructure/software/pytorch
>> >>>>>> before announcing this more widely, i would be grateful for some
>> >>>>>> testing, in particular for both cpu and gpu usage. would anyone be
>> >>>>>> readily set up to give this a shot on Abel?
>> >>>>>> assuming our PyTorch is healthy, would someone from the helsinki team
>> >>>>>> have the time to try and install OpenNMT onto Abel, e.g. as
>> >>>>>> /projects/nlpl/software/opennmt-py/0.2.1
>> >>>>>> there have been two relatively recent requests for OpenNMT in oslo
>> >>>>>> (one of them for seq2seq dependency parsing :-), so i believe it would
>> >>>>>> now be warranted to provide it on both systems.
>> >>>>>> best wishes, oe