[NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
Stephan Oepen
oe at ifi.uio.no
Wed Sep 26 18:57:03 UTC 2018
yes, i see.
i actually had a go at my own glibc and PyTorch installations on Taito, but
so far gpu support is elusive. is my impression correct that ‘taito-gpu’
does not have gpu hardware itself, i.e. one still needs to srun an
interactive shell onto an actual gpu node?
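
one way to answer this empirically on any given node is a small probe run from the shell there. the sketch below is only an illustration under assumptions: the function name `gpu_report` is made up for this example, and it relies on nothing beyond the `CUDA_VISIBLE_DEVICES` variable mentioned further down and the standard `torch.cuda.is_available()` and `torch.cuda.device_count()` calls. it sticks to Python 3.5 syntax so that it would also run in the Abel environment.

```python
import os


def gpu_report(environ=None):
    """Summarise what the current process can see of the GPU setup."""
    environ = os.environ if environ is None else environ
    report = {
        # Typically set by the batch system (or by hand) on a GPU node.
        "cuda_visible_devices": environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
        "cuda_available": False,
        "device_count": 0,
    }
    try:
        # May fail with ImportError (not installed) or OSError
        # (e.g. a shared-library or glibc mismatch at load time).
        import torch
    except (ImportError, OSError):
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in sorted(gpu_report().items()):
        print("{}: {}".format(key, value))
```

running this once on the login node and once inside an interactive job should show whether the two differ in what hardware they expose.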
oe
On Wed, 26 Sep 2018 at 20:41 Scherrer, Yves <yves.scherrer at helsinki.fi>
wrote:
> Further validating your installation, I am currently training a model, and
> once I found that I need to use $CUDA_VISIBLE_DEVICES it also seems to be
> training on GPU :)
>
> I’ll see if I can easily modify my test to use data from the NLPL
> repository (the data is certainly not the problem, but there might be some
> preprocessing steps for which scripts are not (yet) available).
>
> Regarding virtualenv on CSC, it’s hit or miss:
> - python-env/intelpython3.6-2018.3, which Martin mentioned lately and
> which contains PyTorch, doesn’t have virtualenv
> - python-env/3.5.3 has virtualenv, as you correctly observed
> - python-env/3.4.0, which is the default version on taito-shell, doesn’t
> have virtualenv
>
> I’ll have to test if it’s easier to build on the intelpython or the
> “normal” gnu one…
>
> Yves
>
> > On 26 Sep 2018, at 15:57, Stephan Oepen <oe at ifi.uio.no> wrote:
> >
> > many thanks for validating (to some degree at least :-) my OpenNMT-py
> > installation on Abel. i have now added it to the software catalogue
> > and created minimal documentation on the NLPL wiki:
> >
> > http://wiki.nlpl.eu/index.php/Infrastructure/software/catalogue
> > http://wiki.nlpl.eu/index.php/Translation/opennmt-py
> >
> > could you suggest a minimal example workflow, demonstrating how to
> > train and decode with OpenNMT, ideally using files from our own
> > ‘/proj/nlpl/data/translation/’? speaking of which, should i start
> > replicating that directory from Taito to Abel, i.e. remove what you
> > had installed manually on Abel and instead turn on automated
> > replication once a day?
> >
> > in principle, we should now produce a parallel installation of
> > OpenNMT-py on Taito, of course—which presupposes that we get something
> > parallel worked out for PyTorch.
> >
> > yves, why do you say that CSC does not include ‘virtualenv’ in their
> > python installation? is there something principled that i am missing?
> >
> > [oe at taito-login3 ~]$ module add python-env/3.5.3
> > Loading application python-3.5.3 environment with needed modules
> > Switching compiler gcc to gcc/5.4.0
> > Switching MPI version intelmpi to intelmpi/5.1.3
> >
> > The following have been reloaded with a version change:
> > 1) gcc/4.8.2 => gcc/5.4.0
> > 2) intelmpi/4.1.3 => intelmpi/5.1.3
> > 3) mkl/11.3.0 => mkl/11.3.2
> > 4) python-env/3.4.0 => python-env/3.5.3
> > 5) python/3.4.0 => python/3.5.3
> >
> > [oe at taito-login3 ~]$ type -all python
> > python is /appl/opt/python/3.5.3-gnu540/bin/python
> > [oe at taito-login3 ~]$ type -all virtualenv
> > virtualenv is /appl/opt/python/3.5.3-gnu540/bin/virtualenv
> >
> > so, i am guessing we could attempt an NLPL-maintained installation of
> > PyTorch into a 3.5 virtual environment, which would likely require a
> > custom glibc installation too (and the same kind of dynamic linking
> > ‘gymnastics’).
> >
> > i feel i still need to learn more about the CSC environment. are the
> > modules available on taito-gpu the same as on the cpu nodes? in other
> > words, do both types of nodes see the same file system?
> >
> > cheers, oe
> >
> >
> > On Wed, Sep 26, 2018 at 9:59 AM, Scherrer, Yves
> > <yves.scherrer at helsinki.fi> wrote:
> >> Hi,
> >>
> >>
> >>
> >> I’ve had a quick look at Stephan’s OpenNMT-py on Abel. The onmt module
> >> seems to work, but one generally uses the scripts “preprocess.py”,
> >> “train.py” and “translate.py” (at the root directory of the Github
> >> repo), and these scripts seem to be missing from the module. Would it
> >> be possible to copy these three scripts (there is a fourth one,
> >> “server.py”, but this one might not be relevant for common usage)
> >> somewhere inside the virtual environment, so that they can be found
> >> and called easily?
> >>
> >>
> >>
> >> I have to say that I find these stacked virtual environments quite
> >> elegant. Too bad that CSC doesn’t even include the virtualenv command
> >> in their python-env modules…
> >>
> >>
> >>
> >> Best,
> >>
> >> Yves
> >>
> >>
> >>
> >> ________________________________
> >> From: Stephan Oepen <oe at ifi.uio.no>
> >> Sent: Thursday, September 20, 2018 12:31:58 AM
> >> To: Scherrer, Yves
> >> Cc: Martin Matthiesen; infrastructure
> >>
> >> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
> >>
> >> dear all,
> >>
> >> yes, chaining virtual environments appears to work as one would
> >> expect. i might in fact have managed to install OpenNMT-py on Abel,
> >> using my new PyTorch 0.4.1 virtual environment, essentially:
> >>
> >> module load nlpl-pytorch
> >> mkdir /projects/nlpl/software/opennmt-py/
> >> virtualenv /projects/nlpl/software/opennmt-py/0.2.1
> >>
> >> at this point, i had to manually change the ‘python’, ‘python3’, and
> >> ‘python3.5’ files in the new ‘bin/’ directory, to avail themselves of
> >> the custom glibc; see
> >> ‘http://wiki.nlpl.eu/index.php/Infrastructure/software/glibc’.
> >>
> >> cd /projects/nlpl/software/modulefiles
> >> mkdir nlpl-opennmt-py
> >> cp nlpl-pytorch/0.4.1 nlpl-opennmt-py/0.2.1
> >> vi nlpl-opennmt-py/0.2.1
> >>
> >> cd ~/src/nlpl
> >> module purge
> >> module load nlpl-opennmt-py
> >> wget https://github.com/OpenNMT/OpenNMT-py/archive/0.2.1.tar.gz
> >> tar zpSxvf 0.2.1.tar.gz
> >> cd OpenNMT-py-0.2.1
> >> python setup.py install
> >>
> >> so far, my testing is limited to
> >>
> >> python -c "import torch; import onmt; print(onmt.__version__);"
> >>
> >> yves, would you maybe have a chance next week to see whether this
> >> installation appears healthy to you?
> >>
> >> cheers, oe
> >>
> >>
> >> On Wed, Sep 19, 2018 at 1:12 PM, Scherrer, Yves
> >> <yves.scherrer at helsinki.fi> wrote:
> >>> Hi Stephan, Martin,
> >>>
> >>>
> >>>
> >>> I’m catching up on this thread… A few questions from my side:
> >>>
> >>>
> >>>
> >>> Regarding Martin’s latest suggestion: that seems indeed to work fine,
> >>> although with the exact same commands I get a different version of
> >>> PyTorch:
> >>>
> >>> >>> import torch
> >>> >>> torch.__file__
> >>> '/appl/opt/python/intelpython36-2018.3/intelpython3/lib/python3.6/site-packages/torch/__init__.py'
> >>> >>> torch.__version__
> >>> '0.4.0a0+3749c58'
> >>>
> >>>
> >>>
> >>> In any case, if PyTorch is already installed in some Python
> >>> distribution, that would make setting up a specific OpenNMT module
> >>> rather easy. If not, virtual environments should work as well (the
> >>> tricky thing is mainly to figure out which python versions play well
> >>> with CUDA…)
> >>>
> >>>
> >>>
> >>> Regarding Stephan’s suggestion of virtual environments: do you know if
> >>> virtual environments can be “stacked”, i.e. whether I could create an
> >>> OpenNMT virtual environment that lies on top of your PyTorch
> >>> environment? Or would I have to re-install another instance of PyTorch
> >>> in the OpenNMT virtualenv?
> >>>
> >>>
> >>>
> >>> I’ll be travelling for the rest of the week, but will try to have a
> >>> closer look at these options next week.
> >>>
> >>>
> >>>
> >>> Best,
> >>>
> >>> Yves
> >>>
> >>>
> >>>
> >>> ________________________________
> >>> From: Martin Matthiesen <martin.matthiesen at csc.fi>
> >>> Sent: Wednesday, September 19, 2018 1:29:35 PM
> >>> To: Stephan Oepen
> >>> Cc: infrastructure; Scherrer, Yves
> >>>
> >>> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
> >>>
> >>> Hello Stephan,
> >>>
> >>> ----- Original Message -----
> >>>> From: "Stephan Oepen" <oe at ifi.uio.no>
> >>>> To: "Martin Matthiesen" <martin.matthiesen at csc.fi>
> >>>> Cc: "infrastructure" <infrastructure at nlpl.eu>, "Yves Scherrer"
> >>>> <yves.scherrer at helsinki.fi>
> >>>> Sent: Tuesday, 18 September, 2018 14:13:53
> >>>> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on
> >>>> Abel)
> >>>
> >>>> sorry, i was the one who had introduced the confusion about mailing
> >>>> lists. there is no ‘translation at nlpl.eu’ currently, and upon
> >>>> consultation with joerg there appears not to be a great need for it
> >>>> either (once i get around to documenting the task force structure on
> >>>> the project wiki, i might want to create that list nevertheless).
> >>>>
> >>>> i am adding yves to the thread now, so he at least has a chance of
> >>>> knowing what we are talking about :-).
> >>>
> >>> Ok!
> >>>>
> >>>> martin, i doubt that an installation of OpenNMT that requires everyone
> >>>> to ‘pip install --user’ into their home directory will be a good
> >>>> solution. that way, the getting started instructions will be more
> >>>> complex, and we lack control over which version of PyTorch gets
> >>>> installed at the time the user actually runs the command. my
> >>>> immediate reaction at least is that NLPL-supported software should be
> >>>> ‘self-contained’, in the sense of not depending on software components
> >>>> maintained by the user.
> >>>
> >>> Ok, I understand.
> >>>>
> >>>> what i am doing increasingly on abel is deriving virtual environments;
> >>>> e.g. my PyTorch installation (for NLPL) straightforwardly builds on
> >>>> the USIT-maintained python 3.5. i suppose we should be able to do the
> >>>> same thing on taito, i.e. create ‘nlpl-pytorch’ as a virtual
> >>>> environment that includes the precompiled PyTorch wheel from your CSC
> >>>> colleagues?
> >>>
> >>> Yes, I guess that is the only sensible solution to not lose track
> >>> completely. In the meantime, how would this work for you all:
> >>>
> >>> [GPU-Env ~]$ module load python-env/intelpython3.6-2018.3
> >>> Loading application Intel Distribution for Python 2018 update 3
> >>> [GPU-Env ~]$ module list
> >>>
> >>> Currently Loaded Modules:
> >>> 1) gcc/4.9.3 2) cuda/7.5 3) StdEnv 4) git/2.17.1
> >>> 5) python-env/intelpython3.6-2018.3
> >>>
> >>> [GPU-Env ~]$ python3
> >>> Python 3.6.3 |Intel Corporation| (default, May 4 2018, 04:22:28)
> >>> [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
> >>> Type "help", "copyright", "credits" or "license" for more information.
> >>> Intel(R) Distribution for Python is brought to you by Intel Corporation.
> >>> Please check out: https://software.intel.com/en-us/python-distribution
> >>> >>> import torch
> >>> >>> torch.__version__
> >>> '0.4.1'
> >>>
> >>> Kudos to my colleagues Markus and Jarmo here.
> >>>
> >>> Martin
> >>>
> >>>>
> >>>> oe
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Sep 17, 2018 at 5:06 PM, Martin Matthiesen
> >>>> <martin.matthiesen at csc.fi> wrote:
> >>>>> Hello,
> >>>>>
> >>>>> We already have a way to use pytorch 0.4.1 on Taito-GPU:
> >>>>>
> >>>>> module load python-env/intelpython3.6-2018.3
> >>>>> [GPU-Env ~]$ pip install -v --user /appl/opt/pytorch/0.4.1/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl
> >>>>>
> >>>>> One of my colleagues has compiled the module. Note that the module
> >>>>> needs python 3.6 to work, the highest available on Taito-GPU.
> >>>>>
> >>>>> Before I investigate CPU-support or support for other compilers,
> >>>>> would this pip-approach work for you?
> >>>>>
> >>>>> Regards,
> >>>>> Martin
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> From: "Stephan Oepen" <oe at ifi.uio.no>
> >>>>>> To: translation at nlpl.eu
> >>>>>> Cc: "infrastructure" <infrastructure at nlpl.eu>
> >>>>>> Sent: Saturday, 15 September, 2018 18:59:29
> >>>>>> Subject: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
> >>>>>
> >>>>>> colleagues,
> >>>>>>
> >>>>>> joerg, martin, and i talked about getting the new release version of
> >>>>>> OpenNMT installed for NLPL. it appears it requires the most recent
> >>>>>> version of PyTorch, which currently is not available on Taito.
> >>>>>> martin will ask for it to be installed by CSC.
> >>>>>>
> >>>>>> in parallel, i believe i managed to put an NLPL-owned installation
> >>>>>> of the right PyTorch version onto Abel, please see:
> >>>>>>
> >>>>>> http://wiki.nlpl.eu/index.php/Infrastructure/software/pytorch
> >>>>>>
> >>>>>> before announcing this more widely, i would be grateful for some
> >>>>>> testing, in particular for both cpu and gpu usage. would anyone be
> >>>>>> readily set up to give this a shot on Abel?
> >>>>>>
> >>>>>> assuming our PyTorch is healthy, would someone from the helsinki
> >>>>>> team have the time to try and install OpenNMT onto Abel, e.g. as
> >>>>>>
> >>>>>> /projects/nlpl/software/opennmt-py/0.2.1
> >>>>>>
> >>>>>> there have been two relatively recent requests for OpenNMT in oslo
> >>>>>> (one of them for seq2seq dependency parsing :-), so i believe it
> >>>>>> would now be warranted to provide it on both systems.
> >>>>>>
> >>>>>> best wishes, oe
>
>
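
The one-line check quoted in the thread (`import torch; import onmt; print(onmt.__version__)`) can be extended slightly so that it also shows which copy of each package was picked up, which matters when virtual environments are stacked. The following is a hedged sketch rather than an NLPL-provided tool: the helper names `check_modules` and `in_virtualenv` are invented for illustration, and the code uses only the standard library and Python 3.5 compatible syntax.

```python
import importlib
import sys


def check_modules(names):
    """Import each module; map name -> (version, path) or an error string."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as error:
            # ImportError if missing; other errors (e.g. OSError) can occur
            # when a compiled extension fails to load its shared libraries.
            results[name] = "FAILED: {}".format(error)
            continue
        results[name] = (
            getattr(module, "__version__", "unknown"),
            getattr(module, "__file__", "built-in"),
        )
    return results


def in_virtualenv():
    """True if the interpreter appears to run inside a virtual environment."""
    # Classic virtualenv sets sys.real_prefix; venv makes base_prefix differ.
    return (hasattr(sys, "real_prefix")
            or getattr(sys, "base_prefix", sys.prefix) != sys.prefix)


if __name__ == "__main__":
    print("virtual environment: {}".format(in_virtualenv()))
    for name, info in sorted(check_modules(["torch", "onmt"]).items()):
        print("{}: {}".format(name, info))
```

Comparing the reported paths against the expected installation directories gives a quick indication of whether the chained environments resolve as intended.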