[NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)

Wed Nov 28 14:37:34 UTC 2018

Hi,

I did my OpenNMT-py experiments on both Abel and Taito.

On Taito, I got training speeds of about 13000 tokens/s; on Abel, about 4000 tokens/s.

A colleague who used an independent OpenNMT-py module on Taito-GPU during the summer obtained about 9000 tokens/s with a different dataset.

I also just started a CPU-only training run on Taito, which got around 1000 tokens/s.

This leads me to believe that my experiments – at least those on Taito – did use the GPU…
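A quick check of the arithmetic behind this conclusion, as a small Python sketch. It only divides the throughput figures quoted in this message by the CPU-only rate. Two caveats: the 9000 tokens/s run used a different dataset, and the Abel figure is compared against a Taito CPU baseline, so those two ratios are rough at best. The code uses `str.format` rather than f-strings so it also runs under the Python 3.5.3 that the Taito modules load.

```python
# Throughput figures (tokens/s) as reported in this message.
throughput = {
    "Taito GPU (this run)": 13000,
    "Taito GPU (colleague, different dataset)": 9000,
    "Abel (this run)": 4000,
}
cpu_baseline = 1000  # CPU-only training run on Taito


def speedups(rates, baseline):
    """Return each throughput divided by the CPU-only baseline."""
    return {name: rate / baseline for name, rate in rates.items()}


for name, factor in speedups(throughput, cpu_baseline).items():
    print("{}: {:.1f}x the CPU-only rate".format(name, factor))
```

A factor of roughly 13 over the CPU-only run on the same cluster is consistent with GPU use on Taito; the factor of 4 on Abel is less conclusive on its own.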

Best,

Yves

________________________________
From: Stephan Oepen <oe at ifi.uio.no>
Sent: Wednesday, November 28, 2018 4:08:46 PM
To: Scherrer, Yves
Cc: Martin Matthiesen; infrastructure
Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)

as for the OpenNMT-py experiments, did you do those on Abel or Taito,
or both?  using gpus on Taito?  in other words, do you believe that
OpenNMT-py (in contrast to PyTorch) works on Taito gpu nodes?

oe

On Wed, Nov 28, 2018 at 2:47 PM Scherrer, Yves
<yves.scherrer at helsinki.fi> wrote:
>
> Hi,
>
> I’m following up on this one with a related issue. I am testing PyTorch independently of OpenNMT-py, but cannot get it to run on (Taito-)GPU.
>
> Specifically, although I was logged in to Taito-GPU, I cannot get the test script described on the Wiki page to return True:
>
> [GPU-Env lstmtagger]$ srun -n 1 -p gputest --gres=gpu:k80:1 --mem 1G -t 15 --pty python3 /proj/nlpl/software/pytorch/0.4.1/test.py
> srun: job 32089470 queued and waiting for resources
> srun: job 32089470 has been allocated resources
> False
>
> I also get ‘False’ when running the following script through sbatch:
>
> #!/bin/bash
> #SBATCH -J cudatest
> #SBATCH -o cudatest.%j.out
> #SBATCH -e cudatest.%j.err
> #SBATCH -t 0:05:00
> #SBATCH -p gputest
> #SBATCH -N 1
> #SBATCH --gres=gpu:k80:1
> #SBATCH --mem=1g
>
> module use -a /proj/nlpl/software/modulefiles/
> module load nlpl-pytorch
>
> srun python3 /proj/nlpl/software/pytorch/0.4.1/test.py
>
> Has there been any change lately? Or am I missing something obvious?
>
> Best,
>
> Yves
>
> ________________________________
> From: Stephan Oepen <oe at ifi.uio.no>
> Sent: Wednesday, September 26, 2018 11:10:12 PM
> To: Scherrer, Yves
> Cc: Martin Matthiesen; infrastructure
> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
>
> hi again,
>
> > i actually had a go at my own glibc and PyTorch installations on Taito, but
> > so far gpu support is elusive.
>
> actually, with a little more tinkering, i now believe i might have a
> working installation of PyTorch 0.4.1 and OpenNMT-py 0.2.1 on Taito
> too, seemingly functional on both cpu and gpu nodes:
>
> [oe@taito-login4 ~]$ module purge
> [oe@taito-login4 ~]$ module load nlpl-opennmt-py
> Loading application python-3.5.3 environment with needed modules
> [oe@taito-login4 ~]$ module list
>
> Currently Loaded Modules:
>   1) gcc/5.4.0   2) intelmpi/5.1.3   3) mkl/11.3.2   4) python/3.5.3
>   5) python-env/3.5.3   6) nlpl-pytorch/0.4.1   7) nlpl-opennmt-py/0.2.1
>
> [oe@taito-login4 ~]$ type -all python
> python is /proj/nlpl/software/opennmt-py/0.2.1/bin/python
> python is /proj/nlpl/software/pytorch/0.4.1/bin/python
> python is /appl/opt/python/3.5.3-gnu540/bin/python
> python is /usr/bin/python
> [oe@taito-login4 ~]$ python -c "import torch; import onmt; print(torch.cuda.is_available());"
> False
>
> [oe@taito-login4 ~]$ srun -n 1 -p gputest --gres=gpu:k80:1 --mem 1G -t 15 --pty \
>   python -c "import torch; import onmt; print(torch.cuda.is_available());"
> True
>
> yves (or joerg), i would have a hard time testing things in much more
> depth.  any chance you would have some time to try and replicate the
> validation steps you are currently running on Abel on Taito too?
>
> with a sense of accomplishment :-), oe
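When `torch.cuda.is_available()` returns `False` inside a GPU job, as in the `srun` and `sbatch` tests earlier in this thread, it can help to print a bit more context from within the same job. The following is a minimal, hypothetical diagnostic script (it is not the `test.py` from the wiki, whose contents are not shown here). It reports which Python interpreter is running, whether PyTorch imports at all and from where, the `CUDA_VISIBLE_DEVICES` variable (which many Slurm setups set for `--gres=gpu` allocations; this is an assumption about the site configuration), and the CUDA availability flag. It needs only the standard library plus, optionally, PyTorch, and avoids f-strings so it runs under Python 3.5.

```python
import importlib
import os
import sys


def cuda_report():
    """Collect basic facts about the Python/PyTorch/CUDA environment."""
    report = {
        "python": sys.executable,
        # Often set by Slurm for --gres=gpu jobs; None if unset.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_path": None,
        "torch_version": None,
        "cuda_available": False,
        "device_count": 0,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not importable, e.g. the module was not loaded.
        return report
    report["torch_path"] = torch.__file__
    report["torch_version"] = torch.__version__
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print("{}: {}".format(key, value))
```

Comparing the `python` and `torch_path` lines between a working and a failing run may show whether the same interpreter and PyTorch installation are in use. Note that the working example above calls `python` with the `nlpl-opennmt-py` module loaded, while the failing runs call `python3` from a `[GPU-Env lstmtagger]` environment.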