<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<style>
<!--
@font-face
{font-family:"Cambria Math"}
@font-face
{font-family:Calibri}
p.x_MsoNormal, li.x_MsoNormal, div.x_MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif}
a:link, span.x_MsoHyperlink
{color:blue;
text-decoration:underline}
a:visited, span.x_MsoHyperlinkFollowed
{color:#954F72;
text-decoration:underline}
.x_MsoChpDefault
{}
@page WordSection1
{margin:70.85pt 56.7pt 70.85pt 56.7pt}
div.x_WordSection1
{}
-->
</style>
<div lang="EN-US" link="blue" vlink="#954F72">
<div class="x_WordSection1">
<p class="x_MsoNormal">Hi,</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">I did my OpenNMT-py experiments on both Abel and Taito.</p>
<p class="x_MsoNormal">On Taito, I got training speeds of about 13000 tokens/s; on Abel, about 4000 tokens/s.</p>
<p class="x_MsoNormal">A colleague who used an independent OpenNMT-py module on Taito-GPU during the summer obtained about 9000 tokens/s with a different dataset.</p>
<p class="x_MsoNormal">I also just started a CPU-only training run on Taito, which got around 1000 tokens/s.</p>
<p class="x_MsoNormal">This leads me to believe that my experiments – at least those on Taito – did use the GPU…</p>
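<p class="x_MsoNormal">A more direct way to confirm this than inferring it from throughput is a quick device check. A minimal sketch (it assumes nothing beyond PyTorch being importable, and degrades gracefully if it is not):</p>

```python
import importlib.util

# Minimal sketch: report whether PyTorch can actually see a GPU,
# rather than inferring it from training throughput.
if importlib.util.find_spec("torch") is None:
    result = "torch not installed"
else:
    import torch
    result = "gpu available" if torch.cuda.is_available() else "cpu only"
print(result)
```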
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Best,</p>
<p class="x_MsoNormal">Yves</p>
<p class="x_MsoNormal"> </p>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Stephan Oepen <oe@ifi.uio.no><br>
<b>Sent:</b> Wednesday, November 28, 2018 4:08:46 PM<br>
<b>To:</b> Scherrer, Yves<br>
<b>Cc:</b> Martin Matthiesen; infrastructure<br>
<b>Subject:</b> Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:11pt;">
<div class="PlainText">as for the OpenNMT-py experiments, did you do those on Abel or Taito,<br>
or both? using gpus on Taito? in other words, do you believe that<br>
OpenNMT-py (in contrast to PyTorch) works on Taito gpu nodes?<br>
<br>
oe<br>
<br>
On Wed, Nov 28, 2018 at 2:47 PM Scherrer, Yves<br>
<yves.scherrer@helsinki.fi> wrote:<br>
><br>
> Hi,<br>
><br>
> I’m following up on this one with a related issue. I am testing PyTorch independently of OpenNMT-py, but cannot get it to run on (Taito-)GPU.<br>
><br>
> Specifically, although I was logged in to Taito-GPU, I cannot get the test script described on the Wiki page to return True:<br>
><br>
> [GPU-Env lstmtagger]$ srun -n 1 -p gputest --gres=gpu:k80:1 --mem 1G -t 15 --pty python3 /proj/nlpl/software/pytorch/0.4.1/test.py<br>
> srun: job 32089470 queued and waiting for resources<br>
> srun: job 32089470 has been allocated resources<br>
> False<br>
><br>
> I also get ‘False’ when running the following script through sbatch:<br>
><br>
> #SBATCH -J cudatest<br>
> #SBATCH -o cudatest.%j.out<br>
> #SBATCH -e cudatest.%j.err<br>
> #SBATCH -t 0:05:00<br>
> #SBATCH -p gputest<br>
> #SBATCH -N 1<br>
> #SBATCH --gres=gpu:k80:1<br>
> #SBATCH --mem=1g<br>
><br>
> module use -a /proj/nlpl/software/modulefiles/<br>
> module load nlpl-pytorch<br>
> srun python3 /proj/nlpl/software/pytorch/0.4.1/test.py<br>
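For reference, the same script with an explicit interpreter line added (the shebang may simply have been dropped in the email conversion; sbatch normally expects the script to start with one):

```shell
#!/bin/bash
#SBATCH -J cudatest
#SBATCH -o cudatest.%j.out
#SBATCH -e cudatest.%j.err
#SBATCH -t 0:05:00
#SBATCH -p gputest
#SBATCH -N 1
#SBATCH --gres=gpu:k80:1
#SBATCH --mem=1g

module use -a /proj/nlpl/software/modulefiles/
module load nlpl-pytorch
srun python3 /proj/nlpl/software/pytorch/0.4.1/test.py
```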
><br>
> Has there been any change lately? Or am I missing something obvious?<br>
><br>
> Best,<br>
><br>
> Yves<br>
><br>
> ________________________________<br>
> From: Stephan Oepen <oe@ifi.uio.no><br>
> Sent: Wednesday, September 26, 2018 11:10:12 PM<br>
> To: Scherrer, Yves<br>
> Cc: Martin Matthiesen; infrastructure<br>
> Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)<br>
><br>
> hi again,<br>
><br>
> > i actually had a go at my own glibc and PyTorch installations on Taito, but<br>
> > so far gpu support is evasive.<br>
><br>
> actually, with a little more tinkering, i now believe i might have a<br>
> working installation of PyTorch 0.4.1 and OpenNMT-py 0.2.1 on Taito<br>
> too, seemingly functional on both cpu and gpu nodes:<br>
><br>
> [oe@taito-login4 ~]$ module purge<br>
> [oe@taito-login4 ~]$ module load nlpl-opennmt-py<br>
> Loading application python-3.5.3 environment with needed modules<br>
> [oe@taito-login4 ~]$ module list<br>
><br>
> Currently Loaded Modules:<br>
> 1) gcc/5.4.0 2) intelmpi/5.1.3 3) mkl/11.3.2 4) python/3.5.3<br>
> 5) python-env/3.5.3 6) nlpl-pytorch/0.4.1 7) nlpl-opennmt-py/0.2.1<br>
><br>
> [oe@taito-login4 ~]$ type -all python<br>
> python is /proj/nlpl/software/opennmt-py/0.2.1/bin/python<br>
> python is /proj/nlpl/software/pytorch/0.4.1/bin/python<br>
> python is /appl/opt/python/3.5.3-gnu540/bin/python<br>
> python is /usr/bin/python<br>
> [oe@taito-login4 ~]$ python -c "import torch; import onmt;<br>
> print(torch.cuda.is_available());"<br>
> False<br>
><br>
> [oe@taito-login4 ~]$ srun -n 1 -p gputest --gres=gpu:k80:1 --mem 1G -t<br>
> 15 --pty \<br>
> python -c "import torch; import onmt; print(torch.cuda.is_available());"<br>
> True<br>
><br>
> —yves (or joerg), i would have a hard time testing things in much more<br>
> depth. any chance you would have some time to try and replicate the<br>
> validation steps you are currently running on Abel on Taito too?<br>
><br>
> with a sense of accomplishment :-), oe<br>
</div>
</span></font>
</body>
</html>