[NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
Scherrer, Yves
yves.scherrer at helsinki.fi
Wed Nov 28 13:47:05 UTC 2018
Hi,
I’m following up on this one with a related issue. I am testing PyTorch independently of OpenNMT-py, but cannot get it to run on (Taito-)GPU.
Specifically, although I was logged in to Taito-GPU, I cannot get the test script described on the Wiki page to return True:
[GPU-Env lstmtagger]$ srun -n 1 -p gputest --gres=gpu:k80:1 --mem 1G -t 15 --pty python3 /proj/nlpl/software/pytorch/0.4.1/test.py
srun: job 32089470 queued and waiting for resources
srun: job 32089470 has been allocated resources
False
I also get ‘False’ when running the following script through sbatch:
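#!/bin/bash
#SBATCH -J cudatest
#SBATCH -o cudatest.%j.out
#SBATCH -e cudatest.%j.err
#SBATCH -t 0:05:00
#SBATCH -p gputest
#SBATCH -N 1
#SBATCH --gres=gpu:k80:1
#SBATCH --mem=1g
# make the NLPL module files visible, then load the NLPL PyTorch module
module use -a /proj/nlpl/software/modulefiles/
module load nlpl-pytorch
# expected to print True when PyTorch can see the allocated GPU
srun python3 /proj/nlpl/software/pytorch/0.4.1/test.py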
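A slightly more talkative check may help narrow down a bare `False`. The following is a minimal sketch, not the test.py from the Wiki, and the function name `cuda_report` is hypothetical. It assumes that Slurm exports `CUDA_VISIBLE_DEVICES` to jobs granted a GPU via `--gres` (the usual behaviour with the GRES gpu plugin) and that the module system records loaded modules in `LOADEDMODULES`. It avoids f-strings so that it also runs under the Python 3.5.3 used on Taito.

```python
# Hypothetical diagnostic, not the Wiki's test.py: collects facts that may
# explain why torch.cuda.is_available() returns False inside a job.
import os
import sys
from collections import OrderedDict


def cuda_report():
    """Collect environment and PyTorch facts relevant to GPU visibility."""
    report = OrderedDict()  # keeps print order stable on Python 3.5
    report["python"] = sys.executable
    # Assumption: Slurm sets this for jobs granted GPUs via --gres;
    # None or "" suggests no GPU was handed to this process.
    report["CUDA_VISIBLE_DEVICES"] = os.environ.get("CUDA_VISIBLE_DEVICES")
    # Assumption: the module system exports the loaded modules as a
    # colon-separated list in LOADEDMODULES.
    report["LOADEDMODULES"] = os.environ.get("LOADEDMODULES")
    try:
        import torch
    except (ImportError, OSError) as err:
        report["torch"] = "not importable: {}".format(err)
        return report
    report["torch"] = torch.__version__
    report["torch_file"] = torch.__file__
    # None for a CPU-only build of PyTorch
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_0"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print("{}: {}".format(key, value))
```

It could be run in place of test.py, e.g. `srun -n 1 -p gputest --gres=gpu:k80:1 --mem 1G -t 15 --pty python3 cuda_report.py`, where `cuda_report.py` is whatever file the sketch is saved as. Comparing the interactive and the sbatch output line by line should show whether the GPU, the modules or the PyTorch build is the missing piece.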
Has there been any change lately? Or am I missing something obvious?
Best,
Yves
________________________________
From: Stephan Oepen <oe at ifi.uio.no>
Sent: Wednesday, September 26, 2018 11:10:12 PM
To: Scherrer, Yves
Cc: Martin Matthiesen; infrastructure
Subject: Re: [NLPL Task Force (A)] OpenNMT installation for NLPL (on Abel)
hi again,
> i actually had a go at my own glibc and PyTorch installations on Taito, but
> so far gpu support is elusive.
actually, with a little more tinkering, i now believe i might have a
working installation of PyTorch 0.4.1 and OpenNMT-py 0.2.1 on Taito
too, seemingly functional on both cpu and gpu nodes:
[oe@taito-login4 ~]$ module purge
[oe@taito-login4 ~]$ module load nlpl-opennmt-py
Loading application python-3.5.3 environment with needed modules
[oe@taito-login4 ~]$ module list
Currently Loaded Modules:
1) gcc/5.4.0 2) intelmpi/5.1.3 3) mkl/11.3.2 4) python/3.5.3
5) python-env/3.5.3 6) nlpl-pytorch/0.4.1 7) nlpl-opennmt-py/0.2.1
[oe@taito-login4 ~]$ type -all python
python is /proj/nlpl/software/opennmt-py/0.2.1/bin/python
python is /proj/nlpl/software/pytorch/0.4.1/bin/python
python is /appl/opt/python/3.5.3-gnu540/bin/python
python is /usr/bin/python
[oe@taito-login4 ~]$ python -c "import torch; import onmt; print(torch.cuda.is_available());"
False
[oe@taito-login4 ~]$ srun -n 1 -p gputest --gres=gpu:k80:1 --mem 1G -t 15 --pty \
python -c "import torch; import onmt; print(torch.cuda.is_available());"
True
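The `type -all python` listing shows which interpreter wins on PATH after `module load`. The commands in this transcript use `python`, while those in Yves's message use `python3`, so it may be worth confirming what each name resolves to inside the job as well. Below is a minimal sketch; the helper name `which_all` is hypothetical. It lists every executable of a given name on PATH in lookup order, similar to what `type -a` reports for external commands, though it ignores aliases, functions and builtins.

```python
# Hypothetical helper: list every executable named `name` on PATH,
# in lookup order, much like `type -a` does for external commands.
import os
import sys


def which_all(name, path=None):
    """Return all executable files called `name` found on `path` (default: $PATH)."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # an empty PATH entry means the current directory
        candidate = os.path.join(directory or ".", name)
        if candidate in hits:
            continue  # skip directories listed twice on PATH
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    print("this script runs under:", sys.executable)
    for name in ("python", "python3"):
        print(name, "->", which_all(name) or "not found")
```

Run under `srun` in the same way as test.py. If the first `python3` found inside the job does not live under /proj/nlpl/software/, the PyTorch being imported may not be the NLPL installation.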
yves (or joerg), i would have a hard time testing things in much more
depth. any chance you might have some time to replicate the
validation steps you are currently running on Abel on Taito too?
with a sense of accomplishment :-), oe
More information about the infrastructure mailing list