[NLPL Task Force (A)] Trouble with using GPU in Cupy package
Andrew Dyer
Andrew.Dyer.6854 at student.uu.se
Sat May 11 20:26:33 UTC 2019
Hi Stephan,
Thanks for this.
I hadn't module-loaded CUDA in my scripts. I'm going back and looking through some of the instructions now. So I guess when using CUDA, it is required to module load cuda/[version]? I've done as follows:
module load cuda/8.0
module load nlpl-cupy
When I try module load nlpl-cupy I get the following message:
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'nlpl-cupy'
When I use module -h avail I also don't see nlpl-cupy there.
However, despite that error message it seems (going by my slurm output) that the problem is no longer cupy failing to connect to CUDA:
-bash-4.1$ cat slurm-26960146.out
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'nlpl-cupy'
Traceback (most recent call last):
File "/usit/abel/u1/andidyer/vecmap/map_embeddings.py", line 422, in <module>
main()
File "/usit/abel/u1/andidyer/vecmap/map_embeddings.py", line 148, in main
trg_words, z = embeddings.read(trgfile, dtype=dtype)
File "/cluster/home/andidyer/vecmap/embeddings.py", line 35, in read
matrix[i] = np.fromstring(vec, sep=' ', dtype=dtype)
ValueError: could not broadcast input array from shape (125) into shape (300)
So something is obviously going right!
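(For context, the new ValueError is a data problem rather than a CUDA one: one line of the embedding file parses to 125 numbers where a 300-dimensional vector is expected, so the row assignment fails. A minimal sketch of that failure mode, with hypothetical values:)

```python
import numpy as np

dim = 300                        # expected embedding dimensionality
matrix = np.zeros((2, dim))

good = " ".join(["0.1"] * dim)   # well-formed vector line
bad = " ".join(["0.1"] * 125)    # truncated/malformed line

matrix[0] = np.fromstring(good, sep=" ")      # 300 values: fits the row

try:
    matrix[1] = np.fromstring(bad, sep=" ")   # 125 values: shape mismatch
except ValueError as err:
    print("parse failed:", err)
```

A malformed or truncated line in the embedding file (or a wrong header dimension) is the usual cause, so checking the offending line in the input file is a good next step.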
Again, many thanks for your assistance.
Best wishes,
Andrew
________________________________
From: Stephan Oepen <oe at ifi.uio.no>
Sent: 09 May 2019 18:17
To: Andrew Dyer
Cc: infrastructure at nlpl.eu
Subject: Re: [NLPL Task Force (A)] Trouble with using GPU in Cupy package
hi andrew,
it appears your local cupy installation is unable to find its
external dependencies. have you 'module load'ed the right CUDA
version? are you sure it ended up running on a gpu node?
these things can be tricky to sort out, what with the many different
(and mutually incompatible) module versions available on a large and
old system like Abel.
CuPy looks like a relevant tool for the NLPL software inventory, so i
installed it as an NLPL module. the following (when running on a gpu
node) appears to work:
[oe at compute-19-1 ~]$ module purge; module load nlpl-cupy
[oe at compute-19-1 ~]$ module list
Currently Loaded Modulefiles:
  1) intel/2019.0          4) gcc/4.9.2               7) nlpl-cython/0.29.3/3.7
  2) openssl.intel/1_1_1   5) cuda/9.0                8) nlpl-scipy/201901/3.7
  3) python3/3.7.0         6) nlpl-numpy/1.16.0/3.7   9) nlpl-cupy/5.4.0/3.7
[oe at compute-19-1 ~]$ python3 -c "import cupy; print(cupy.__version__);"
5.4.0
in general, i would suggest testing things interactively first, before
you invest the time in putting a job in the queue. these past few
days, it appears there can be fairly long wait times for gpu nodes on
Abel (we are really looking forward to transitioning to the new system
after the summer). but in principle, one can create an interactive
session on a gpu node as follows:
qlogin --account=nn9447k --time=00:30:00 --mem-per-cpu=2048M --partition=accel --gres=gpu:1
please check whether the new NLPL version of CuPy works for you (but
make sure there are no unwanted interactions with your local
virtualenv).
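(one quick way to spot such interactions is to ask Python where it would load a package from before importing it; this is illustrated below with the standard-library json module, but the same call applies to cupy on the cluster:)

```python
import importlib.util
import sys

# resolve a package without importing it; substitute "cupy" on the cluster
spec = importlib.util.find_spec("json")
print("would load from:", spec.origin)

# the search order that decides which installation wins
print("sys.path starts with:", sys.path[:3])
```

if the path printed points into the virtualenv rather than the NLPL module tree, the locally pip-installed copy is shadowing the module's one.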
best wishes, oe
On Mon, May 6, 2019 at 2:27 PM Andrew Dyer
<Andrew.Dyer.6854 at student.uu.se> wrote:
>
> Hi,
>
> Apologies for the bother. I'm currently trying to run an experiment using GPU nodes in Abel. The Python program that I am using uses Cupy, which I have installed in my venv with pip. On my sbatch script, I set the GPU request as instructed on the Job Scripts page:
>
> #SBATCH --partition=accel --gres=gpu:1
>
> However, the program that I'm using seems to be having trouble connecting to the CUDA software. I've checked that the versions match (8.0). I'm at a loss for what else to do though, so any help you can provide would be appreciated.
>
> For reference, see attached my script and the error message in the slurm output.
>
> Many thanks,
>
> Andrew