[NLPL Task Force (A)] rolling your own BERT (and maybe ELMo) on Saga

Antti Virtanen sajvir at utu.fi
Mon Jan 27 14:39:34 UTC 2020


Hi,

We used the tensorflow/1.13.1-hvd module on Puhti. As the name suggests, it includes TensorFlow 1.13 and Horovod 0.16.4, plus their dependencies (https://docs.csc.fi/apps/tensorflow/). I can give you a list of the packages in that module from Puhti if you wish. Also worth noting: we had to create symlinks in the code directory to the CUDA files `libdevice.10.bc` and `ptxas` to get XLA working correctly, although I believe this is due to a misconfiguration in Puhti's environment.
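A minimal sketch of that symlink workaround. The email only names the two files, so the location of the CUDA installation ($CUDA_ROOT below) is an assumption; adjust it to wherever the CUDA module lives on your cluster:

```shell
# Assumed CUDA install location -- a placeholder, not a path from the thread.
CUDA_ROOT=${CUDA_ROOT:-/usr/local/cuda}
# Directory the training code runs from.
CODE_DIR=${CODE_DIR:-.}

# XLA needs to find libdevice.10.bc and ptxas; symlinking them into the
# code directory sidesteps the misconfigured environment.
ln -sf "$CUDA_ROOT/nvvm/libdevice/libdevice.10.bc" "$CODE_DIR/libdevice.10.bc"
ln -sf "$CUDA_ROOT/bin/ptxas" "$CODE_DIR/ptxas"
```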

-Antti
________________________________________
From: Andrei Kutuzov <andreku at ifi.uio.no>
Sent: Monday, January 27, 2020 4:15 PM
To: Stephan Oepen
Cc: Antti Virtanen; Filip Ginter; infrastructure
Subject: Re: rolling your own BERT (and maybe ELMo) on Saga

No, I tried only multiple GPUs (up to 4) within the same node.
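For reference, a single-node multi-GPU run like the one described could be requested with a Slurm batch script along these lines. This is a hypothetical sketch: the GPU type, time limit, and script name (`run_pretraining.py`, BERT's pretraining entry point) are illustrative, not values from the thread:

```shell
#!/bin/bash
# Hypothetical Slurm request: one node, up to 4 GPUs, one task per GPU.
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --gres=gpu:v100:4
#SBATCH --time=01:00:00

# Module name from the thread (TF 1.13 + Horovod 0.16.4 on Puhti).
module load tensorflow/1.13.1-hvd
# One Horovod rank per Slurm task.
srun python run_pretraining.py
```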

27.01.2020 15:14, Stephan Oepen wrote:
> across multiple nodes?  oe
>
>
> On Mon, 27 Jan 2020 at 15:07 Andrei Kutuzov <andreku at ifi.uio.no
> <mailto:andreku at ifi.uio.no>> wrote:
>
>     27.01.2020 14:29, Stephan Oepen wrote:
>     >> Antti can tell about the exact GPU stuff needed. We will run the
>     >> tutorial on Puhti since this is a tried and tested environment for
>     >> us, and we have little time to prepare, so we play it safe. But
>     >> Antti can tell what it takes to run the BERT code.
>     > yes, if possible, i could see myself try and replicate your software
>     > environment on Saga ... the multi-gpu part sounds like an interesting
>     > new challenge :-)!
>     Hi all,
>
>     Well, at least TensorFlow has no problems with multi-GPU training on
>     Saga; it works more or less out of the box.
>
>
>     --
>     Andrei
>     PhD Candidate at Language Technology Group (LTG)
>     University of Oslo
>


--
Andrei
PhD Candidate at Language Technology Group (LTG)
University of Oslo



