[NLPL Task Force (A)] rolling your own BERT (and maybe ELMo) on Saga
Antti Virtanen
sajvir at utu.fi
Mon Jan 27 14:39:34 UTC 2020
Hi,
We used the tensorflow/1.13.1-hvd module on Puhti. As the name suggests, it includes TensorFlow 1.13 and Horovod 0.16.4, plus their dependencies (https://docs.csc.fi/apps/tensorflow/). I can send you the list of packages in that module from Puhti if you wish. Also worth noting: to get XLA working correctly, we had to create symlinks in the code directory to the CUDA files `libdevice.10.bc` and `ptxas`, although I believe this is due to Puhti's environment being misconfigured.
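For anyone replicating this elsewhere, the symlink workaround could be scripted roughly as follows. This is only a sketch under assumptions: the function name `link_xla_cuda_files` is made up for illustration, and the source paths assume the usual CUDA toolkit layout (`nvvm/libdevice/libdevice.10.bc` and `bin/ptxas` under the toolkit root), which may differ on a given cluster.

```python
from pathlib import Path


def link_xla_cuda_files(cuda_home, code_dir):
    """Symlink libdevice.10.bc and ptxas from a CUDA toolkit into code_dir.

    Hypothetical helper: the relative paths below assume a standard CUDA
    toolkit layout and are not taken from Puhti's actual configuration.
    """
    cuda_home = Path(cuda_home)
    code_dir = Path(code_dir)
    sources = [
        cuda_home / "nvvm" / "libdevice" / "libdevice.10.bc",
        cuda_home / "bin" / "ptxas",
    ]
    created = []
    for src in sources:
        if not src.exists():
            raise FileNotFoundError(f"expected CUDA file not found: {src}")
        dst = code_dir / src.name
        if dst.is_symlink():
            # Replace a stale link left over from an earlier run.
            dst.unlink()
        elif dst.exists():
            # Never overwrite a real file in the code directory.
            raise FileExistsError(f"refusing to overwrite: {dst}")
        dst.symlink_to(src)
        created.append(dst)
    return created


# Example call (the toolkit path is an assumption, adjust to your system):
# link_xla_cuda_files("/usr/local/cuda", ".")
```

Run it once from the directory holding the training code; re-running only refreshes the two links.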
-Antti
________________________________________
From: Andrei Kutuzov <andreku at ifi.uio.no>
Sent: Monday, January 27, 2020 4:15 PM
To: Stephan Oepen
Cc: Antti Virtanen; Filip Ginter; infrastructure
Subject: Re: rolling your own BERT (and maybe ELMo) on Saga
No, I tried only multiple GPUs (up to 4) within the same node.
27.01.2020 15:14, Stephan Oepen wrote:
> across multiple nodes? oe
>
>
> On Mon, 27 Jan 2020 at 15:07 Andrei Kutuzov <andreku at ifi.uio.no> wrote:
>
> 27.01.2020 14:29, Stephan Oepen wrote:
> >> Antti can tell about the exact GPU stuff needed. We will run the
> >> tutorial on puhti since this is a tried and tested environment for
> >> us, and we have little time to prepare, so we play it safe. But
> >> Antti can tell what it takes to run the BERT code.
> > yes, if possible, i could see myself try and replicate your software
> > environment on Saga ... the multi-gpu part sounds like an interesting
> > new challenge :-)!
> Hi all,
>
> Well, at least TensorFlow has no problems with multi-GPU training on
> Saga, works more or less out of the box.
>
>
> --
> Andrei
> PhD Candidate at Language Technology Group (LTG)
> University of Oslo
>
--
Andrei
PhD Candidate at Language Technology Group (LTG)
University of Oslo