[NLPL Task Force (A)] rolling your own BERT (and maybe ELMo) on Saga

Stephan Oepen oe at ifi.uio.no
Mon Jan 27 14:48:22 UTC 2020


thanks, antti!  we had previously put NLPL versions of TF and Horovod on
Saga, so possibly the trickier parts are actually already in place :-).
once you have something resembling a sample invocation (preferably on a
smallish test case, say targeting 6 gpus on two nodes), i will be eager to
test for you!  we have Puhti access too, so if need be i can try there or
look up specific version numbers ...

cheers, oe


On Mon, 27 Jan 2020 at 15:39 Antti Virtanen <sajvir at utu.fi> wrote:

> Hi,
>
> We used the tensorflow/1.13.1-hvd module on Puhti. As you might guess
> from the name, it includes TensorFlow 1.13 and Horovod 0.16.4 plus any
> dependencies those have (https://docs.csc.fi/apps/tensorflow/). I can
> give you a list of the packages in that module from Puhti if you wish.
> Also worth noting: we had to create symlinks in the code directory to the
> CUDA files `libdevice.10.bc` and `ptxas` to get XLA working correctly,
> although I believe this is the fault of Puhti's environment being
> misconfigured.
>
> -Antti
> ________________________________________
> From: Andrei Kutuzov <andreku at ifi.uio.no>
> Sent: Monday, January 27, 2020 4:15 PM
> To: Stephan Oepen
> Cc: Antti Virtanen; Filip Ginter; infrastructure
> Subject: Re: rolling your own BERT (and maybe ELMo) on Saga
>
> No, I tried only multiple GPUs (up to 4) within the same node.
>
> 27.01.2020 15:14, Stephan Oepen wrote:
> > across multiple nodes?  oe
> >
> >
> > On Mon, 27 Jan 2020 at 15:07 Andrei Kutuzov <andreku at ifi.uio.no> wrote:
> >
> >     27.01.2020 14:29, Stephan Oepen wrote:
> >     >> Antti can tell about the exact GPU stuff needed. We will run the
> >     >> tutorial on puhti since this is a tried and tested environment for
> >     >> us, and we have little time to prepare, so we play it safe. But
> >     >> Antti can tell what it takes to run the BERT code.
> >     > yes, if possible, i could see myself try and replicate your software
> >     > environment on Saga ... the multi-gpu part sounds like an interesting
> >     > new challenge :-)!
> >     Hi all,
> >
> >     Well, at least TensorFlow has no problems with multi-GPU training on
> >     Saga, works more or less out of the box.
> >
> >
> >     --
> >     Andrei
> >     PhD Candidate at Language Technology Group (LTG)
> >     University of Oslo
> >
>
>
> --
> Andrei
> PhD Candidate at Language Technology Group (LTG)
> University of Oslo
>
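
For anyone replicating the XLA workaround Antti describes, here is a minimal
sketch of the symlink step in Python. It is not the script used on Puhti: the
helper name `link_cuda_files` is hypothetical, and the locations of
`libdevice.10.bc` and `ptxas` under the CUDA root are assumptions based on the
usual CUDA 10.x layout, so check them on the target system first.

```python
from pathlib import Path

# ASSUMPTION: typical CUDA 10.x layout relative to the CUDA install root.
# The thread only names the two files, not where they live.
CUDA_FILES = {
    "libdevice.10.bc": Path("nvvm") / "libdevice" / "libdevice.10.bc",
    "ptxas": Path("bin") / "ptxas",
}


def link_cuda_files(cuda_home, code_dir):
    """Symlink the two CUDA files into code_dir and return the new links.

    Hypothetical helper: raises FileNotFoundError if a file is missing under
    cuda_home, and leaves any existing entry in code_dir untouched.
    """
    cuda_home, code_dir = Path(cuda_home), Path(code_dir)
    created = []
    for name, relative_path in CUDA_FILES.items():
        target = cuda_home / relative_path
        link = code_dir / name
        if not target.exists():
            raise FileNotFoundError(f"expected CUDA file not found: {target}")
        if link.is_symlink() or link.exists():
            continue  # something is already there; do not overwrite it
        link.symlink_to(target)
        created.append(link)
    return created
```

Calling `link_cuda_files(cuda_root, ".")` from the training code directory,
with `cuda_root` pointing at the CUDA installation of the loaded module,
would create both links; calling it again is a no-op.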