[NLPL Task Force (A)] rolling your own BERT (and maybe ELMo) on Saga
Andrei Kutuzov
andreku at ifi.uio.no
Tue Jan 28 21:44:18 UTC 2020
Hi,
I will certainly not spend more than 20 minutes. I will just quickly cover
how one can use and train ELMo models on Saga (and on Puhti, I guess)
using the NLPL modules, and also point out the differences between BERT
and ELMo and the reasons to choose one or the other.
I do not plan any hands-on part; I will just talk and show slides/code/etc. :)
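
For a quick illustration (this is generic TF Hub usage, not the NLPL/Saga
module setup itself, and the model URL is just the public TF Hub ELMo, not an
NLPL-hosted model), "using" a pre-trained ELMo model in TF 1.x looks roughly
like this:

import tensorflow as tf
import tensorflow_hub as hub

# Illustrative only: the public TF Hub ELMo module, not an NLPL-hosted model.
elmo = hub.Module("https://tfhub.dev/google/elmo/3")
sentences = ["ELMo produces contextualised word vectors"]
embeddings = elmo(sentences, signature="default", as_dict=True)["elmo"]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vectors = sess.run(embeddings)
    print(vectors.shape)  # (batch_size, max_tokens, 1024)
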
28.01.2020 22:17, Filip Ginter wrote:
> Hi
>
> Just to confirm that we count on Andrei doing the ELMo stuff. Great!
>
> Our overall plan, btw, is to try to give people enough information to start
> working on their own BERT based on the OSCAR dataset. Some data cleanup
> scripts will be provided too. I also plan to spend a while presenting
> the results which led to us training our own BERT, and some of the
> current results we have using it.
>
> Things will unfortunately come together in a bit of a panic, because
> we have a big shared task deadline in 12 days now and a lot of time is
> sinking into that. :-|
>
> Andrei, what are your plans for the ELMo part?
>
> F
>
>
> On Mon, Jan 27, 2020 at 5:11 PM Antti Virtanen <sajvir at utu.fi> wrote:
>
>
>
> Here's a (quick and dirty) repo for the code we used to train
> FinBERT: https://github.com/haamis/DeepLearningExamples_FinBERT/tree/master/TensorFlow/LanguageModeling/BERT_nonscaling.
> This one has the sbatch files
> used: https://github.com/haamis/BERT-pretraining
>
> -Antti
>
>
> ------------------------------------------------------------------------
> *From:* Stephan Oepen <oe at ifi.uio.no>
> *Sent:* Monday, January 27, 2020 4:48 PM
> *To:* Antti Virtanen
> *Cc:* Andrei Kutuzov; Filip Ginter; infrastructure
> *Subject:* Re: rolling your own BERT (and maybe ELMo) on Saga
>
> thanks, antti! we had previously put NLPL versions of TF and
> Horovod on Saga, so possibly the trickier parts actually are already
> in place :-). once you have something resembling a sample
> invocation (preferably on a smallish test case, say targeting 6
> gpus on two nodes), i will be eager to test for you! we have Puhti
> access too, so if need be i can try there or look up specific version
> numbers ...
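>
> (for concreteness: the per-rank script behind such an invocation typically
> looks roughly like the generic Horovod + TF 1.x sketch below. this is
> textbook horovod boilerplate with a toy stand-in model, not anyone's actual
> BERT training code.)
>
> import numpy as np
> import tensorflow as tf
> import horovod.tensorflow as hvd
>
> hvd.init()  # one process per GPU, launched via srun/mpirun
>
> # pin each rank to a single local GPU
> config = tf.ConfigProto()
> config.gpu_options.visible_device_list = str(hvd.local_rank())
>
> # toy model standing in for the real training graph
> x = tf.placeholder(tf.float32, [None, 10])
> y = tf.placeholder(tf.float32, [None, 1])
> loss = tf.reduce_mean(tf.square(tf.layers.dense(x, 1) - y))
>
> opt = tf.train.AdamOptimizer(1e-4 * hvd.size())  # scale LR with world size
> opt = hvd.DistributedOptimizer(opt)              # averages gradients across ranks
> global_step = tf.train.get_or_create_global_step()
> train_op = opt.minimize(loss, global_step=global_step)
>
> hooks = [hvd.BroadcastGlobalVariablesHook(0),    # sync initial weights from rank 0
>          tf.train.StopAtStepHook(last_step=100)]
> with tf.train.MonitoredTrainingSession(hooks=hooks, config=config) as sess:
>     while not sess.should_stop():
>         sess.run(train_op, feed_dict={x: np.random.rand(32, 10).astype("float32"),
>                                       y: np.random.rand(32, 1).astype("float32")})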
>
> cheers, oe
>
>
> On Mon, 27 Jan 2020 at 15:39 Antti Virtanen <sajvir at utu.fi> wrote:
>
> Hi,
>
> We used the tensorflow/1.13.1-hvd module on Puhti. As you might
> figure out from the name, it includes TensorFlow 1.13 and Horovod
> 0.16.4, plus the dependencies those have
> (https://docs.csc.fi/apps/tensorflow/). I can give you a list of
> packages in that module from Puhti if you wish. Also worth
> noting is that we had to create symlinks in the code directory to
> the CUDA files `libdevice.10.bc` and `ptxas` to get XLA working
> correctly, although I believe this is due to Puhti's
> environment being misconfigured.
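>
> (Roughly, that workaround amounts to the sketch below; the CUDA paths and
> the CUDA_HOME variable are assumptions and depend on the cluster's CUDA
> installation, so treat it as illustrative only.)
>
> import os
>
> # assumed CUDA location; on the cluster the real path comes from the
> # loaded module environment
> cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
>
> # symlink the two files XLA looks for into the code/working directory
> for name, src in [
>     ("libdevice.10.bc", os.path.join(cuda_home, "nvvm", "libdevice", "libdevice.10.bc")),
>     ("ptxas", os.path.join(cuda_home, "bin", "ptxas")),
> ]:
>     if not os.path.exists(name):
>         os.symlink(src, name)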
>
> -Antti
> ________________________________________
> From: Andrei Kutuzov <andreku at ifi.uio.no>
> Sent: Monday, January 27, 2020 4:15 PM
> To: Stephan Oepen
> Cc: Antti Virtanen; Filip Ginter; infrastructure
> Subject: Re: rolling your own BERT (and maybe ELMo) on Saga
>
> No, I tried only multiple GPUs (up to 4) within the same node.
>
> 27.01.2020 15:14, Stephan Oepen wrote:
> > across multiple nodes? oe
> >
> >
> > On Mon, 27 Jan 2020 at 15:07 Andrei Kutuzov <andreku at ifi.uio.no> wrote:
> >
> > 27.01.2020 14:29, Stephan Oepen wrote:
> > >> Antti can tell about the exact GPU stuff needed. We will run the
> > >> tutorial on puhti since this is a tried and tested environment for
> > >> us, and we have little time to prepare, so we play it safe. But
> > >> Antti can tell what it takes to run the BERT code.
> > > yes, if possible, i could see myself try and replicate your software
> > > environment on Saga ... the multi-gpu part sounds like an interesting
> > > new challenge :-)!
> > Hi all,
> >
> > Well, at least TensorFlow has no problems with multi-GPU training on
> > Saga; it works more or less out of the box.
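> >
> > (For illustration, by "out of the box" I mean roughly the stock
> > tf.distribute route below; this is a generic TF 2-style sketch with
> > MirroredStrategy and Keras, not the exact versions or code on Saga.)
> >
> > import numpy as np
> > import tensorflow as tf
> >
> > strategy = tf.distribute.MirroredStrategy()  # uses all GPUs visible on the node
> > with strategy.scope():
> >     model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
> >     model.compile(optimizer="adam", loss="mse")
> >
> > x = np.random.rand(256, 10).astype("float32")
> > y = np.random.rand(256, 1).astype("float32")
> > model.fit(x, y, batch_size=64, epochs=1)  # batches are split across the GPUs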
> >
> >
> > --
> > Andrei
> > PhD Candidate at Language Technology Group (LTG)
> > University of Oslo
> >
>
>
> --
> Andrei
> PhD Candidate at Language Technology Group (LTG)
> University of Oslo
>
--
Andrei
PhD Candidate at Language Technology Group (LTG)
University of Oslo