[NLPL Task Force (A)] rolling your own BERT (and maybe ELMo) on Saga

Andrei Kutuzov andreku at ifi.uio.no
Tue Jan 28 21:44:18 UTC 2020


Hi,

I will certainly not spend more than 20 minutes. I'll just quickly cover
how one can use and train ELMo models on Saga (and on Puhti, I guess)
using the NLPL modules, and point out the differences between BERT
and ELMo, and reasons to choose one or the other.

I do not plan any hands-on part; I will just talk and show slides/code/etc. :)

28.01.2020 22:17, Filip Ginter wrote:
> Hi
> 
> Just to confirm that we count on Andrei doing the ELMo stuff. Great!
> 
> Our overall plan, btw, is to try to give people enough information to
> start working on their own BERT based on the OSCAR dataset. Some data
> cleanup scripts will be provided too. I also plan to spend a while
> presenting the results which led to us training our own BERT, and some
> of the current results we have using it.
> 
> Things will unfortunately come together under a bit of a panic, because
> we have a big shared-task deadline in 12 days now, and much of our time
> sinks into that. :-|
> 
> Andrei, what are your plans for the ELMo part?
> 
> F
> 
> 
> On Mon, Jan 27, 2020 at 5:11 PM Antti Virtanen <sajvir at utu.fi
> <mailto:sajvir at utu.fi>> wrote:
> 
>     Here's a (quick and dirty) repo for the code we used to train
>     FinBERT: https://github.com/haamis/DeepLearningExamples_FinBERT/tree/master/TensorFlow/LanguageModeling/BERT_nonscaling.
>     This one has the sbatch files
>     used: https://github.com/haamis/BERT-pretraining
> 
>     -Antti
> 
> 
>     ------------------------------------------------------------------------
>     *From:* Stephan Oepen <oe at ifi.uio.no <mailto:oe at ifi.uio.no>>
>     *Sent:* Monday, January 27, 2020 4:48 PM
>     *To:* Antti Virtanen
>     *Cc:* Andrei Kutuzov; Filip Ginter; infrastructure
>     *Subject:* Re: rolling your own BERT (and maybe ELMo) on Saga
>      
>     thanks, antti!  we had previously put NLPL versions of TF and
>     Horovod on Saga, so possibly the trickier parts actually are already
>     in place :-).  once you have something resembling a sample
>     invocation (preferably on a smallish test case, say targeting 6
>     gpus on two nodes), i will be eager to test for you!  we have Puhti
>     access too, so if need be i can try there or look up specific version
>     numbers ...
> 
>     cheers, oe
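(A sample invocation of the kind Stephan asks for might be sketched as
the hypothetical sbatch file below, assuming the tensorflow/1.13.1-hvd
module mentioned later in the thread and one MPI rank per GPU. The time
limit, node layout details, and the run_pretraining.py flags are
placeholders, not taken from the FinBERT repos.)

```shell
#!/bin/bash
# Hypothetical sketch: 6 GPUs across two nodes, one Horovod/MPI rank per GPU.
# Resource numbers and the time limit are placeholders for the actual setup.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=3
#SBATCH --gres=gpu:3
#SBATCH --time=02:00:00

module purge
module load tensorflow/1.13.1-hvd   # TF 1.13 + Horovod 0.16.4, as on Puhti

# srun launches one process per task; Horovod picks up the MPI environment
# and averages gradients across all six ranks.
srun python run_pretraining.py --horovod ...
```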
> 
> 
>     On Mon, 27 Jan 2020 at 15:39 Antti Virtanen <sajvir at utu.fi
>     <mailto:sajvir at utu.fi>> wrote:
> 
>         Hi,
> 
>         We used the tensorflow/1.13.1-hvd module on Puhti. As you might
>         figure out from the name, it includes TensorFlow 1.13 and Horovod
>         0.16.4, plus any dependencies those have
>         (https://docs.csc.fi/apps/tensorflow/). I can give you a list of
>         the packages in that module from Puhti if you wish. Also worth
>         noting: we had to create symlinks in the code directory to the
>         CUDA files `libdevice.10.bc` and `ptxas` to get XLA working
>         correctly, although I believe this is due to Puhti's
>         environment being misconfigured.
> 
>         -Antti
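(The XLA workaround Antti describes could be sketched roughly like this.
Only the two file names come from the message above; the `$CUDA_HOME`
variable and the paths under it are assumptions that depend on how the
module exposes its CUDA installation.)

```shell
# Hypothetical sketch of the XLA workaround: link the two CUDA files into
# the code directory so XLA's JIT compiler can find them at runtime.
# $CUDA_HOME is an assumption; the variable the module actually sets may differ.
cd /path/to/bert/code
ln -sf "$CUDA_HOME/nvvm/libdevice/libdevice.10.bc" .
ln -sf "$CUDA_HOME/bin/ptxas" .
```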
>         ________________________________________
>         From: Andrei Kutuzov <andreku at ifi.uio.no
>         <mailto:andreku at ifi.uio.no>>
>         Sent: Monday, January 27, 2020 4:15 PM
>         To: Stephan Oepen
>         Cc: Antti Virtanen; Filip Ginter; infrastructure
>         Subject: Re: rolling your own BERT (and maybe ELMo) on Saga
> 
>         No, I tried only multiple GPUs (up to 4) within the same node.
> 
>         27.01.2020 15:14, Stephan Oepen wrote:
>         > across multiple nodes?  oe
>         >
>         >
>         > On Mon, 27 Jan 2020 at 15:07 Andrei Kutuzov
>         <andreku at ifi.uio.no <mailto:andreku at ifi.uio.no>
>         > <mailto:andreku at ifi.uio.no <mailto:andreku at ifi.uio.no>>> wrote:
>         >
>         >     27.01.2020 14:29, Stephan Oepen wrote:
>         >     >> Antti can tell about the exact GPU stuff needed. We
>         >     >> will run the tutorial on Puhti since this is a tried
>         >     >> and tested environment for us, and we have little time
>         >     >> to prepare, so we play it safe. But Antti can tell
>         >     >> what it takes to run the BERT code.
>         >     > yes, if possible, i could see myself try and replicate
>         >     > your software environment on Saga ... the multi-gpu
>         >     > part sounds like an interesting new challenge :-)!
>         >     Hi all,
>         >
>         >     Well, at least TensorFlow has no problems with multi-GPU
>         >     training on Saga; it works more or less out of the box.
>         >
>         >
>         >     --
>         >     Andrei
>         >     PhD Candidate at Language Technology Group (LTG)
>         >     University of Oslo
>         >
> 
> 
>         --
>         Andrei
>         PhD Candidate at Language Technology Group (LTG)
>         University of Oslo
> 


-- 
Andrei
PhD Candidate at Language Technology Group (LTG)
University of Oslo
