<div dir="ltr"><div>Hi</div><div><br></div><div>Just to confirm that we count on Andrei doing the ELMo stuff. Great!</div><div><br></div><div>Our overall plan, btw, is to try to give people enough information to start working on their own BERT based on the OSCAR dataset. Some data cleanup scripts will be provided too. I also plan to spend a while presenting the results which led us to train our own BERT, and some of the current results we have using it.<br></div><div><br></div><div>Things will unfortunately come together under a bit of a panic because we have a big shared task deadline in 12 days now, and a lot of time is sinking into that. :-| <br></div><div><br></div><div>Andrei, what are your plans for the ELMo part?<br></div><div><br></div><div>F</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 27, 2020 at 5:11 PM Antti Virtanen <<a href="mailto:sajvir@utu.fi">sajvir@utu.fi</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div dir="ltr" style="font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255);font-family:Calibri,Arial,Helvetica,sans-serif"> <p></p> <p style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16px">Here's a (quick and dirty) repo for the code we used to train FinBERT: <a href="https://github.com/haamis/DeepLearningExamples_FinBERT/tree/master/TensorFlow/LanguageModeling/BERT_nonscaling" target="_blank">https://github.com/haamis/DeepLearningExamples_FinBERT/tree/master/TensorFlow/LanguageModeling/BERT_nonscaling</a>. 
This one has the sbatch files used: <a href="https://github.com/haamis/BERT-pretraining" target="_blank">https://github.com/haamis/BERT-pretraining</a><br> <br> </p> <p style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16px">-Antti<br> </p> <p><br> </p> <div style="color:rgb(33,33,33)"> <hr style="display:inline-block;width:98%"> <div id="gmail-m_-7360997688940956170divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Stephan Oepen <<a href="mailto:oe@ifi.uio.no" target="_blank">oe@ifi.uio.no</a>><br> <b>Sent:</b> Monday, January 27, 2020 4:48 PM<br> <b>To:</b> Antti Virtanen<br> <b>Cc:</b> Andrei Kutuzov; Filip Ginter; infrastructure<br> <b>Subject:</b> Re: rolling your own BERT (and maybe ELMo) on Saga</font> <div> </div> </div> <div> <div> <div dir="auto">thanks, antti! we had previously put NLPL versions of TF and Horovod on Saga, so possibly the trickier parts actually are already in place :-). once you have something resembling a sample invocation (preferably on a smallish test case, say targeting 6 gpus on two nodes), i will be eager to test for you! we have Puhti access too, so if need be i can try there or look up specific version numbers ...</div> </div> <div dir="auto"><br> </div> <div dir="auto">cheers, oe</div> <div dir="auto"><br> </div> <div><br> <div class="gmail_quote"> <div dir="ltr" class="gmail_attr">On Mon, 27 Jan 2020 at 15:39 Antti Virtanen <<a href="mailto:sajvir@utu.fi" target="_blank">sajvir@utu.fi</a>> wrote:<br> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Hi,<br> <br> We used the tensorflow/1.13.1-hvd module on Puhti. As you might figure out from the name, it includes TensorFlow 1.13 and Horovod 0.16.4 plus any dependencies those have (<a href="https://docs.csc.fi/apps/tensorflow/" rel="noreferrer" target="_blank">https://docs.csc.fi/apps/tensorflow/</a>). 
I can give you a list of packages in that module from Puhti if you wish. Also worth noting is that we had to create symlinks in the code directory to the CUDA files `libdevice.10.bc` and `ptxas` to get XLA working correctly, although I believe this is the fault of Puhti's environment being misconfigured.<br> <br> -Antti<br> ________________________________________<br> From: Andrei Kutuzov <<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a>><br> Sent: Monday, January 27, 2020 4:15 PM<br> To: Stephan Oepen<br> Cc: Antti Virtanen; Filip Ginter; infrastructure<br> Subject: Re: rolling your own BERT (and maybe ELMo) on Saga<br> <br> No, I tried only multiple GPUs (up to 4) within the same node.<br> <br> 27.01.2020 15:14, Stephan Oepen wrote:<br> > across multiple nodes? oe<br> ><br> ><br> > On Mon, 27 Jan 2020 at 15:07 Andrei Kutuzov <<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a><br> > <mailto:<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a>>> wrote:<br> ><br> > 27.01.2020 14:29, Stephan Oepen wrote:<br> > >> Antti can tell about the exact GPU stuff needed. We will run the<br> > tutorial on Puhti since this is a tried and tested environment for<br> > us, and we have little time to prepare, so we play it safe. But<br> > Antti can tell what it takes to run the BERT code.<br> > > yes, if possible, i could see myself try and replicate your software<br> > > environment on Saga ... the multi-gpu part sounds like an interesting<br> > > new challenge :-)!<br> > Hi all,<br> ><br> > Well, at least TensorFlow has no problems with multi-GPU training on<br> > Saga, works more or less out of the box.<br> ><br> ><br> > --<br> > Andrei<br> > PhD Candidate at Language Technology Group (LTG)<br> > University of Oslo<br> ><br> <br> <br> --<br> Andrei<br> PhD Candidate at Language Technology Group (LTG)<br> University of Oslo<br> </blockquote> </div> </div> </div> </div> </div> </blockquote></div>