<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; }--></style>
</head>
<body dir="ltr" style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p></p>
<p style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 16px;">Here's a (quick and dirty) repo for the code we used to train FinBERT: <a href="https://github.com/haamis/DeepLearningExamples_FinBERT/tree/master/TensorFlow/LanguageModeling/BERT_nonscaling">https://github.com/haamis/DeepLearningExamples_FinBERT/tree/master/TensorFlow/LanguageModeling/BERT_nonscaling</a>.
This one has the sbatch files used: <a href="https://github.com/haamis/BERT-pretraining">https://github.com/haamis/BERT-pretraining</a><br>
<br>
</p>
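For reference, a minimal sketch of what a multi-node sbatch file along these lines might look like. This is illustrative only, not one of the actual files from the repo above: the module name is the one from Puhti and will differ on Saga, the account and paths are placeholders, and the flag names follow the NVIDIA BERT pretraining script that the repo is based on.

```shell
#!/bin/bash
#SBATCH --job-name=bert-pretrain
#SBATCH --nodes=2                 # two nodes, as in the test case discussed
#SBATCH --ntasks-per-node=4       # one MPI rank (Horovod worker) per GPU
#SBATCH --gres=gpu:v100:4         # GPU type/count is cluster-specific
#SBATCH --time=72:00:00
#SBATCH --account=<project>       # placeholder

module purge
module load tensorflow/1.13.1-hvd  # Puhti module name; will differ elsewhere

# srun launches one process per task; Horovod picks up the ranks via MPI.
srun python run_pretraining.py \
    --input_file=/path/to/tfrecords/shard-* \
    --output_dir=/path/to/output \
    --bert_config_file=/path/to/bert_config.json \
    --do_train=True \
    --train_batch_size=16 \
    --horovod
```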
<p style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 16px;">-Antti<br>
</p>
<p><br>
</p>
<div style="color: rgb(33, 33, 33);">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Stephan Oepen <oe@ifi.uio.no><br>
<b>Sent:</b> Monday, January 27, 2020 4:48 PM<br>
<b>To:</b> Antti Virtanen<br>
<b>Cc:</b> Andrei Kutuzov; Filip Ginter; infrastructure<br>
<b>Subject:</b> Re: rolling your own BERT (and maybe ELMo) on Saga</font>
<div> </div>
</div>
<div>
<div>
<div dir="auto">thanks, antti! we had previously put NLPL versions of TF and Horovod on Saga, so possibly the trickier parts are actually already in place :-). once you have something resembling a sample invocation (preferably on a smallish test case, say
targeting 6 gpus on two nodes), i will be eager to test for you! we have Puhti access too, so if need be i can try there or look up specific version numbers ...</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">cheers, oe</div>
<div dir="auto"><br>
</div>
<div><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 27 Jan 2020 at 15:39 Antti Virtanen <<a href="mailto:sajvir@utu.fi">sajvir@utu.fi</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">
Hi,<br>
<br>
We used the tensorflow/1.13.1-hvd module on Puhti. As the name suggests, it includes TensorFlow 1.13 and Horovod 0.16.4 plus their dependencies (<a href="https://docs.csc.fi/apps/tensorflow/" rel="noreferrer" target="_blank">https://docs.csc.fi/apps/tensorflow/</a>).
I can give you a list of packages in that module from Puhti if you wish. Also worth noting: we had to create symlinks in the code directory to the CUDA files `libdevice.10.bc` and `ptxas` to get XLA working correctly, though I believe this is due
to Puhti's environment being misconfigured.<br>
<br>
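In case it helps when replicating this on Saga, the symlink workaround might look roughly like the following. The paths here are assumptions: where `libdevice.10.bc` and `ptxas` actually live depends on the CUDA installation on the cluster.

```shell
# Run in the BERT code directory. Paths are illustrative; locate the real
# files with e.g.: find "$CUDA_HOME" -name libdevice.10.bc
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
ln -sf "$CUDA_HOME/nvvm/libdevice/libdevice.10.bc" .
ln -sf "$CUDA_HOME/bin/ptxas" .
```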
-Antti<br>
________________________________________<br>
From: Andrei Kutuzov <<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a>><br>
Sent: Monday, January 27, 2020 4:15 PM<br>
To: Stephan Oepen<br>
Cc: Antti Virtanen; Filip Ginter; infrastructure<br>
Subject: Re: rolling your own BERT (and maybe ELMo) on Saga<br>
<br>
No, I tried only multiple GPUs (up to 4) within the same node.<br>
<br>
27.01.2020 15:14, Stephan Oepen wrote:<br>
> across multiple nodes? oe<br>
><br>
><br>
> On Mon, 27 Jan 2020 at 15:07 Andrei Kutuzov <<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a><br>
> <mailto:<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a>>> wrote:<br>
><br>
> 27.01.2020 14:29, Stephan Oepen wrote:<br>
> >> Antti can tell about the exact GPU stuff needed. We will run the<br>
> tutorial on puhti since this is a tried and tested environment for<br>
> us, and we have little time to prepare, so we play it safe. But<br>
> Antti can tell what it takes to run the BERT code.<br>
> > yes, if possible, i could see myself try and replicate your software<br>
> > environment on Saga ... the multi-gpu part sounds like an interesting<br>
> > new challenge :-)!<br>
> Hi all,<br>
><br>
> Well, at least TensorFlow has no problems with multi-GPU training on<br>
> Saga; it works more or less out of the box.<br>
><br>
><br>
> --<br>
> Andrei<br>
> PhD Candidate at Language Technology Group (LTG)<br>
> University of Oslo<br>
><br>
<br>
<br>
--<br>
Andrei<br>
PhD Candidate at Language Technology Group (LTG)<br>
University of Oslo<br>
</blockquote>
</div>
</div>
</div>
</div>
</body>
</html>