<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; }--></style>
</head>
<body dir="ltr" style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p></p>
<p style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 16px;">Here's a (quick and dirty) repo for the code we used to train FinBERT: <a href="https://github.com/haamis/DeepLearningExamples_FinBERT/tree/master/TensorFlow/LanguageModeling/BERT_nonscaling">https://github.com/haamis/DeepLearningExamples_FinBERT/tree/master/TensorFlow/LanguageModeling/BERT_nonscaling</a>.
This one has the sbatch files used: <a href="https://github.com/haamis/BERT-pretraining">https://github.com/haamis/BERT-pretraining</a><br>
<br>
</p>
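For reference, a minimal sketch of what a multi-node sbatch file along these lines might look like. This is illustrative only, not one of the actual files from the repo above: the module name is the one from Puhti and will differ on Saga, the account and paths are placeholders, and the flag names follow the NVIDIA BERT pretraining script that the repo is based on.

```shell
#!/bin/bash
#SBATCH --job-name=bert-pretrain
#SBATCH --nodes=2                 # two nodes, as in the test case discussed
#SBATCH --ntasks-per-node=4       # one MPI rank (Horovod worker) per GPU
#SBATCH --gres=gpu:v100:4         # GPU type/count is cluster-specific
#SBATCH --time=72:00:00
#SBATCH --account=<project>       # placeholder

module purge
module load tensorflow/1.13.1-hvd  # Puhti module name; will differ elsewhere

# srun launches one process per task; Horovod picks up the ranks via MPI.
srun python run_pretraining.py \
    --input_file=/path/to/tfrecords/shard-* \
    --output_dir=/path/to/output \
    --bert_config_file=/path/to/bert_config.json \
    --do_train=True \
    --train_batch_size=16 \
    --horovod
```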
<p style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 16px;">-Antti<br>
</p>
<p><br>
</p>
<div style="color: rgb(33, 33, 33);">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Stephan Oepen <oe@ifi.uio.no><br>
<b>Sent:</b> Monday, January 27, 2020 4:48 PM<br>
<b>To:</b> Antti Virtanen<br>
<b>Cc:</b> Andrei Kutuzov; Filip Ginter; infrastructure<br>
<b>Subject:</b> Re: rolling your own BERT (and maybe ELMo) on Saga</font>
<div> </div>
</div>
<div>
<div>
<div dir="auto">thanks, antti! we had previously put NLPL versions of TF and Horovod on Saga, so possibly the trickier parts are actually already in place :-). once you have something resembling a sample invocation (preferably on a smallish test case, say
targeting 6 gpus on two nodes), i will be eager to test for you! we have Puhti access too, so if need be i can try there or look up specific version numbers ...</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">cheers, oe</div>
<div dir="auto"><br>
</div>
<div><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 27 Jan 2020 at 15:39 Antti Virtanen <<a href="mailto:sajvir@utu.fi">sajvir@utu.fi</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">
Hi,<br>
<br>
We used the tensorflow/1.13.1-hvd module on Puhti. As the name suggests, it includes TensorFlow 1.13 and Horovod 0.16.4 plus their dependencies (<a href="https://docs.csc.fi/apps/tensorflow/" rel="noreferrer" target="_blank">https://docs.csc.fi/apps/tensorflow/</a>).
I can give you a list of packages in that module from Puhti if you wish. Also worth noting: we had to create symlinks in the code directory to the CUDA files `libdevice.10.bc` and `ptxas` to get XLA working correctly, though I believe this is due
to Puhti's environment being misconfigured.<br>
<br>
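In case it helps when replicating this on Saga, the symlink workaround might look roughly like the following. The paths here are assumptions: where `libdevice.10.bc` and `ptxas` actually live depends on the CUDA installation on the cluster.

```shell
# Run in the BERT code directory. Paths are illustrative; locate the real
# files with e.g.: find "$CUDA_HOME" -name libdevice.10.bc
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda}
ln -sf "$CUDA_HOME/nvvm/libdevice/libdevice.10.bc" .
ln -sf "$CUDA_HOME/bin/ptxas" .
```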
-Antti<br>
________________________________________<br>
From: Andrei Kutuzov <<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a>><br>
Sent: Monday, January 27, 2020 4:15 PM<br>
To: Stephan Oepen<br>
Cc: Antti Virtanen; Filip Ginter; infrastructure<br>
Subject: Re: rolling your own BERT (and maybe ELMo) on Saga<br>
<br>
No, I tried only multiple GPUs (up to 4) within the same node.<br>
<br>
27.01.2020 15:14, Stephan Oepen wrote:<br>
> across multiple nodes? oe<br>
><br>
><br>
> On Mon, 27 Jan 2020 at 15:07 Andrei Kutuzov <<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a><br>
> <mailto:<a href="mailto:andreku@ifi.uio.no" target="_blank">andreku@ifi.uio.no</a>>> wrote:<br>
><br>
> 27.01.2020 14:29, Stephan Oepen wrote:<br>
> >> Antti can tell about the exact GPU stuff needed. We will run the<br>
> tutorial on puhti since this is a tried and tested environment for<br>
> us, and we have little time to prepare, so we play it safe. But<br>
> Antti can tell what it takes to run the BERT code.<br>
> > yes, if possible, i could see myself try and replicate your software<br>
> > environment on Saga ... the multi-gpu part sounds like an interesting<br>
> > new challenge :-)!<br>
> Hi all,<br>
><br>
> Well, at least TensorFlow has no problems with multi-GPU training on<br>
> Saga; it works more or less out of the box.<br>
><br>
><br>
> --<br>
> Andrei<br>
> PhD Candidate at Language Technology Group (LTG)<br>
> University of Oslo<br>
><br>
<br>
<br>
--<br>
Andrei<br>
PhD Candidate at Language Technology Group (LTG)<br>
University of Oslo<br>
</blockquote>
</div>
</div>
</div>
</div>
</body>
</html>