<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<style type="text/css" style="">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div dir="ltr">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">
<p>Hi Stephan,</p>
<p><br>
</p>
<p>Thanks for your quick help. <br>
</p>
<p><br>
</p>
<p>Yes, I mean the model files generated by OpenNMT. The job did in fact finish, and the SLURM log file contains the complete log output.
<br>
</p>
<p><br>
</p>
<p>I just ran the same job on the login node (CPU only), and there OpenNMT generated the models successfully, so I think the problem only occurs when using GPUs. The log file should contain a line like this:</p>
<p><br>
</p>
<p>"[2019-09-27 21:24:57,888 INFO] Saving checkpoint /usit/abel/u1/gtang/model_word_step_200.pt"<br>
</p>
<p><br>
</p>
<p>However, there is no such line in the SLURM log file. My guess is that OpenNMT is not saving any models at all (perhaps it has no write access?).</p>
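<p><br>
</p>
<p>A minimal sketch of how a SLURM log could be checked for these messages, assuming the checkpoint lines follow the format quoted above. The function names <code>find_checkpoints</code> and <code>report</code> are hypothetical helpers, not part of OpenNMT or SLURM:</p>
<pre>
```python
import re
from pathlib import Path

# Assumption: checkpoint lines look like the one quoted in this message, e.g.
# [2019-09-27 21:24:57,888 INFO] Saving checkpoint /some/dir/model_step_200.pt
CHECKPOINT_RE = re.compile(r"Saving checkpoint (\S+)")


def find_checkpoints(log_text):
    """Return the checkpoint paths mentioned in the log text, in order."""
    # Strip a trailing double quote in case the line was copied with quotes.
    return [m.group(1).rstrip('"') for m in CHECKPOINT_RE.finditer(log_text)]


def report(log_path):
    """Print each logged checkpoint and whether that file exists on disk."""
    text = Path(log_path).read_text(errors="replace")
    paths = find_checkpoints(text)
    if not paths:
        print("no 'Saving checkpoint' lines found in", log_path)
    for p in paths:
        status = "exists" if Path(p).is_file() else "MISSING"
        print(status, p)
    return paths
```
</pre>
<p>Calling <code>report</code> on the SLURM log of a job (the log file name depends on the job script) would show whether OpenNMT never logged a save, or logged one for a file that is no longer there.</p>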
<p><br>
</p>
<p>Best,</p>
<p>Gongbo<br>
</p>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Stephan Oepen &lt;oe@ifi.uio.no&gt;<br>
<b>Sent:</b> Friday, September 27, 2019 10:21:29 PM<br>
<b>To:</b> Gongbo Tang<br>
<b>Cc:</b> hpc@usit.uio.no; infrastructure<br>
<b>Subject:</b> Re: [NLPL Task Force (A)] Jobs' output</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">hi gongbo,<br>
<br>
i cannot say that i have used OpenNMT much myself, but more generally:<br>
unless i run something that is very i/o-intensive, i do not take the<br>
trouble of copying input and output data to and from the<br>
$SCRATCH filesystem, i.e. i doubt you need to worry about chkfile and<br>
friends. i would just work out of your home directory, i.e. read and<br>
write data there.<br>
<br>
the SLURM log file you sent does not look as if the job actually has<br>
completed? i assume by 'output files' you mean files generated during<br>
the OpenNMT run, i.e. the actual model file? i might guess that the<br>
model is only serialized to disk upon completion of the training, so<br>
could it be the case that your job actually had not gotten to that<br>
point?<br>
<br>
a general piece of advice: to debug it might help to reduce the<br>
problem to a tiny training file, possibly even something that can<br>
complete in a matter of a few minutes on a cpu node. that should<br>
allow you to find out where the output file(s) end up, and once you<br>
have a working set-up, you can submit larger jobs (to the gpu nodes).<br>
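<br>
a minimal sketch of how such a tiny training set could be cut out, assuming the training data is a pair of line-aligned plain-text files (source and target); the helper names head_copy and make_tiny_pair and the 'tiny.' file prefix are hypothetical, nothing OpenNMT provides:<br>
<pre>
```python
from itertools import islice
from pathlib import Path


def head_copy(src, dst, n_lines):
    """Copy the first n_lines lines of src to dst; return the number written."""
    written = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in islice(fin, n_lines):
            fout.write(line)
            written += 1
    return written


def make_tiny_pair(src_path, tgt_path, out_dir, n_lines=1000):
    """Write truncated copies of a line-aligned source/target pair to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    tiny_src = out / ("tiny." + Path(src_path).name)
    tiny_tgt = out / ("tiny." + Path(tgt_path).name)
    n_src = head_copy(src_path, tiny_src, n_lines)
    n_tgt = head_copy(tgt_path, tiny_tgt, n_lines)
    # Both sides must keep the same number of lines to stay aligned.
    if n_src != n_tgt:
        raise ValueError("source and target line counts differ")
    return tiny_src, tiny_tgt
```
</pre>
the two truncated files can then go through the usual preprocessing and a short training run, which should show quickly whether and where checkpoints get written.<br>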
<br>
best wishes, oe<br>
<br>
On Fri, Sep 27, 2019 at 10:09 PM Gongbo Tang &lt;gongbo.tang@lingfil.uu.se&gt; wrote:<br>
><br>
> Hi,<br>
><br>
><br>
> I have run into a problem: I cannot find any output files or models after running a job. Possibly the job did not generate any models while it was running.<br>
><br>
><br>
> I am using the OpenNMT 0.2.1 installation maintained by NLPL. The log file contains none of the "Saving checkpoint ..." messages that should be there. I have attached the SLURM log file and the job script.<br>
><br>
><br>
> I tried to use the "chkfile" and "cleanup" commands to save the outputs, following the guide here (<a href="https://www.uio.no/english/services/it/research/hpc/abel/help/user-guide/job-scripts.html#Work_Directory">https://www.uio.no/english/services/it/research/hpc/abel/help/user-guide/job-scripts.html#Work_Directory</a>),
but I was told that neither "chkfile" nor "cleanup" could be found.<br>
><br>
><br>
> I also tried setting the output directory to my home directory (~, i.e. /usit/abel/u1/gtang), but I still got nothing.<br>
><br>
><br>
> Could you please tell me how I can get the job's outputs? Thanks a lot!<br>
><br>
><br>
> Best,<br>
><br>
> Gongbo<br>
</div>
</span></font>
</body>
</html>