[NLPL Task Force (A)] trickle problem

Tue Aug 27 15:58:18 UTC 2019

Hi Stephan,

Yes, we're out of hours:

-bash-4.1$ sbatch --time=25:00:00 --mem-per-cpu=16G --account nn9447k \
    --output=/usit/abel/u1/asayeed/rw-log/logfile%j-task999.txt echo.jobs \
    /usit/abel/u1/asayeed/rw-eng-with-raw-sentences-and-heads-but-no-malt/ \
    /usit/abel/u1/asayeed/rw-output/ 999
  ** Error: The specified project, nn9447k, is out of CPU hours. Using 100.51% of grant.
     If you think this is incorrect, please contact us at hpc-drift at usit.uio.no

  Please read http://uio.no/hpc/abel/help/user-guide/
  and 'man sbatch' to learn more about submitting with the SLURM queue system
-bash-4.1$

That seems to be the problem here.

I need to finish 3500 units of work, each of which takes between 8 and
25 hours depending on the unit.  So far, about 2800 are complete.  Some
are still running (so it doesn't appear that running jobs get booted
immediately).  Assuming 25h per unit as a safety margin, and approx. 700
units left, that comes to about 17500 hours, with only a minimal safety
margin.
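
For reference, here is a rough sketch of that arithmetic.  It assumes one
CPU per job and that hours are billed as plain wall-clock time; the actual
billing on Abel may well differ (for instance because of the
--mem-per-cpu=16G request), so treat the result as an estimate only.  The
variable names are just for illustration.

```shell
#!/bin/bash
# Figures taken from this message; the billing model (one CPU per job,
# plain wall-clock hours) is an assumption, not how Abel necessarily charges.
total_units=3500     # units of work overall
done_units=2800      # approximate number already complete
min_hours=8          # shortest observed run time per unit
max_hours=25         # longest run time per unit (the requested --time limit)

remaining=$((total_units - done_units))
low=$((remaining * min_hours))      # if every remaining unit is a short one
high=$((remaining * max_hours))     # if every remaining unit needs the full limit

echo "remaining units: $remaining"
echo "estimated hours: $low to $high"
```

The upper figure is the one I would ask for, since I can't tell in advance
which units are the slow ones.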

The cost command doesn't seem to have a man page.

Yours,
--Asad.

On 2019-08-27 13:00, Stephan Oepen wrote:
> hi asad,
>
> there should be a log file from the trickle shell script in your home
> directory.  does that reveal any more information?  if not that, what
> happens if you take just one line from your job file and submit that
> interactively from the command line?
>
> looking at our current allocation, i wonder whether we may in fact
> just be out of hours (again) on Abel.  there are only four more weeks
> until the start of the new allocation period, but i can probably
> request a 'bonus' allocation for this period.  can you estimate how
> much more computing you expect to do before the end of the month?
>
> ps: to inform yourself about how your jobs are 'billed' against our
> allocation, take a look at the cost(1) command on Abel.
>
> best wishes, oe
>
>
> On Tue, Aug 27, 2019 at 11:31 AM Asad Basheer Sayeed <asad.sayeed at gu.se> wrote:
>> Hi,
>>
>> I'm getting sbatch failures from a running trickle (which Stephan showed
>> me how to use).  What might the problem be?
>>
>> [19-08-27 11:27:12] trickle[431]: 312 jobs; 240 running;trickle:
>> sbatch(1) failure; exit.
>>
>>    0 new.
>> [19-08-27 11:27:42] trickle[431]: 312 jobs; 240 running;trickle:
>> sbatch(1) failure; exit.
>>    0 new.
>> [19-08-27 11:28:13] trickle[431]: 312 jobs; 240 running;trickle:
>> sbatch(1) failure; exit.
>>    0 new.
>>
>> The command was:
>>
>> while true; do /projects/nlpl/operation/tools/trickle --limit 370
>> joblist6.txt ; sleep 30; done
>>
>> What's wrong with it? I pushed 2500 jobs through it earlier, mostly
>> successfully.  I did increase the time limit to 25h because a handful
>> of my jobs were timing out.
>>
>> Yours,
>> --Asad.
>>
>>