[NLPL Task Force (A)] trickle problem

Fri Aug 30 09:50:17 UTC 2019

So far, so good! Thanks! It turned out that some of the jobs from the
last round were allowed to run to completion, so I only had about 400
left, some of which are still running.

Yours,
--Asad.

On 2019-08-29 21:33, Stephan Oepen wrote:
> just to confirm, asad: all good on Abel again?  oe
>
> On Tue, Aug 27, 2019 at 5:59 PM Asad Sayeed <asad.sayeed at gu.se> wrote:
>> Hi Stephan,
>>
>> Yes, we're out of hours:
>>
>> -bash-4.1$ sbatch --time=25:00:00 --mem-per-cpu=16G --account nn9447k \
>>     --output=/usit/abel/u1/asayeed/rw-log/logfile%j-task999.txt echo.jobs \
>>     /usit/abel/u1/asayeed/rw-eng-with-raw-sentences-and-heads-but-no-malt/ \
>>     /usit/abel/u1/asayeed/rw-output/ 999
>>    ** Error: The specified project, nn9447k, is out of CPU hours. Using 100.51% of grant.
>>       If you think this is incorrect, please contact us at hpc-drift at usit.uio.no
>>
>>    Please read http://uio.no/hpc/abel/help/user-guide/
>>    and 'man sbatch' to learn more about submitting with the SLURM queue system
>> -bash-4.1$
>>
>> That is what seems to be the problem here.
>>
>> I need to finish 3500 units of work, each of which takes between 8 and
>> 25 hours depending on the unit.  So far, about 2800 are complete.  Some
>> are still running (so it doesn't appear that they get booted
>> immediately...).  So, taking 25h per unit as a safety margin and approx.
>> 700 units left, that comes to about 17500 hours.
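
A quick sketch of the arithmetic behind this estimate, using only the
figures quoted above. The variable names are illustrative. The result is
in job (wall-clock) hours; how that maps onto billed CPU hours depends on
the accounting for cores and memory per job, which this thread does not
spell out.

```shell
#!/bin/sh
# Figures quoted in the message above.
total_units=3500
done_units=2800
min_hours=8     # lower end of the quoted per-unit runtime range
max_hours=25    # upper end of the range, used as the safety margin

remaining=$((total_units - done_units))
lower=$((remaining * min_hours))
upper=$((remaining * max_hours))

echo "remaining units: $remaining"
echo "job hours needed: $lower to $upper"
```

The upper bound matches the 17500 hours given above; the lower bound shows
how much the request could shrink if most remaining units sit at the short
end of the range.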
>>
>> The cost command doesn't seem to have a man page.
>>
>> Yours,
>> --Asad.
>>
>> On 2019-08-27 13:00, Stephan Oepen wrote:
>>> hi asad,
>>>
>>> there should be a log file from the trickle shell script in your home
>>> directory.  does that reveal any more information?  if not that, what
>>> happens if you take just one line from your job file and submit that
>>> interactively from the command line?
>>>
>>> looking at our current allocation, i wonder whether we may in fact
>>> just be out of hours (again) on Abel.  there are only four more weeks
>>> until the start of the new allocation period, but i can probably
>>> request a 'bonus' allocation for this period.  can you estimate how
>>> much more computing you expect to do before the end of the month?
>>>
>>> ps: to inform yourself about how your jobs are 'billed' against our
>>> allocation, take a look at the cost(1) command on Abel.
>>>
>>> best wishes, oe
>>>
>>>
>>> On Tue, Aug 27, 2019 at 11:31 AM Asad Basheer Sayeed <asad.sayeed at gu.se> wrote:
>>>> Hi,
>>>>
>>>> I'm getting sbatch failures from a running trickle (which Stephan showed
>>>> me how to use).  What might the problem be?
>>>>
>>>> [19-08-27 11:27:12] trickle[431]: 312 jobs; 240 running;trickle:
>>>> sbatch(1) failure; exit.
>>>>
>>>>     0 new.
>>>> [19-08-27 11:27:42] trickle[431]: 312 jobs; 240 running;trickle:
>>>> sbatch(1) failure; exit.
>>>>     0 new.
>>>> [19-08-27 11:28:13] trickle[431]: 312 jobs; 240 running;trickle:
>>>> sbatch(1) failure; exit.
>>>>     0 new.
>>>>
>>>> The command was:
>>>>
>>>> while true; do /projects/nlpl/operation/tools/trickle --limit 370 joblist6.txt ; sleep 30; done
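
The loop above retries every 30 seconds regardless of the outcome, which
is why the same sbatch failure repeats in the log. A minimal sketch of a
variant that stops at the first failure follows. It assumes that trickle
returns a non-zero exit status when sbatch fails, which this thread does
not confirm; resubmit_until_failure is a hypothetical helper, not part of
trickle.

```shell
#!/bin/sh
# resubmit_until_failure is a hypothetical helper, not part of trickle.
# It re-runs the given command, pausing $1 seconds between runs, and
# stops as soon as the command exits with a non-zero status.
resubmit_until_failure() {
    pause=$1
    shift
    while "$@"; do
        sleep "$pause"
    done
    echo "command failed; stopping the loop" >&2
    return 1
}

# Possible usage with the command from the message above (assumption:
# trickle exits non-zero on an sbatch failure):
# resubmit_until_failure 30 /projects/nlpl/operation/tools/trickle --limit 370 joblist6.txt
```

Stopping on failure leaves a single error message to investigate rather
than one every 30 seconds. Note that the loop keeps running for as long
as the command succeeds.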
>>>>
>>>> What's wrong with it? I pushed 2500 jobs through it earlier, mostly
>>>> successfully.  I did increase the wall-clock time limit to 25h because
>>>> a handful of my jobs were timing out.
>>>>
>>>> Yours,
>>>> --Asad.
>>>>
>>>>