[NLPL Task Force (A)] trickle problem
Stephan Oepen
oe at ifi.uio.no
Thu Aug 29 19:33:42 UTC 2019
just to confirm, asad: all good on Abel again? oe
On Tue, Aug 27, 2019 at 5:59 PM Asad Sayeed <asad.sayeed at gu.se> wrote:
>
> Hi Stephan,
>
> Yes, we're out of hours:
>
> -bash-4.1$ sbatch --time=25:00:00 --mem-per-cpu=16G --account nn9447k
> --output=/usit/abel/u1/asayeed/rw-log/logfile%j-task999.txt echo.jobs
> /usit/abel/u1/asayeed/rw-eng-with-raw-sentences-and-heads-but-no-malt/
> /usit/abel/u1/asayeed/rw-output/ 999
> ** Error: The specified project, nn9447k, is out of CPU hours. Using
> 100.51% of grant.
> If you think this is incorrect, please contact us at
> hpc-drift at usit.uio.no
>
> Please read http://uio.no/hpc/abel/help/user-guide/
> and 'man sbatch' to learn more about submitting with the SLURM queue
> system
> -bash-4.1$
>
> That is what seems to be the problem here.
>
> I need to finish 3500 units of work, each taking between 8 and 25
> hours depending on the unit. So far, about 2800 are complete. Some
> are still running (so it doesn't appear that they get booted
> immediately). Assuming 25h per unit as a safety margin and approx.
> 700 units left, that is about 17500 hours.
>
> The cost function doesn't seem to have a manpage.
>
> Yours,
> --Asad.
>
> On 2019-08-27 13:00, Stephan Oepen wrote:
> > hi asad,
> >
> > there should be a log file from the trickle shell script in your home
> > directory. does that reveal any more information? if not that, what
> > happens if you take just one line from your job file and submit that
> > interactively from the command line?
> >
> > looking at our current allocation, i wonder whether we may in fact
> > just be out of hours (again) on Abel. there are only four more weeks
> > until the start of the new allocation period, but i can probably
> > request a 'bonus' allocation for this period. can you estimate how
> > much more computing you expect to do before the end of the month?
> >
> > ps: to inform yourself about how your jobs are 'billed' against our
> > allocation, take a look at the cost(1) command on Abel.
> >
> > best wishes, oe
> >
> >
> > On Tue, Aug 27, 2019 at 11:31 AM Asad Basheer Sayeed <asad.sayeed at gu.se> wrote:
> >> Hi,
> >>
> >> I'm getting sbatch failures from a running trickle (which Stephan showed
> >> me how to use). What might the problem be?
> >>
> >> [19-08-27 11:27:12] trickle[431]: 312 jobs; 240 running;trickle:
> >> sbatch(1) failure; exit.
> >>
> >> 0 new.
> >> [19-08-27 11:27:42] trickle[431]: 312 jobs; 240 running;trickle:
> >> sbatch(1) failure; exit.
> >> 0 new.
> >> [19-08-27 11:28:13] trickle[431]: 312 jobs; 240 running;trickle:
> >> sbatch(1) failure; exit.
> >> 0 new.
> >>
> >> The command was:
> >>
> >> while true; do /projects/nlpl/operation/tools/trickle --limit 370
> >> joblist6.txt ; sleep 30; done
> >>
> >> What's wrong with it? I pushed 2500 jobs earlier through it mostly
> >> successfully. I did increase the clock time to 25h because a handful of
> >> my jobs were timing out.
> >>
> >> Yours,
> >> --Asad.
> >>
> >>
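
For reference, the estimate in Asad's reply (3500 units in total, about 2800 done, 8 to 25 hours per unit) can be reproduced with a few lines of Python. This is only a sketch: the function name and the billing_factor parameter are hypothetical, and the default of 1.0 assumes that one hour of run time is billed as one CPU hour, which may not match how cost(1) on Abel actually accounts for memory requests such as --mem-per-cpu=16G.

```python
def remaining_hours(units_left, min_hours, max_hours, billing_factor=1.0):
    """Return (low, high) bounds on the hours still needed.

    billing_factor is a hypothetical multiplier for how the scheduler
    bills each hour of run time; 1.0 is an assumption, not Abel's rule.
    """
    low = units_left * min_hours * billing_factor
    high = units_left * max_hours * billing_factor
    return low, high


total_units = 3500   # from the thread
completed = 2800     # "about 2800 are complete"
units_left = total_units - completed

low, high = remaining_hours(units_left, min_hours=8, max_hours=25)
print(f"{units_left} units left: between {low:.0f} and {high:.0f} hours")
```

The upper bound corresponds to the 25h-per-unit figure used in the thread; the lower bound shows how much the request could shrink if most of the remaining units fall at the short end of the range.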