[NLPL Task Force (A)] [rt.uio.no #3552285] gpu usage on Abel for teaching in october

Thomas Röblitz via RT hpc-drift at usit.uio.no
Fri Aug 30 05:37:45 UTC 2019


> On 29. Aug 2019, at 21:47, Stephan Oepen via RT <hpc-drift at usit.uio.no> wrote:
> 
> 
> i
> am optimistically assuming that most users will have migrated to Saga
> by mid-september,

Nope. More likely “mid-december”.

> and that Abel remains operational until at least
> sometime into november.

Yes. More likely Abel will enter a new decade.

BUT, there is no guarantee that nn9447k will still be a project on Abel and/or will still have access to Saga and Abel. That depends on Sigma2.

> 
> do these assumptions sound plausible?  

See above ;)

> if need be, do we have
> mechanisms in place to prevent other Abel users from saturating the
> gpu queue for days into the future, or otherwise making sure that
> shorter, one-gpu jobs get scheduled inbetween?

Nah, then the course users wouldn’t get the full cluster experience. Usually we don’t like to make such special arrangements, particularly not for such a long time.

> this challenge will
> likely also be relevant on Saga more or less from the beginning: at
> least during the trial period, andrey and vinit felt that at times it

Yeah, it was a pilot phase.

> was near-impossible to get gpu jobs running within a couple of days,
> because other users had put dozens of multi-gpu jobs into the queue.

I think that was done on request by us, because GPUs were idling. But sure, I’d expect very long queues for GPUs. They are newer, more performant and easier to use with the latest software packages.

> is there any principle of fairness across users built into the
> scheduling decisions, i.e. make it hard for a single user to run on an
> overwhelmingly large proportion of a specific partition while there
> are pending jobs (even if submitted more recently) by other users?

Currently, I think, there is no such policy in place. It might be possible to limit the number of submitted/running jobs per account or per user for a partition. Since many projects will likely want to use the GPUs, a fair policy would probably need to implement limitations across projects, e.g., max 4 submitted/running jobs per project account.
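For illustration, such per-account limits could be expressed as a Slurm QOS attached to the GPU partition. This is only a sketch; the QOS name, partition name and limit values below are assumptions, not our actual configuration:

```shell
# Hypothetical sketch: cap GPU jobs per project account via a QOS.
# The QOS name "gpuqos", partition name "accel" and the value 4 are
# illustrative assumptions.

# Create a QOS that limits each account to 4 submitted and 4 running jobs:
sacctmgr add qos gpuqos
sacctmgr modify qos gpuqos set MaxSubmitJobsPerAccount=4 MaxJobsPerAccount=4

# Then attach the QOS to the GPU partition in slurm.conf:
#   PartitionName=accel ... QOS=gpuqos
```

Whether limits like these are acceptable would of course have to be discussed, since they also throttle projects with legitimately large workloads.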

Thomas

> 
> with thanks in advance, oe
> 



