[NLPL Task Force (A)] [rt.uio.no #3552285] gpu usage on Abel for teaching in october
Sabry Razick via RT
hpc-drift at usit.uio.no
Thu Aug 29 20:27:49 UTC 2019
Hello,
On 2019-08-29 21:47:36, oe wrote:
> colleagues,
>
> under the NLPL umbrella, we are planning a series of lab assignments
> on neural machine translation (NMT) between late september and
> mid-november. there will likely be around ten student teams who each
> will want to run frequent multi-hour jobs on one gpu for those weeks.
> in principle, the Abel hardware would be fully sufficient for this
> purpose, but we would somehow need to make sure that a non-trivial
> fraction of the available gpu capacity will actually be available. i
> am optimistically assuming that most users will have migrated to Saga
> by mid-september, and that Abel remains operational until at least
> sometime into november.
If the migration completes early, there is a possibility that Abel will
be shut down sooner than that. So I would recommend not planning to use Abel
after October.
>
> do these assumptions sound plausible? if need be, do we have
> mechanisms in place to prevent other Abel users from saturating the
> gpu queue for days into the future, or otherwise making sure that
> shorter, one-gpu jobs get scheduled in between? this challenge will
> likely also be relevant on Saga more or less from the beginning: at
> least during the trial period, andrey and vinit felt that at times it
> was near-impossible to get gpu jobs running within a couple of days,
> because other users had put dozens of multi-gpu jobs into the queue.
> is there any principle of fairness across users built into the
> scheduling decisions, i.e. make it hard for a single user to run on an
> overwhelmingly large proportion of a specific partition while there
> are pending jobs (even if submitted more recently) by other users?
A reservation (which allows only users of a given project to access a set of
nodes for a defined time period) could be arranged to provide what you require.
However, whether Abel will still be operational by then is the question. I am
not sure how much we can influence the Saga queue (as Abel belongs to UiO, BHM
can arrange this there).
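For concreteness, on a Slurm-managed cluster such as Abel a reservation is typically created and used along these lines. This is only a sketch: the reservation name, account, node list, and dates are hypothetical placeholders, and the `scontrol create reservation` step requires admin privileges.

```shell
# Admin side: reserve a few GPU nodes for the course period
# (reservation name, dates, account "nn_nmt", and node list are placeholders).
scontrol create reservation ReservationName=nmt_lab \
    StartTime=2019-09-23T08:00 EndTime=2019-11-15T18:00 \
    Accounts=nn_nmt Nodes=gpu-[1-4]

# User side: submit a one-GPU job into the reservation.
sbatch --reservation=nmt_lab --gres=gpu:1 --time=04:00:00 train_nmt.sh

# Anyone can inspect the reservation.
scontrol show reservation nmt_lab
```

Jobs submitted without `--reservation=nmt_lab` would not run on the reserved nodes during that window, which is what keeps the capacity free for the student teams.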
>
> with thanks in advance, oe
>
May I recommend using one of the ML machines for this? If that is a
possibility, we can arrange a meeting with Thomas about it. If not, I will
forward the request to Jon and Gard and find a solution.
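As a side note on the fairness question above: Slurm's multifactor priority plugin is the standard mechanism for per-user fairness, and per-account GPU caps can be set through the accounting database. The following is only a sketch, assuming the cluster runs Slurm; the weights and the account name are hypothetical, and the configuration lines can only be changed by the cluster admins.

```shell
# slurm.conf fragment: weigh fair-share heavily, so a user who has
# recently consumed many GPU-hours drops in priority behind others
# (values are illustrative, not Abel's or Saga's actual settings):
#   PriorityType=priority/multifactor
#   PriorityDecayHalfLife=7-0
#   PriorityWeightFairshare=100000
#   PriorityWeightAge=1000

# Admin side: cap how many GPUs one account may occupy at once
# (account name is a placeholder).
sacctmgr modify account heavy_gpu_project set GrpTRES=gres/gpu=8

# Any user can inspect fair-share standings and pending-job priorities.
sshare -a
sprio -l
```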
Regards,
Sabry