[NLPL Task Force (A)] Saga usage

Fri Aug 21 16:04:47 UTC 2020

thanks for your quick reply, artur!  a better understanding of
individual usage patterns will certainly be helpful to us.  in
particular, i would be keen to learn more about the virtual
environment limitations on Puhti that make you run some (smallish, i
take it) part of your regular computing on Saga.  these are the kinds
of unnecessary barriers to user mobility that we are hoping to
overcome through the NLPL collaboration ...

i am honestly not quite sure what exactly you are pushing back on?
probably not that Saga has been heavily loaded lately, that all users
are encouraged to be especially considerate currently, or the general
request to not 'over-saturate' the gpu queue with large numbers of
jobs while there are other users also competing for these resources.
we have suggested to the Saga administrators that scheduling fairness
could be improved by making it technically impossible for a single
user to run on an undue share of the limited gpu resources while there
are (many enough) pending jobs by other users.  but such a mechanism
has yet to be implemented, and hence we have to appeal to community
spirit among the NLPL users, who seem to be by far the most
prominent consumers of gpu time on Saga.  if you observe what to you
seems like 'resource hogging' by other users, please do contact us at
'infrastructure at nlpl.eu'.  this is not the first time we are reaching
out to individual users, and what other users may have done in the
past is of course not a good guide to best practices in general, and
even less so in the current situation.

so you are likely pushing back on the implied notion that you are
over-using Saga?  i contacted you and arra'di today after seeing
that the gpu queue was over-full and that between the two of you there
were quite a number of jobs running and pending, of which many were
showing hour-long running times.  regarding your ongoing experiment:
queuing 19 jobs with an expected running time of six hours corresponds
to requesting the full Saga gpu capacity for a little more than
three-and-a-half days.  assuming that, on average, half the capacity
will be available to you during that period, putting all 19 jobs into
the queue at once will effectively cause jobs by other users, who
submit after you, to sit in the queue for one week.  this is why we
have repeatedly asked individual users to steer clear of this
over-saturation effect in the gpu queue.
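
as a rough sketch of the arithmetic above (not an official Saga or
scheduler calculation): the helper `days_to_drain` and its parameter
names are made up for illustration, and the two input figures are
simply the ones quoted in this message.

```python
def days_to_drain(full_capacity_days: float, share: float) -> float:
    """Wall-clock days until a backlog clears when only a fraction
    `share` of the total capacity is available to it on average."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return full_capacity_days / share


# figures quoted in the message; both are estimates, not measurements
backlog_days = 3.5  # 19 six-hour jobs, expressed as days of full gpu capacity
share = 0.5         # assumed average fraction of capacity available

print(days_to_drain(backlog_days, share))  # 7.0, i.e. about one week
```

submitting the jobs in smaller batches does not change the total
amount of computation, but it leaves room for jobs by other users to
start in between.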

best wishes, og god helg!  oe