[NLPL Task Force (A)] [uninett.no #198531] gpu utilization on Saga
oe@ifi.uio.no via RT
support at metacenter.no
Tue Nov 19 08:14:14 UTC 2019
thanks, nikolay! oe
On Tue, 19 Nov 2019 at 09:12 nikolaiv at uio.no via RT <support at metacenter.no>
wrote:
>
> Shall be OK now.
>
> On Mon Nov 18 14:25:38 2019, oe at ifi.uio.no wrote:
>
> dear colleagues,
>
> some of our NLPL users point out that for the past several days it has
> been very slow to see 'vanilla' single-gpu jobs scheduled on Saga.
>
> just now, it appears that one user has effectively saturated the gpu
> queue, but their jobs actually hardly seem to utilize the gpus
> currently. please see the attached results of the following commands
>
> squeue -p accel > /tmp/accel
> for i in $(squeue -p accel | egrep 'c[0-9]-[0-9]$' | sort -u | awk '{print $NF}'); do \
> ssh $i nvidia-smi | grep Default; \
> done > ~/nvidia-smi.log
>
> i realize it is difficult to 'police' users, but in this specific case
> i feel this colleague might benefit from some feedback on 'good' usage
> patterns, and more generally i have been wondering whether the
> scheduler could seek to maintain some fairness across users, i.e.
> prohibit a single account from being granted the vast bulk of
> available resources (while there are pending jobs by other users)?
>
> with thanks in advance, oe
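
The two observations in the message above (one account holding most of the
accel queue, and allocated GPUs sitting nearly idle) could be summarised with
something like the following. This is only a sketch and not what was run for
the ticket: the helper names jobs_per_user and idle_gpus are hypothetical, the
10 percent threshold is an arbitrary assumption, and the cluster-side
invocations in the comments are assumed to fit Saga's setup.

```shell
#!/bin/sh

# Count jobs per user. Reads one user name per line on stdin and prints
# "<count> <user>", busiest user first (ties broken by user name).
jobs_per_user() {
    sort | uniq -c | sort -k1,1nr -k2,2 | awk '{print $1, $2}'
}

# List GPUs whose utilization is below a threshold in percent (default 10).
# Reads "index, utilization" lines on stdin and prints the matching indices.
idle_gpus() {
    awk -F', *' -v limit="${1:-10}" '$2 + 0 < limit + 0 {print $1}'
}

# Assumed invocations on the cluster (not executed here):
#   squeue -p accel -h -o '%u' | jobs_per_user
#   ssh "$node" nvidia-smi --query-gpu=index,utilization.gpu \
#       --format=csv,noheader,nounits | idle_gpus 10
```

Both helpers only read standard input, so they can be tried on saved output
before being pointed at the live queue.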