[NLPL Task Force (A)] [uninett.no #198531] gpu utilization on Saga
nikolaiv@uio.no via RT
support at metacenter.no
Tue Nov 19 08:11:38 UTC 2019
Shall be OK now.
On Mon Nov 18 14:25:38 2019, oe at ifi.uio.no wrote:
dear colleagues,
some of our NLPL users point out that for the past several days it has
been very slow to see 'vanilla' single-gpu jobs scheduled on Saga.
just now, it appears that one user has effectively saturated the gpu
queue, but their jobs actually hardly seem to utilize the gpus
currently. please see the attached results of the following commands
squeue -p accel > /tmp/accel
for i in $(squeue -p accel | egrep 'c[0-9]-[0-9]$' | sort -u | awk '{print $NF}'); do
  ssh $i nvidia-smi | grep Default
done > ~/nvidia-smi.log
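A possible variant of the loop above, as a sketch only: sort -u on whole squeue lines does not collapse several jobs on the same node, so this version extracts the node name first and then deduplicates. The function names unique_nodes and gpu_report are hypothetical, the node pattern is widened to multi-digit names as an assumption about Saga's naming, and the nvidia-smi query fields replace the 'grep Default' filter with explicit utilization numbers.

```shell
# Hypothetical helper: read squeue output on stdin and print each compute
# node name once. Assumes the node name is the last field and looks like
# c7-2 (multi-node ranges such as c7-[1-2] are deliberately skipped).
unique_nodes() {
    awk '$NF ~ /^c[0-9]+-[0-9]+$/ {print $NF}' | sort -u
}

# Report per-GPU utilization and memory use for every node that currently
# runs a job in the accel partition. Needs ssh access to the nodes.
gpu_report() {
    for node in $(squeue -h -p accel -t RUNNING | unique_nodes); do
        ssh "$node" nvidia-smi \
            --query-gpu=utilization.gpu,memory.used \
            --format=csv,noheader | sed "s/^/$node: /"
    done
}
```

Running gpu_report > ~/nvidia-smi.log would then give one line per GPU, prefixed with the node name.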
i realize it is difficult to 'police' users, but in this specific case
i feel this colleague might benefit from some feedback on 'good' usage
patterns, and more generally i have been wondering whether the
scheduler could seek to maintain some fairness across users, i.e.
prohibit a single account from being granted the vast bulk of
available resources (while there are pending jobs by other users)?
with thanks in advance, oe
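For illustration only, the fairness question above maps onto two standard Slurm mechanisms: fair-share priority and per-user resource limits on a QOS. The fragment below is a sketch under assumptions, not Saga's actual configuration: all values are made up, and the QOS name 'normal' and the cap of 4 GPUs are placeholders.

```
# slurm.conf fragment (illustrative values, not Saga's settings)
PriorityType=priority/multifactor      # enable multifactor priority
PriorityWeightFairshare=100000         # weight past usage heavily
PriorityDecayHalfLife=7-0              # usage decays with a 7-day half-life
AccountingStorageEnforce=limits,qos    # enforce association and QOS limits
AccountingStorageTRES=gres/gpu         # track GPUs as a resource

# Per-user GPU cap on a QOS, set once by an administrator
# (QOS name and limit are assumptions):
#   sacctmgr modify qos normal set MaxTRESPerUser=gres/gpu=4
```

Fair-share only reorders pending jobs, so a hard cap such as MaxTRESPerUser is what would actually stop one account from holding most of the GPUs while others wait.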