[NLPL Task Force (A)] [uninett.no #200768] very inefficient GPU utilization on Saga by one user
Anders Vaage via RT
support at metacenter.no
Fri Dec 20 19:53:03 UTC 2019
Hi Andrei,
Thanks for the feedback. First of all, I will get in touch with the user and send him a warning; however, I probably won't hear back from him until after the weekend.
You're right that we need to monitor GPU usage more closely (which we are currently working on) and that we need a strategy for ensuring fair usage. I will forward this through the appropriate channels.
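As a rough illustration only (not our actual monitoring setup; the node selection, log path and interval below are placeholders), periodic polling of the accel nodes along these lines would already reveal sustained 0% utilization:

while true
do
    for node in $(sinfo -p accel -h -o '%n' | sort -u)
    do
        # One utilization figure per GPU on the node, joined onto one line
        util=$(ssh "$node" nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | paste -sd' ')
        echo "$(date -Is) $node $util" >> gpu_util.log
    done
    sleep 300
done

Jobs whose nodes sit at 0% for hours could then be flagged automatically.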
Thanks,
Anders Vaage
On Fri Dec 20 18:46:55 2019, andreku at ifi.uio.no wrote:
> Hi,
>
> This issue has been raised before, but the situation has not changed.
>
> Right now, again, one particular user (josece,
> https://www.nhm.uio.no/english/about/organization/research-collections/people/josece/index.html)
> is occupying all of Saga's GPUs. There are currently 24 active GPU jobs
> running under this user (some for several days already) and 9 more
> pending.
>
> Even worse, these jobs do not appear to actually use the GPUs: GPU
> utilization on all the nodes occupied by josece is 0% (and CPU
> utilization is not much higher). Yet they still effectively block
> other users from getting access to Saga's GPUs.
>
> I attach the GPU queue listing (produced by squeue --partition=accel)
> and an overview of the actual GPU usage by josece's jobs.
> The latter was produced by running the following commands:
>
> for i in $(squeue -p accel | egrep 'c[0-9]-[0-9]$' | sort -u | awk '{print $NF}')
> do
>     ssh $i nvidia-smi | grep Default
> done
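>
> (As a sketch of a complementary check, one could also list the compute
> processes on those GPUs via nvidia-smi; empty output would mean that
> nothing is actually running on them at all:)
>
> for i in $(squeue -p accel | egrep 'c[0-9]-[0-9]$' | sort -u | awk '{print $NF}')
> do
>     echo "== $i =="
>     # Empty output here means no process is using the node's GPUs
>     ssh $i nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader
> done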
>
> I do not think this is a good use of Saga's scarce GPU resources.
>
> Could it be that josece's jobs do not actually need GPUs and would run
> just as fast on the CPU nodes? Is it possible to give josece some
> feedback on that?
>
> Also, all Saga users would probably benefit if a limit were imposed on
> the number of GPU jobs a single user can run (at least while other
> users have GPU jobs pending).
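>
> For illustration only (I do not know how accounting/QOS is set up on
> Saga, so the QOS name and the number below are placeholders), such a
> cap could in principle be expressed as a per-user TRES limit in Slurm:
>
> # Hypothetical example: limit each user to 4 GPUs in use at a time
> sacctmgr modify qos accel set MaxTRESPerUser=gres/gpu=4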
>
> Thanks in advance!