[NLPL Task Force (A)] [uninett.no #199370] Accel Queue

thierry.toutain via RT support at metacenter.no
Tue Dec 3 10:11:58 UTC 2019


Hello Jeremy,
we will check with the user how his jobs use the GPUs.
   Thierry



On Mon Dec 02 16:53:57 2019, jeremycb at ifi.uio.no wrote:


    Hello,

    I wanted to point out that over the last few days it has been very
    difficult to schedule an accel job on Saga. A single user seems to
    have effectively saturated the GPU queue, but their jobs don't use
    the GPUs effectively:

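    # Save a snapshot of the accel queue for reference.
    squeue -p accel > /tmp/accel
    # Collect the distinct node names (last squeue column) with accel jobs,
    # then keep the per-GPU status lines (those showing the Default compute mode).
    for i in $(squeue -p accel | grep -E 'c[0-9]-[0-9]$' | awk '{print $NF}' | sort -u); do
      ssh "$i" nvidia-smi | grep Default
    done > ~/nvidia-smi.log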

     

    Would it be possible to let the user know that this situation is
    suboptimal?

    Thanks,

    Jeremy Barnes
    Language Technology Group
    University of Oslo
    Office 7645
    jeremycb at ifi.uio.no




