[NLPL Task Force (A)] [uninett.no #228071] ReqNodeNotAvail

Sabry Razick via RT support at metacenter.no
Sun Dec 6 11:08:52 UTC 2020


On Sun Dec 06 11:14:18 2020, vinitr at ifi.uio.no wrote:

    Sorry, yeah, I did mean Saga, job ID is 1624874. The maintenance notice says Saga’s going down on 7 Dec, I was hoping I could get some jobs to run over the weekend.

    – Vinit

    > On 6 Dec 2020, at 00:33, Sabry Razick via RT <support at metacenter.no> wrote:
    >
    > On Sat Dec 05 19:55:22 2020, vinitr at ifi.uio.no wrote:
    >
    >    Hi all,
    >
    >    Hope someone can help me out with this strange issue I’ve been having intermittently the last couple of days - my jobs have been stuck in queue with a (ReqNodeNotAvail, Reserved for maintenance) message. I had another user submit a job to make sure it wasn’t an issue with the system, and their jobs got through just fine. Is there something specific I should be doing? This is what my Slurm headers look like, although this also happens with partition=accel.
    >
    >    #SBATCH --account=nn9447k
    >    #SBATCH --partition=normal
    >    #SBATCH --nodes=1
    >    #SBATCH --time=100:00:00
    >    #SBATCH --mem-per-cpu=8G
    >
    >    Thanks!
    >
    >    – Vinit
    >
    >
    > Hello,
    > Not knowing which cluster (Fram, Betzy, Saga, ...?) this is, or the job ID,
    > I can give you very little help. From the message my guess is that you are
    > running this on Saga and did not notice the maintenance notice:
    >
    > https://opslog.sigma2.no/
    >
    > Regards,
    > Sabry
    >


Hello,
   The reason the job is held, although you were planning to finish it over
the weekend, is that Slurm was not informed of your intentions correctly.

You asked SLURM for:
#SBATCH --time=100:00:00

That is about four days of runtime, which means the job cannot finish before
the maintenance starts (7 December, 08:00). If you request (at the moment of
this reply) less than --time=15:00:00, you will not face that error, provided
there are free slots (i.e. other users have not already claimed them).
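The arithmetic above can be checked from the shell: `scontrol show reservation` lists the maintenance reservation Slurm is holding nodes for, and a little `date` arithmetic gives the largest --time that still fits before it starts. The maintenance start below is the one from the notice; the "now" timestamp is frozen to a hypothetical submission time purely for illustration:

```shell
# To see the maintenance reservation blocking the job, run:
#   scontrol show reservation
# Then compute the largest whole-hour --time that ends before it starts.

# Maintenance start from the notice (7 December 2020, 08:00 UTC):
maint=$(date -u -d '2020-12-07 08:00' +%s)
# Hypothetical frozen "now" for illustration (not the actual reply time):
now=$(date -u -d '2020-12-06 17:00' +%s)

# Whole hours remaining until the maintenance window opens:
hours=$(( (maint - now) / 3600 ))
echo "#SBATCH --time=${hours}:00:00"
```

With these timestamps the script prints `#SBATCH --time=15:00:00`; a job submitted with that limit (or less) is eligible to run before the reservation begins, so it no longer shows ReqNodeNotAvail.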

Regards,
Sabry




More information about the infrastructure mailing list