[NLPL Task Force (A)] [uninett.no #228071] ReqNodeNotAvail

Vinit Ravishankar via RT support at metacenter.no
Sun Dec 6 16:39:01 UTC 2020


Ah that makes sense, thanks!

– Vinit

> On 6 Dec 2020, at 12:08, Sabry Razick via RT <support at metacenter.no> wrote:
> 
> On Sun Dec 06 11:14:18 2020, vinitr at ifi.uio.no wrote:
> 
>    Sorry, yeah, I did mean Saga; the job ID is 1624874. The maintenance notice says Saga’s going down on 7 Dec, and I was hoping I could get some jobs to run over the weekend.
> 
>    – Vinit
> 
>> On 6 Dec 2020, at 00:33, Sabry Razick via RT <support at metacenter.no> wrote:
>> 
>> On Sat Dec 05 19:55:22 2020, vinitr at ifi.uio.no wrote:
>> 
>>   Hi all,
>> 
>>   Hope someone can help me out with a strange issue I’ve been having intermittently over the last couple of days: my jobs have been stuck in the queue with a (ReqNodeNotAvail, Reserved for maintenance) message. I had another user submit a job to make sure it wasn’t an issue with the system, and their job got through just fine. Is there something specific I should be doing? This is what my Slurm headers look like, although this also happens with partition=accel.
>> 
>>   #SBATCH --account=nn9447k
>>   #SBATCH --partition=normal
>>   #SBATCH --nodes=1
>>   #SBATCH --time=100:00:00
>>   #SBATCH --mem-per-cpu=8G
>> 
>>   Thanks!
>> 
>>   – Vinit
>> 
>> 
>> Hello,
>> Without knowing which cluster (Fram, Betzy, Saga, ...?) this is, or the job ID,
>> I can give you very little help. From the message, my guess is that you
>> are trying this on Saga and did not notice the maintenance notice:
>> 
>> https://opslog.sigma2.no/
>> 
>> Regards,
>> Sabry
>> 
> 
> 
> Hello,
> The job is being held, even though you were planning to finish it over the
> weekend, because SLURM was not told of that intention.
> 
> You asked SLURM for:
> #SBATCH --time=100:00:00
> 
> That is a little over 4 days of runtime, which means the job would not end
> before the maintenance starts (7th December, 08:00). If you ask (at the moment
> of this reply) for less than --time=15:00:00, you will not get that error,
> provided there are free slots (that is, other users have not taken them first).
> 
> Regards,
> Sabry
> 
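The arithmetic behind that advice can be sketched in a few lines of bash. This is only an illustration, not something from the thread: max_walltime is a hypothetical helper name, it assumes GNU date, and both timestamps are treated as UTC for simplicity. The "now" value in the example call is an assumed time chosen to reproduce the 15-hour figure; in practice the start of the maintenance reservation could be read from `scontrol show reservation`.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): print the time remaining until a
# maintenance window as hours:minutes:00, rounded down to whole minutes.
# Args: $1 = maintenance start, $2 = current time (defaults to "now").
# Both must be in a format GNU `date -d` accepts; both are read as UTC here.
max_walltime() {
    local maint_start="$1" now="${2:-now}"
    local end_s now_s left
    end_s=$(date -u -d "$maint_start" +%s) || return 1
    now_s=$(date -u -d "$now" +%s) || return 1
    left=$(( end_s - now_s ))
    if (( left <= 0 )); then
        echo "maintenance has already started" >&2
        return 1
    fi
    printf '%d:%02d:00\n' $(( left / 3600 )) $(( left % 3600 / 60 ))
}

# Example with an assumed "now" of 17:00 on 6 December:
max_walltime "2020-12-07 08:00" "2020-12-06 17:00"   # prints 15:00:00
```

The printed value is an upper bound. A job has to finish before the reservation begins, so the --time request should be somewhat below it, and the job still needs free nodes to start right away.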




