[NLPL Task Force (A)] big array job
Asad Sayeed
asad.sayeed at gu.se
Sat Feb 23 16:31:46 UTC 2019
Hi,
So this worked. Thanks a bunch. However, now I need to run it again,
but with another tool in the pipeline, spacy. But the catch is, I also
need it in py2.7, because of the pipeline. Is this easily doable?
Thanks again.
Yours,
--Asad.
On 2019-02-16 03:06 PM, Stephan Oepen wrote:
> hi asad,
>
> i am glad you are about to put some load on the system :-). as long
> as you stick within the published limit of a maximum of 400 jobs (and
> your typical job behaves itself, e.g. does not put undue strain on the
> file system), i see no reason why you should be overly careful. i
> would recommend you keep an eye on your jobs, at least in the
> beginning, and monitor your mailbox ... in case the system
> administrators find something to remark.
>
> are all of these jobs single-threaded? i am no big fan of the Abel
> arrayrun(1) facility. what i usually do is create a large file with
> as many command lines as i want to run jobs, each with whatever
> parameters that job requires. a silly example of such a master job
> file could be something like
>
> for i in 0 1 2 3 4 5 6 7 8 9; do
>   for j in 0 1 2 3 4 5 6 7 8 9; do
>     echo "sbatch ${HOME}/echo.slurm ${i} ${j}";
>   done;
> done > ~/echo.jobs
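[for reference, a plausible sketch of the echo.slurm script that the master job file invokes; the #SBATCH settings below are illustrative assumptions, not values taken from this thread:]

```shell
#!/bin/bash
# hypothetical echo.slurm: a minimal single-task job script.  the
# job-name, time, and memory settings are placeholder assumptions.
#SBATCH --job-name=echo
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1G

# sbatch passes trailing arguments through to the script, so the two
# loop indices from the master job file arrive here as ${1} and ${2}
i=${1:-0}
j=${2:-0}
echo "running job for segment ${i}.${j}"
```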
>
> assuming such a file, i have a script that ‘trickles’ through the
> sequence of jobs, keeping up to some maximum limit of queue entries at
> any point in time, and filling up the queue to the limit again as jobs
> terminate. my idiom of setting into motion this process then goes as
> follows:
>
> /projects/nlpl/operation/tools/trickle --start --limit 20 ~/echo.jobs
> while true; do
>   /projects/nlpl/operation/tools/trickle --limit 20 ~/echo.jobs;
>   sleep 30;
> done
> [19-02-16 15:00:37] trickle[20]: 20 jobs; 3 running; 0 new.
> [19-02-16 15:01:07] trickle[20]: 17 jobs; 0 running; 3 new.
> [19-02-16 15:01:38] trickle[23]: 17 jobs; 0 running; 3 new.
> [19-02-16 15:02:10] trickle[26]: 20 jobs; 3 running; 0 new.
>
> the first integer is the pointer into the job sequence, 20 initially,
> then at each step advancing by the number of new jobs submitted for
> that call.
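[for intuition, the queue-refilling bookkeeping that the trickle log above reflects can be sketched as a single step; the counts are taken from the second log line, where 17 of 20 slots were still occupied, and the squeue usage appears only in a comment:]

```shell
# sketch of one trickle step: top the queue back up to the limit and
# advance the pointer by the number of jobs just submitted
limit=20
in_queue=17     # in practice obtained e.g. via: squeue -h -u "$USER" | wc -l
pointer=20      # lines of ~/echo.jobs consumed so far

new=$(( limit - in_queue ))
# the next ${new} lines of the job file would be submitted here, e.g.:
#   sed -n "$(( pointer + 1 )),$(( pointer + new ))p" ~/echo.jobs | sh
pointer=$(( pointer + new ))
echo "trickle[${pointer}]: $(( in_queue + new )) jobs; ${new} new."
```

[this reproduces the pointer advancing from 20 to 23 with 3 new submissions, as in the log above.]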
>
> —just in case you might find this useful ... for all i know, this
> script provides similar functionality to arrayrun(1), but i find it
> more convenient to be able to pass each job its full command line
> directly, without having to dispatch on the job indices under
> arrayrun(1) control.
>
> i will be curious to know how these jobs turn out for you :-)! oe
>
> On Sat, Feb 16, 2019 at 1:42 PM Asad Sayeed <asad.sayeed at gu.se> wrote:
>> Hi,
>>
>> The abel documentation says users are allowed to run up to 400 jobs
>> simultaneously. If I run arrayrun 4x on different segments of the
>> corpus, will I get myself into trouble with the authorities or
>> something? 400 at a time is a significant time saving for me, obviously
>> (2.5 days for the whole thing).
>>
>> Thanks.
>>
>> Yours,
>> --Asad.
>>
>>
>> On 2019-02-16 01:29 PM, Asad Sayeed wrote:
>>> Hi Stephan,
>>>
>>> I am now trying to scale up my SRL task "for real" over 70M sentences,
>>> divided up into 3500 segments/tasks, each taking about 12G of memory
>>> and taking about 7 hours. I am trying to use arrayrun on abel on my
>>> script. However, it seems like arrayrun will only activate 100 jobs
>>> at a time. This will take 10 days to run the entire job, which is
>>> slower than the much smaller cluster I was running it on elsewhere
>>> (where I can run about 300 at a time and take 14 hours, for about 7
>>> days). I was hoping for a significant improvement in turnaround time
>>> for experimentation on abel. Is there any way to get more on abel or is
>>> that a hard limit?
>>>
>>> Thanks.
>>>
>>> Yours,
>>> --Asad.
>>>
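[the turnaround estimates quoted in this thread, 10 days at 100 concurrent jobs and roughly 2.5 days at 400, follow directly from the wave arithmetic on the figures given above (3500 segments, about 7 hours per job on abel); a quick check:]

```shell
# 3500 segments at ~7 hours each; the concurrency cap determines how
# many jobs run per "wave", and total time is waves * hours per job
segments=3500
hours_per_job=7

for cap in 100 400; do
    waves=$(( (segments + cap - 1) / cap ))   # ceiling division
    hours=$(( waves * hours_per_job ))
    echo "cap ${cap}: ${waves} waves, ~${hours} hours"
done
```

[this yields 35 waves and ~245 hours (about 10 days) at a cap of 100, and 9 waves and ~63 hours (about 2.6 days) at a cap of 400.]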
More information about the infrastructure mailing list