[NLPL Task Force (A)] [NLPL Board] NLP in Tartu, Estonia

Sun Oct 21 18:58:05 UTC 2018

dear mark,

on behalf of the NLPL infrastructure task force (which is authorized
by the project steering group to make allocation decisions), i am
happy to welcome you and your team as NLPL associates.  to get
started, i recommend you and relevant team members request accounts on
both Abel and Taito following the procedure described here:

http://wiki.nlpl.eu/index.php/Infrastructure/resources

to keep things simple, i would ask that you only obtain accounts for
team members who are likely to actually make use of the NLPL
laboratory in the near future.  it will always be possible to add more
people down the road, as and when they stand to benefit from access :-).

regarding gpu availability and allocations, the situation is not ideal
on either Abel or Taito currently.  for the time being, we encourage
you to try using Abel first, where there are twenty K20s (and plenty
of cpu hours) available, not all that loaded in recent weeks.  a
significant expansion of gpu capacity is expected for the end of the
year.

the NLPL software infrastructure to date comprises PyTorch and
TensorFlow, but from a quick glance at MXNet, i would be tempted to
see whether we can provide a project-wide installation for you on Abel
(and, in principle, Taito too, of course).  can we assume that version
1.3.0 in a python 3.5 environment would work for you?

i trust you have already found the NLPL wiki (‘http://nlpl.eu’) where
there is emerging documentation on how to get started and a
‘catalogue’ of software and data installed under the NLPL umbrella.

for future queries, please do not hesitate to contact us as
‘infrastructure at nlpl.eu’.

best wishes, oe

On Sun, Oct 21, 2018 at 6:44 AM Mark Fishel <fishel at ut.ee> wrote:
>
> Stephan,
>
> I forgot to include some further details, described on the NLPL website.
>
> 1. expected types of computing
> Training and evaluating neural models for translation, text-to-speech.
>
> 2. software
> We use and write open source software. More specifically,
> - for NMT we use SockEye from AWSLabs, which uses mxnet as the low-level backend
> - for TTS and for all custom models we use either PyTorch or Tensorflow
> - our own software is at https://github.com/TartuNLP
>
> 3. data
> This includes several different datasets. For machine translation we use OPUS's corpora a lot (Europarl, OpenSubtitles, others). Our group has annotated and developed a large number of Estonian corpora, a lot of which can be accessed through https://metashare.ut.ee/, maintained by the Center of Estonian Language Resources.
>
> 4. anticipated group of users
> The affiliation for all the users from our side will be the NLP research group, institute of computer science, University of Tartu. The number of active users at the moment would be 8 people, which is bound to grow in the near future, and of course we can scale this down if needed.
>
> Best wishes,
> Mark
>
> On Sun, Oct 21, 2018 at 7:32 AM Mark Fishel <fishel at ut.ee> wrote:
>>
>> Dear Stephan, everyone,
>>
>> we are definitely interested! Sorry for the huge delay in my answer.
>>
>> Our primary need is computational power, mainly servers with GPUs. We have an HPC at our university, but its GPU capacity is only enough for small-scale experiments and several groups use it besides us.
>>
>> The experiments for which we would need this computational power are currently neural machine translation and end-to-end speech synthesis; I am hoping to expand this in the near future to dependency parsing and other topics.
>>
>> In addition to the computational resources, another big appeal to me is the possibility of more easily exchanging data, experimental setups and results with partners from the NLPL network. We will gladly expand the collaboration with colleagues from Helsinki University and Uppsala, and of course will happily start new collaborations with other partners.
>>
>> Please let me know if anything else is needed from my side, I promise no more delays :-)
>>
>> Best wishes,
>> Mark
>>
>> On Tue, Oct 9, 2018 at 12:10 PM Stephan Oepen <oe at ifi.uio.no> wrote:
>>>
>>> hi again, mark,
>>>
>>> re-sending the message below, to make sure it actually made it to you?  i took it as all but a certainty that you were interested in joining the NLPL associate programme, but to initiate that process it would be good to have a brief summary of your needs, according to the guidelines for associate partners on the NLPL web site.
>>>
>>> best wishes, oe
>>>
>>>
>>> ---------- Forwarded message ---------
>>> From: Stephan Oepen <oe at ifi.uio.no>
>>> Date: Tue, Sep 11, 2018 at 5:36 PM
>>> Subject: Re: [NLPL Board] NLP in Tartu, Estonia
>>> To: Mark Fishel <fishel at ut.ee>
>>> Cc: Bjørn Lindi <bjorn.lindi at ntnu.no>, contact at nlpl.eu <contact at nlpl.eu>
>>>
>>>
>>> hi again, mark,
>>>
>>> as of this summer, NLPL has created an associate programme, to give
>>> additional compute-intensive NLP research groups in northern europe a
>>> way of taking advantage of our software and data installations.  the
>>> notion of associate partners was minted in response to your original
>>> query in april, so i hope you might still be interested?  if so,
>>> please see the instructions on the NLPL front page
>>> (‘http://www.nlpl.eu’) and send us an email about your anticipated
>>> needs.
>>>
>>> best wishes, oe
>>>
>>>
>>> On Fri, Apr 13, 2018 at 8:50 AM, Mark Fishel <fishel at ut.ee> wrote:
>>> > Dear Bjørn,
>>> >
>>> > thank you for the info! I will see about the Taito system, and if there are
>>> > any updates on outside groups joining, I would be happy to find out!
>>> >
>>> > Best wishes,
>>> > Mark
>>> >
>>> >
>>> > On Wed, Apr 11, 2018 at 9:53 AM Bjørn Lindi <bjorn.lindi at ntnu.no> wrote:
>>> >>
>>> >> Dear Mark,
>>> >> the NLPL have just started a discussion on how to open our resources to
>>> >> research groups outside NLPL. This is something we will work on in the
>>> >> coming months, though one way to get an immediate start is to get a
>>> >> personal account on the Finnish system Taito. A personal account comes with
>>> >> a low compute quota (I think it is around 10 000 CPU hours), but you will
>>> >> be able to browse NLPL resources and see what we currently provide.
>>> >>
>>> >> Besides this practical step, we will be able to take you into account as
>>> >> we investigate how our resources could be shared to a greater benefit.
>>> >>
>>> >> Thanks for reaching out. I am sure we will find a way to collaborate.
>>> >>
>>> >> Yours Sincerely
>>> >> Bjørn Lindi
>>> >> NLPL Project Manager
>>> >>
>>> >>
>>> >>
>>> >> On 10 Apr 2018, at 19:03, Mark Fishel <fishel at ut.ee> wrote:
>>> >>
>>> >> Dear NLPL people,
>>> >>
>>> >> Estonia is currently looking for ways of building closer ties with NEIC, and I was
>>> >> wondering if it is possible for the NLP group in Tartu to collaborate with
>>> >> NLPL, and what the conditions for that could be?
>>> >>
>>> >> The NLP group in Tartu (https://nlp.cs.ut.ee/) is working on data science
>>> >> applied to NLP, as well as linguistic resources like UD and other corpora.
>>> >> In particular this year we are organizing two sub-tracks in the translation
>>> >> shared task of WMT (unsupervised and multilingual NMT) and have participated
>>> >> in the tasks on translation, metrics and quality estimation.
>>> >>
>>> >> As far as infrastructure goes, we have a high performance computing center
>>> >> (http://hpc.ut.ee), but could use much more than what we currently have.
>>> >>
>>> >> So, I would love to talk to someone in NLPL and discuss possibilities /
>>> >> requirements / etc.
>>> >>
>>> >> Best wishes,
>>> >> Mark
>>> >>
>>> >>
>>> >