[NLPL Task Force (A)] Content of NLPL Dependency Parsing's next milestone

Stephan Oepen oe at ifi.uio.no
Thu Nov 29 12:57:59 UTC 2018


many thanks for getting us well underway in this task, joakim!

for shorter directory names and URLs, what are the chances of me
convincing you to rename ‘universal[_ ]dependencies’ to just ‘ud’?  i
was planning to put the SDP data in a parallel folder, using just the
acronym.

given my over-developed sense of aesthetics, i would even volunteer to
make the necessary edits to the existing wiki pages :-).

as regards naming software, module names should be something like
‘nlpl-uuparser’, but the corresponding installation directory should
just be ‘/projects/nlpl/software/uuparser/’.  as far as i recall,
sara was following that convention already, but just to make sure you
are aware of it.  additional guidelines are available here:

http://wiki.nlpl.eu/index.php/Infrastructure/installation/guide
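for illustration, the convention above can be sketched as a small
shell fragment (the derivation and the variable names are illustrative
only, not part of any NLPL tooling):

```shell
# sketch of the naming convention: a module named 'nlpl-<tool>'
# is installed under /projects/nlpl/software/<tool>/
module_name="nlpl-uuparser"
tool="${module_name#nlpl-}"                 # strip the 'nlpl-' prefix
prefix="/projects/nlpl/software/${tool}/"
echo "$prefix"                              # /projects/nlpl/software/uuparser/
```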

oe
On Wed, Nov 28, 2018 at 5:01 PM Joakim Nivre <joakim.nivre at lingfil.uu.se> wrote:
>
> In the meantime, I updated the documentation for the parsing environment, including creating new subpages for different parsers and data sets. Have a look:
> http://wiki.nlpl.eu/index.php/Parsing/home
>
> Joakim
>
> On 28 Nov 2018, at 16:33, Joakim Nivre <joakim.nivre at lingfil.uu.se> wrote:
>
> Hi Stephan,
>
> I started looking into the data sets and I realize I have a very basic question. How can I get write-access at /projects/nlpl/data/parsing/universal_dependencies? Can you temporarily give me write access, or can I somehow log in as hpc-nlpl? Sorry for still being a rookie. :)
>
> Best,
> Joakim
>
> On 19 Nov 2018, at 14:23, Joakim Nivre <joakim.nivre at lingfil.uu.se> wrote:
>
> Thanks, Stephan. What seems to be emerging is the following plan with responsibilities:
>
> On Abel:
> - New version of UUParser (UU)
> - New version of UDPipe (UU)
> - UD v2.2 and v2.3 (UU)
> - B&N or CoreNLP (UiO)
> - D&M (UiO)
> - SDP subset (UiO)
>
> On Taito:
> - Mirror data sets (UiO)
> - Possibly UDPipe (?)
>
> Documentation:
> - Main parsing page (UU)
> - Subpage for UUParser (UU)
> - Subpage for UDPipe (UU)
> - Subpage for B&N or CoreNLP (UiO)
> - Subpage for D&M (UiO)
>
> Do you agree?
>
> Joakim
>
> On 18 Nov 2018, at 23:06, Stephan Oepen <oe at ifi.uio.no> wrote:
>
> hi joakim and miryam,
>
> thanks for making contact about this thread!
>
> + REPP tokenizer with PTB- and UD-compatible rules for english
> + two ‘baseline’ parsers with pre-trained english models (e.g.
> MaltParser, B&N, or CoreNLP)
> + the open-source sub-set of the Semantic Dependency Parsing (SDP) data
> + one graph parser and SDP models, e.g. TurboSemanticParser or Dozat & Manning
>
>
> at this point, and especially if you put in place the new UDPipe, i am
> inclined to limit ourselves to just one baseline parser, e.g. B&N or
> CoreNLP.  the former would be my preference, in part because it will
> be more of a ‘baseline’.  would you agree?
>
> as for the graph parser, dozat and manning (2018) now seems the tool
> of choice, though i have no idea about how hard or easy it will be to
> install.  i will take a look at that one first (i have installed
> TurboSemanticParser in the past, so that should be a suitable
> fall-back).
>
> as for documentation, i have started a somewhat templatic collection
> of pages, one per tool, on the NLPL wiki, e.g.
>
> http://wiki.nlpl.eu/index.php/Infrastructure/software/tensorflow
> http://wiki.nlpl.eu/index.php/Translation/opennmt-py
>
> may i suggest we do something similar, i.e. create separate pages for
> each parser:
>
> http://wiki.nlpl.eu/index.php/Parsing/udpipe
> http://wiki.nlpl.eu/index.php/Parsing/uuparser
> ...
>
> assuming a scheme like this, the top-level ‘Parsing’ page could then
> explain in broad terms the range of available parsing systems (e.g.
> into trees vs. into graphs; for english vs. for many languages;
> ‘vintage’ vs. neural) and corresponding data sets.
>
> finally, i think it would be good to have at least the training data
> sets replicated on Taito, and i can take care of that (the
> infrastructure task force has developed a home-grown replication
> scheme).  and time-permitting, we can then look into duplicating the
> installation of additional tools from our parsing task force, for
> example start with UDPipe.  from working on TensorFlow, PyTorch,
> OpenNMT-py, et al. earlier this fall, once i had written down the
> recipe for Abel it typically was not a lot of extra effort to go
> through mostly the same sequence of steps on Taito—but of course there
> remain subtle differences to be aware of.
>
> best wishes, oe
>
>
> On Wed, Nov 14, 2018 at 3:18 PM Joakim Nivre <joakim.nivre at lingfil.uu.se> wrote:
>
>
> Hi Stephan,
>
> In March this year, you proposed a number of contributions to the M24 deliverable in Activity C from the Oslo team (see below). Is this proposal still valid? We are seriously understaffed in Uppsala for the moment, with Aaron having left and Sara being on parental leave, so we have to be somewhat less ambitious on the Uppsala side than we were originally hoping. As a minimal update, we propose the following:
>
> + New version of UUParser
> + New version of UDPipe (which is in fact a completely different system)
> + New UD releases (v2.2 and v2.3)
>
> In addition, we have to collaborate on updating the documentation at http://wiki.nlpl.eu/index.php/Parsing/home to reflect the new contributions from both sites.
>
> Finally, in your March email, you proposed to make the mirroring of all this on Taito a priority, but I don’t know if this is a realistic goal for the M24 milestone anymore. What do you think?
>
> Best,
> Joakim
>
> On 31 Mar 2018, at 14:26, Stephan Oepen <oe at ifi.uio.no> wrote:
>
> dear sara and all,
>
> regarding the parsing task, i would volunteer the following
> contributions from oslo:
>
> + REPP tokenizer with PTB- and UD-compatible rules for english
> + two ‘baseline’ parsers with pre-trained english models (e.g.
> MaltParser, B&N, or CoreNLP)
> + the open-source sub-set of the Semantic Dependency Parsing (SDP) data
> + one graph parser and SDP models, e.g. TurboSemanticParser or Dozat & Manning
>
> all of the above, i suggest to include in the M24 deliverable.
>
> regarding availability of the parsing infrastructure, i am tempted to
> suggest that we make replication of both data and software for this
> task on both Abel and Taito a priority now.  the ability to just
> invoke a state-of-the-art parser with pre-trained models, in my view,
> is something fundamental to many NLP projects, likely also for MSc
> students and doctoral fellows.  with maybe half a dozen software
> systems installed as part of this task by M24, making sure that
> everything is installed on both systems should be feasible, i hope?
>
> best wishes, oe
>
>
> On Wed, Mar 14, 2018 at 2:34 PM, Sara Stymne <sara.stymne at lingfil.uu.se> wrote:
>
> Hi Bjørn,
>
>
> Below is the plan from Uppsala for the next parsing milestones. I cc Stephan
> and Filip, in case they have something to add. Specifically, we expect Oslo
> to take care of the work on graph parsing.
>
>
> Best,
>
> Sara
>
>
>
> Data
>
> We will provide the Universal Dependencies data, releases 2.2 and 2.3
> (provided that v2.3 is released as planned).
>
>
> Data for semantic (graph) parsing (either in 2018 or 2019, not yet decided;
> Oslo).
>
>
> Parsers
>
> We will upgrade existing parsers
>
> UUParser
> UDPipe (the current version was installed by the Opus activity; it is open
> who will upgrade it)
> upgrade)
>
>
> We will install additional parsers
>
> A state-of-the-art graph-based dependency parser (possibly Stanford, but we
> will await the CoNLL 2018 results before deciding)
> At least one non-neural baseline parser (probably TurboParser, MateTools, or
> MaltParser)
> A semantic (graph) parser (either in 2018 or 2019, not yet decided; Oslo)
> (In the original plan we said that we would install SyntaxNet. However, we
> no longer think that is a good choice and will instead choose another
> state-of-the-art parser, as specified above.)
>
>
> Tutorial
>
> We will update the tutorial, and make sure it covers all available tools.
>
>
> Currently the parsing tools are available only on Abel. Depending on
> priorities and how the work in other packages goes, we will also install
> them on Taito, either in 2018 or 2019.
>
>
>
>
> E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy
>




More information about the infrastructure mailing list