[epe-users] availability of pre-processed variants of parser inputs

Thu Apr 13 23:21:35 CEST 2017

dear colleagues,

a quick note to announce that we have just released yet another
version (1.2) of the EPE 2017 parser inputs.  the ‘raw’ text files
remain unchanged, but the new package also includes pre-processed
variants of all documents, in case some of you might find these
useful.  we have applied the tool chain for segmentation and
morphological analysis described by Velldal (2012; CL), which we
believe should provide quite decent sentence splitting, tokenization,
tagging, and lemmatization across genres and domains.  for details,
please see the updated README.txt in the new data archive:

  http://epe.nlpl.eu/index.php?page=4

in parallel, we continue to refine and generalize our downstream
systems and would benefit greatly from receiving additional trial
submissions, in particular for dependency representations that
transcend rooted trees!  please try to disregard any concerns you
might have about parsing accuracy, and for now just focus on getting our
inputs (raw or pre-processed) through your parsers, serializing parser
outputs in (or converting them to) the EPE interchange format, and emailing
us a download link for the resulting set of files.  it will be important
that we exercise the mechanics of receiving submissions in diverse
representations quickly!

with thanks in advance, oe