[epe-users] updates on EPE task infrastructure

Mon Apr 10 00:06:34 CEST 2017

dear colleagues,

welcome to the mailing list for the EPE 2017 shared task at DepLing
and IWPT 2017!  at present, there are five subscribed teams, and we
know of another three colleagues who are hoping to make the time for
participation too.  thus, we are optimistic there will be sufficient
critical mass to complete this first installment of cross-framework
and cross-parser extrinsic evaluation.  after all, our initial goal in
this exercise is to collaborate with parser developers on building a
re-usable and methodologically sound infrastructure.

we have just posted a few updates to the task web site
(‘http://epe.nlpl.eu’), including a revision to the schedule for trial
submissions and an announcement of a first package for format
conversion to the EPE interchange format.  as regards the latter, our
converter can attempt to recover substring ranges against the
underlying text for purely token-oriented parser output formats (like
CoNLL-X or SDP).  however, many tokenizers also normalize the input
text to some degree (e.g. converting from ASCII to Unicode punctuation
marks, or vice versa), and hence the alignment of tokens against the
original text can be fuzzy.
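
to illustrate why this alignment is fuzzy, here is a minimal sketch of
recovering character ranges by left-to-right search.  it is not the code
in our converter package; the names `PUNCT`, `fold`, and `align` are
hypothetical, and the folding table is an assumption that only covers a
few one-to-one punctuation substitutions.  rewrites that change length
(e.g. PTB-style two-character quotes) would need more care.

```python
# hypothetical folding table: a few Unicode punctuation marks and their
# ASCII counterparts; every entry maps one character to one character
PUNCT = {
    "\u2018": "'", "\u2019": "'",   # single quotation marks
    "\u201c": '"', "\u201d": '"',   # double quotation marks
    "\u2013": "-", "\u2014": "-",   # dashes
}


def fold(string):
    """map Unicode punctuation to ASCII without changing the length"""
    return "".join(PUNCT.get(char, char) for char in string)


def align(text, tokens):
    """return one (start, end) range per token, or None if not found"""
    folded = fold(text)  # same length as text, so offsets carry over
    spans, cursor = [], 0
    for token in tokens:
        start = folded.find(fold(token), cursor)
        if start < 0:
            # the tokenizer changed the token beyond what fold() undoes
            spans.append(None)
            continue
        end = start + len(token)
        spans.append((start, end))
        cursor = end  # never search backwards
    return spans
```

for example, `align("a b", ["a", "x", "b"])` leaves the second token
without a range.  a greedy search like this can also latch onto a later
occurrence of a token after a mismatch, which is one reason why ranges
tracked inside the parser are more reliable.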

in general, we believe it will be preferable for parsers to keep track
of character ranges internally (no doubt many do anyway) and to
serialize parsing results directly in the EPE interchange format.
formally, we will expect all system submissions to the task to be in
the EPE format.  therefore, if you decide to incorporate our converters
into your parsing pipeline, please also accept responsibility for any
errors they might introduce in your parser outputs.

we have been busy generalizing downstream systems over the past few
weeks by simulating system submissions ourselves, notably by using
CoreNLP, TurboSemanticParser, and UDPipe.  however, we feel it is high
time we move to more ‘realistic’ submissions—viz. those provided to us
by actual parser developers—as well as to a broader variety of
dependency representations.  in particular, we will be grateful to
receive dependency graphs that transcend ‘classic’ syntactic
dependency trees.

in case you have already started to output EPE files from your parser
(one way or another), please feel free to (a) parse all the ‘.txt’
files in the ‘training/’ and ‘development/’ sub-directories of our
most recent ‘raw’ text release (version 1.1); (b) put parser outputs
into a parallel directory structure, using parallel file names but
replacing the ‘.txt’ suffix with ‘.epe’; and (c) package all the EPE
files up into a compressed archive and email us a download link.  we
look forward to working with you on running your parser outputs
through our downstream applications and expect this exercise to yield
useful feedback for all parties.
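
steps (b) and (c) above could be scripted roughly as follows.  this is a
sketch only: the function names, the default sub-directory tuple, and
the choice of a gzip-compressed tar archive are assumptions, and any
common compressed archive format should serve equally well.

```python
import tarfile
from pathlib import Path


def expected_outputs(raw_root, out_root, parts=("training", "development")):
    """pair each raw '.txt' file with the '.epe' file expected for it"""
    raw_root, out_root = Path(raw_root), Path(out_root)
    pairs = []
    for part in parts:
        for txt in sorted((raw_root / part).rglob("*.txt")):
            # same relative path and file name, '.txt' replaced by '.epe'
            relative = txt.relative_to(raw_root).with_suffix(".epe")
            pairs.append((txt, out_root / relative))
    return pairs


def package(raw_root, out_root, archive):
    """check that no output is missing, then build the archive"""
    out_root = Path(out_root)
    pairs = expected_outputs(raw_root, out_root)
    missing = [str(epe) for _, epe in pairs if not epe.is_file()]
    if missing:
        raise FileNotFoundError("missing parser outputs: " + ", ".join(missing))
    with tarfile.open(archive, "w:gz") as tar:
        for _, epe in pairs:
            # store paths relative to the output root
            tar.add(epe, arcname=epe.relative_to(out_root).as_posix())
    return len(pairs)
```

a call such as `package("raw", "outputs", "submission.tgz")` (all three
names hypothetical) returns the number of files packaged, or raises an
error listing the outputs that are still missing.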

with thanks in advance, oe