[NLPL Task Force (A)] Tool used in OpenSubtitles 2018

Humberto Castelo Branco hlcb91 at gmail.com
Mon Jan 28 10:11:59 UTC 2019


Hello, thanks for the reply. I'll take a look at the repository.

What I want now is a tool where I give the paths of two subtitle files in SRT
format, the first in English and the second in another language, and the
tool finds the translation of each English sentence in the second file.

Does this tool do that, or do I need some other tool as well?
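For illustration, here is a minimal sketch of the pairing asked about above. It assumes both SRT files are UTF-8 encoded and already synchronized in time, and it pairs each English subtitle cue (not sentence) with the cues in the other file whose time spans overlap it by at least a given fraction. The function names, the `min_overlap` threshold and the script name are hypothetical. This naive time-overlap heuristic is not the method implemented in subalign, and it will fail when the two files have different timing offsets or frame rates, which real subtitle aligners need to handle.

```python
from __future__ import annotations

import re
import sys
from dataclasses import dataclass

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:03,000".
TIME_RE = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Split SRT text into cues; cue blocks are separated by blank lines."""
    cues = []
    content = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", content):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(l.strip() for l in lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def align_by_overlap(src: list[Cue], tgt: list[Cue],
                     min_overlap: float = 0.5) -> list[tuple[str, str]]:
    """Pair each source cue with all target cues overlapping it in time.

    Overlap is measured relative to the shorter of the two cues
    (hypothetical heuristic, threshold chosen arbitrarily).
    """
    pairs = []
    for s in src:
        matches = []
        for t in tgt:
            overlap = min(s.end, t.end) - max(s.start, t.start)
            shorter = min(s.end - s.start, t.end - t.start)
            if shorter > 0 and overlap / shorter >= min_overlap:
                matches.append(t.text)
        pairs.append((s.text, " ".join(matches)))
    return pairs


if __name__ == "__main__" and len(sys.argv) == 3:
    # "utf-8-sig" also strips a leading byte-order mark if present.
    with open(sys.argv[1], encoding="utf-8-sig") as f_en, \
         open(sys.argv[2], encoding="utf-8-sig") as f_tr:
        src, tgt = parse_srt(f_en.read()), parse_srt(f_tr.read())
    for eng, trans in align_by_overlap(src, tgt):
        print(f"{eng}\t{trans}")
```

Run it as `python align_srt.py english.srt other.srt`. Each output line holds an English cue and the matched text separated by a tab; an English cue with no overlapping cue gets an empty translation.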

On Mon, 28 Jan 2019 at 07:34, Tiedemann, Jörg <
jorg.tiedemann at helsinki.fi> wrote:

>
> The subtitles come from https://www.opensubtitles.org and more about
> cleaning, converting and aligning them is published in various papers. Look
> at the references at http://opus.nlpl.eu. Our tools for aligning subtitle
> files are available here: https://github.com/Helsinki-NLP/subalign
> In addition to this, we use language identification, some heuristics and
> language models for further filtering and cleaning of the data set. Further
> feedback for improving the data sets is very welcome.
>
> Jörg
>
>
> ********************************************************************************************
> Jörg Tiedemann
> Language Technology https://blogs.helsinki.fi/language-technology/
> University of Helsinki
>
> On 27 Jan 2019, at 16:01, Humberto Castelo Branco <hlcb91 at gmail.com>
> wrote:
>
> Hello, good morning. My name is Humberto. I found the NLPL site through
> Google, more specifically OPUS, and from there I found the OpenSubtitles
> 2018 link with several files, some of which I downloaded. Did you use
> some tool to read the subtitles and extract the corresponding texts across
> languages? If so, is this tool publicly available? Congratulations on your
> work; it's great, wonderful, perfect.
>
>
>