<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">The subtitles come from <a href="https://www.opensubtitles.org" class="">https://www.opensubtitles.org</a>. How they are cleaned, converted and aligned is described in several papers; see the references at <a href="http://opus.nlpl.eu" class="">http://opus.nlpl.eu</a>.
Our tools for aligning subtitle files are available here: <a href="https://github.com/Helsinki-NLP/subalign" class="">https://github.com/Helsinki-NLP/subalign</a>. In addition, we use language identification, some heuristics and language models to
filter and clean the data set further. Feedback that helps improve the data sets is very welcome.</div>
<div class=""><br class="">
</div>
<div apple-content-edited="true" class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="" style="orphans: 2; widows: 2; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
Jörg</div>
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<span class="" style="orphans: 2; widows: 2;"><br class="">
</span></div>
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<span class="" style="orphans: 2; widows: 2;">********************************************************************************************</span><br class="" style="orphans: 2; widows: 2;">
<span class="" style="orphans: 2; widows: 2;">Jörg Tiedemann</span></div>
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<span class="" style="orphans: 2; widows: 2;">Language Technology<span class="Apple-tab-span" style="white-space: pre;">
</span></span><a href="https://blogs.helsinki.fi/language-technology/" class="">https://blogs.helsinki.fi/language-technology/</a></div>
<div class=""><span style="orphans: 2; widows: 2;" class="">University of Helsinki</span></div>
</div>
</div>
</div>
</div>
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 27 Jan 2019, at 16:01, Humberto Castelo Branco &lt;<a href="mailto:hlcb91@gmail.com" class="">hlcb91@gmail.com</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="ltr" class="">
<div dir="ltr" class="">Hello, good morning. My name is Humberto. I found the NLPL site through Google, and more specifically OPUS. From there I found the OpenSubtitles 2018 link, which contains several files, and downloaded some of them. Did you use a tool to read
the subtitles and extract the corresponding texts between languages? If so, is this tool publicly available? Congratulations on your work; it's great, wonderful, perfect.<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</body>
</html>