<html><body><div style="font-family: arial, helvetica, sans-serif; font-size: 10pt; color: #000000"><div>Hi Filip,</div><div><br data-mce-bogus="1"></div><div>This sounds good to me. It raises some interesting infra questions (to me at least):</div><div><br></div><div>Could we compute a grand total hash that ensures that the whole thing is correctly in place (e.g. [1])?</div><div>Would we want that on a per-tar-file basis (to be able to use only a partial corpus)?</div><div>And here I do not mean hashing the tar file itself, but making sure that its extracted contents are correctly in place.</div><div><br data-mce-bogus="1"></div><div>I am curious: Why did you not compress the tar files? Too slow?</div><div><br data-mce-bogus="1"></div><div>Cheers,</div><div>Martin</div><div><br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>[1] https://stackoverflow.com/questions/4830089/how-to-checksum-an-entire-folder-structure<br data-mce-bogus="1"></div><div><br></div><div data-marker="__SIG_PRE__">-- <br>Martin Matthiesen<br>CSC - Tieteen tietotekniikan keskus<br>CSC - IT Center for Science<br>PL 405, 02101 Espoo, Finland<br>+358 9 457 2376, martin.matthiesen@csc.fi<br>Public key : https://pgp.mit.edu/pks/lookup?op=get&search=0x74B12876FD890704<br>Fingerprint: AA25 6F56 5C9A 8B42 009F BA70 74B1 2876 FD89 0704</div><br><hr id="zwchr" data-marker="__DIVIDER__"><div data-marker="__HEADERS__"><blockquote style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Filip Ginter" &lt;ginter@cs.utu.fi&gt;<br><b>To: </b>"infrastructure" &lt;infrastructure@nlpl.eu&gt;<br><b>Sent: </b>Thursday, 30 November, 
2017 10:15:06<br><b>Subject: </b>[NLPL Task Force (A)] CoNLL-2017 raw data on taito<br></blockquote></div><div data-marker="__QUOTED_TEXT__"><blockquote style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div>Hi guys<br><br></div>Is it okay for me to put this data <a href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989" target="_blank">https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989</a> in the nlpl directory on taito? We actually already have this data in the work directory of one of our researchers on taito, so the total space usage on taito stays the same. It is 522GB. This is a useful dataset for parser training etc. <br><br></div>- Filip</div><br></blockquote></div></div></body></html>