[NLPL Task Force (A)] CoNLL-2017 raw data on taito
Martin Matthiesen
martin.matthiesen at csc.fi
Thu Nov 30 16:24:46 UTC 2017
Hi Filip,
This sounds good to me. It raises some interesting infra questions (to me at least):
Could we compute a grand total hash that ensures that the whole thing is correctly in place (e.g. [1])?
Would we want that on a per-tar file basis (to be able to use only a partial corpus)?
And here I do not mean to hash the tar-file itself, but to make sure that the extracted tar is in place correctly.
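A minimal sketch of such a grand total hash over an extracted directory tree, assuming Python 3 is available on taito. The function name `tree_digest` is hypothetical and not part of any existing tool. It hashes each file's contents together with its relative path in a fixed traversal order, so a missing, renamed, or altered file changes the result. Empty directories and file permissions are not covered.

```python
import hashlib
import os


def tree_digest(root, chunk_size=1 << 20):
    """Return one SHA-256 hex digest covering every regular file under root."""
    total = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place so os.walk descends in a reproducible order.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Use a root-relative path so the digest does not depend on
            # where the tree was extracted.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            file_hash = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in chunks to keep memory use flat on large files.
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    file_hash.update(chunk)
            # Fold "digest  relative/path" into the grand total.
            total.update(f"{file_hash.hexdigest()}  {rel}\n".encode("utf-8"))
    return total.hexdigest()
```

For a per-tar check, the same function could be called on the directory that each tar file extracts into, with the resulting digest stored next to the tar file, assuming each tar extracts into its own directory.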
I am curious: Why did you not compress the tar files? Too slow?
Cheers,
Martin
[1] https://stackoverflow.com/questions/4830089/how-to-checksum-an-entire-folder-structure
--
Martin Matthiesen
CSC - Tieteen tietotekniikan keskus
CSC - IT Center for Science
PL 405, 02101 Espoo, Finland
+358 9 457 2376, martin.matthiesen at csc.fi
Public key : https://pgp.mit.edu/pks/lookup?op=get&search=0x74B12876FD890704
Fingerprint: AA25 6F56 5C9A 8B42 009F BA70 74B1 2876 FD89 0704
> From: "Filip Ginter" <ginter at cs.utu.fi>
> To: "infrastructure" <infrastructure at nlpl.eu>
> Sent: Thursday, 30 November, 2017 10:15:06
> Subject: [NLPL Task Force (A)] CoNLL-2017 raw data on taito
> Hi guys
> Is it okay for me to put this data
> [ https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1989 ] in the nlpl
> directory on taito? We actually already have this data in one of our
> researchers' work directories on taito, so the total space usage on taito
> stays the same: 522GB. This is a useful dataset for parser training etc.
> - Filip