[NLPL Task Force (A)] partial mirror of OPUS to abel

Tiedemann, Jörg jorg.tiedemann at helsinki.fi
Thu Dec 20 16:09:07 UTC 2018


So, my subset of OPUS data now occupies 715GB on abel.
Let me know if that is OK; otherwise I can reduce it, for example by removing monolingual data files or the plain-text bitexts that can be generated from the native XML versions.

All the best,
Jörg

********************************************************************************************
Jörg Tiedemann
Language Technology https://blogs.helsinki.fi/language-technology/
University of Helsinki

On 19 Dec 2018, at 15:43, Stephan Oepen <oe at ifi.uio.no> wrote:

hi joerg,

our storage on Abel was extended to two terabytes this fall, and
currently we have some 800 gigabytes available.

i feel i (still) know too little about OPUS to say whether a partial
replica on Abel would be beneficial to NLPL users?  could you suggest
a sub-set (below 800 gigabytes) to mirror from Taito, and sketch a
typical use case?  could we sketch the recipe for a user to train
their OpenNMT-py system (more or less) straight from the OPUS
directory?

cheers, oe

On Wed, Dec 19, 2018 at 2:37 PM Tiedemann, Jörg
<jorg.tiedemann at helsinki.fi> wrote:


This is especially for Stephan: One of the deliverables for this year in the OPUS activity is to create a partial mirror of OPUS data on abel. So far, I still don’t really know what we would like to make available and how much space we have for that on abel. In some sense, it could be enough to have that availability via the NIRD storage that you already fill with OPUS data, right? This also counts as long-term storage, I guess. I also have the data in IDA here at CSC.

This is activity G1.4 and I wonder if I have to do something about it:
http://wiki.nlpl.eu/index.php/Infrastructure/home

All the best,
Jörg


