[NLPL Task Force (A)] Storage alternatives

Tiedemann, Jörg jorg.tiedemann at helsinki.fi
Tue Nov 17 10:16:06 UTC 2020


The community-uploaded list includes 1,300 translation models from us alone.
I guess you don’t want to include all of them in a module.

Jörg

*****************************************************************
Jörg Tiedemann
Language Technology, University of Helsinki
https://blogs.helsinki.fi/language-technology/

On 17. Nov 2020, at 12.12, Andrey Kutuzov <andreku at ifi.uio.no> wrote:

There are only about 60 models that Hugging Face itself provides
(https://huggingface.co/transformers/pretrained_models.html).

The list of community-uploaded models (https://huggingface.co/models)
is of course much larger, but I don't think it makes sense to download
ALL of them.

17.11.2020 08:53, Stephan Oepen wrote:
i would be curious to know how much storage goes to the commonly used
subset of huggingface pre-trained models (and possibly other pre-trained
files)?  much like for the NLPL vectors repository, that is the kind of
data that should not be duplicated in user home directories, i.e. we
might want to devise an NLPL 'transformers' module with many pre-trained
models pre-installed.  is there a common subset of such models, or would
one possibly be forced to just download everything that is available
through the huggingface hub?
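A shared 'transformers' module along these lines could, for instance, point every user's Hugging Face cache at a read-only, pre-populated directory. A minimal sketch, assuming a hypothetical shared path /cluster/shared/nlpl/transformers-cache (the TRANSFORMERS_CACHE and HF_HOME variables are real transformers/huggingface knobs; TRANSFORMERS_OFFLINE is only honoured by newer library versions):

```shell
# Hypothetical module setup: point Hugging Face caches at a shared,
# pre-populated directory so models are not re-downloaded per user.
export TRANSFORMERS_CACHE=/cluster/shared/nlpl/transformers-cache
export HF_HOME=/cluster/shared/nlpl/huggingface

# In newer transformers versions, offline mode forbids network access
# entirely, so jobs fail fast instead of hanging on compute nodes:
export TRANSFORMERS_OFFLINE=1
```

Users loading a pre-installed model would then hit the shared cache transparently, without any `cache_dir` arguments in their own code.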

oe



On Mon, Nov 16, 2020 at 2:17 PM Andrey Kutuzov <andreku at ifi.uio.no> wrote:

Should we indeed schedule a meeting focused on the topic of storage? :)


On 16.11.2020 11:32, Vinit Ravishankar wrote:
Hi folks,

Have any of you figured out a way to store libraries that doesn’t
involve using Saga storage? I’ve cleared up most of my personal data but
my virtual environments and transformers cache add up to around 100 GiB.
I can't do much about the transformers cache either, because the library
can't auto-download missing models when you're running on the GPU nodes.
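To see how much of a storage quota the cache actually occupies, one can measure it directly. A stdlib-only sketch (the cache-resolution function merely mirrors, not replaces, the library's own logic, and the default path may differ between transformers versions):

```python
import os

def hf_cache_dir() -> str:
    """Resolve the transformers cache directory roughly the way the
    library does: an explicit TRANSFORMERS_CACHE wins, otherwise a
    default location under the XDG cache home is used."""
    explicit = os.environ.get("TRANSFORMERS_CACHE")
    if explicit:
        return explicit
    xdg = os.environ.get("XDG_CACHE_HOME",
                         os.path.join(os.path.expanduser("~"), ".cache"))
    return os.path.join(xdg, "huggingface", "transformers")

def cache_size_bytes(path: str) -> int:
    """Total size of all regular files below path -- the number that
    ends up counted against the storage quota."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total
```

Running `cache_size_bytes(hf_cache_dir())` on a login node gives the per-user figure that a shared, pre-installed cache would eliminate.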

– Vinit



--
Andrey
PhD Candidate at Language Technology Group (LTG)
University of Oslo


--
Andrey
PhD Candidate at Language Technology Group (LTG)
University of Oslo
