[NLPL Task Force (A)] Storage alternatives

Andrey Kutuzov andreku at ifi.uio.no
Tue Nov 17 10:12:12 UTC 2020


There are only about 60 models that Hugging Face itself provides
(https://huggingface.co/transformers/pretrained_models.html).

The list of community-uploaded models (https://huggingface.co/models)
is of course much larger, but I don't think it makes sense to download
ALL of them.
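
For a shared installation, a minimal sketch of how such a subset could be
pre-fetched into one cache directory (the model names and the target path
below are placeholders, not an agreed-upon selection; the cache_dir
argument to from_pretrained() is standard transformers API):

# Sketch: pre-download a chosen subset of models into a shared cache
# directory, so that users do not duplicate them in home directories.
from transformers import AutoModel, AutoTokenizer

SHARED_CACHE = "/path/to/shared/transformers-cache"  # placeholder location

MODELS = [  # placeholder subset, not an agreed-upon list
    "bert-base-uncased",
    "bert-base-multilingual-cased",
    "xlm-roberta-base",
]

for name in MODELS:
    # from_pretrained() downloads the files once and stores them under
    # cache_dir; later calls with the same cache_dir reuse the local copy.
    AutoTokenizer.from_pretrained(name, cache_dir=SHARED_CACHE)
    AutoModel.from_pretrained(name, cache_dir=SHARED_CACHE)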

17.11.2020 08:53, Stephan Oepen wrote:
> i would be curious to know how much storage goes to the commonly used
> subset of huggingface pre-trained models (and possibly other pre-trained
> files)?  much like for the NLPL vectors repository, that is the kind of
> data that should not be duplicated in user home directories, i.e. we
> might want to devise an NLPL 'transformers' module with many pre-trained
> models pre-installed.  is there a common subset of such models, or would
> one possibly be forced to just download everything that is available
> through the huggingface hub?
> 
> oe
> 
> 
> 
> On Mon, Nov 16, 2020 at 2:17 PM Andrey Kutuzov <andreku at ifi.uio.no> wrote:
>>
>> Should we indeed schedule a meeting focused on the topic of storage? :)
>>
>>
>> On 16.11.2020 11:32, Vinit Ravishankar wrote:
>> > Hi folks,
>> >
>> > Have any of you figured out a way to store libraries that doesn’t
>> > involve using Saga storage? I’ve cleared up most of my personal data
>> > but my virtual environments and transformers cache add up to around
>> > 100 GiB. Can’t do much with the transformers cache either, because the
>> > library won’t auto-download temporarily if you’re running on GPU.
>> >
>> > – Vinit
>> >
>>
>>
>> --
>> Andrey
>> PhD Candidate at Language Technology Group (LTG)
>> University of Oslo
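
(Regarding the cache issue above: if such a shared, pre-populated cache
existed, individual users could point the library at it rather than keeping
a private copy under their home directory. A rough sketch, assuming the
standard TRANSFORMERS_CACHE environment variable; the path is again a
placeholder:)

# Sketch: point the transformers cache at a shared, pre-populated
# directory instead of the default location under the user's home.
import os

# Placeholder path; must be set before the library is imported.
os.environ["TRANSFORMERS_CACHE"] = "/path/to/shared/transformers-cache"

from transformers import AutoModel

# With the cache already populated, this resolves to the local copy
# instead of triggering a fresh download.
model = AutoModel.from_pretrained("bert-base-uncased")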


-- 
Andrey
PhD Candidate at Language Technology Group (LTG)
University of Oslo


