[NLPL Task Force (A)] Storage alternatives
Vinit Ravishankar
vinitr at ifi.uio.no
Wed Nov 18 12:47:31 UTC 2020
The space issues aren’t just the huggingface models (though those are obviously an issue too): a single virtual environment holds multiple gigabytes’ worth of libraries, and a Python 3.7 installation alone often comes to ~5 GiB.
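For anyone wondering where those gigabytes actually sit, here is a minimal sketch that reports per-subdirectory usage inside an environment (the default path below is only an example):

import os
import sys

def dir_size(path):
    # sum file sizes under path, skipping symlinks to avoid double-counting
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

# point this at a virtual environment, e.g.: python sizes.py ~/envs/myenv
venv = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~/envs/myenv")
for entry in sorted(os.scandir(venv), key=lambda e: e.name):
    if entry.is_dir(follow_symlinks=False):
        print(f"{dir_size(entry.path) / 2**30:7.2f} GiB  {entry.name}")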
– Vinit
> On 17 Nov 2020, at 11:16, Tiedemann, Jörg <jorg.tiedemann at helsinki.fi> wrote:
>
>
> This includes 1300 translation models from us.
> I guess you don’t want to include all of them in a module.
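>
> (If we did want a module, it could pre-fetch just an agreed subset into a
> shared cache. A minimal sketch; the shared path and the model list below
> are placeholders only:)
>
> from transformers import MarianMTModel, MarianTokenizer
>
> # placeholder shared location; replace with the actual NLPL project area
> SHARED_CACHE = "/cluster/shared/nlpl/transformers-cache"
>
> # a curated subset, not all ~1300 Helsinki-NLP/opus-mt models
> SUBSET = ["Helsinki-NLP/opus-mt-en-de", "Helsinki-NLP/opus-mt-en-fi"]
>
> for name in SUBSET:
>     MarianTokenizer.from_pretrained(name, cache_dir=SHARED_CACHE)
>     MarianMTModel.from_pretrained(name, cache_dir=SHARED_CACHE)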
>
> Jörg
>
> *****************************************************************
> Jörg Tiedemann
> Language Technology https://blogs.helsinki.fi/language-technology/
> University of Helsinki
>
>> On 17. Nov 2020, at 12.12, Andrey Kutuzov <andreku at ifi.uio.no> wrote:
>>
>> It's only about 60 models that HuggingFace itself provides
>> (https://huggingface.co/transformers/pretrained_models.html).
>>
>> The list of community-uploaded models (https://huggingface.co/models)
>> is of course much larger, but I don't think it makes sense to download
>> ALL of them.
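>>
>> (For scale, a rough count via the hub's public API; the endpoint and the
>> modelId field are assumptions on my part:)
>>
>> import requests
>>
>> # one JSON record per hosted model (assumed endpoint and schema)
>> models = requests.get("https://huggingface.co/api/models").json()
>> print(len(models), "models on the hub in total")
>> print(sum(1 for m in models if m["modelId"].startswith("Helsinki-NLP/")),
>>       "of them from Helsinki-NLP")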
>>
>> 17.11.2020 08:53, Stephan Oepen wrote:
>>> i would be curious to know how much storage goes to the commonly used
>>> subset of huggingface pre-trained models (and possibly other pre-trained
>>> files)? much like for the NLPL vectors repository, that is the kind of
>>> data that should not be duplicated in user home directories, i.e. we
>>> might want to devise an NLPL 'transformers' module with many pre-trained
>>> models pre-installed. is there a common subset of such models, or would
>>> one possibly be forced to just download everything that is available
>>> through the huggingface hub?
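>>>
>>> (a minimal sketch of what the user side of such a module might look
>>> like; the shared path is hypothetical, and note that TRANSFORMERS_CACHE
>>> has to be set before transformers is imported:)
>>>
>>> import os
>>>
>>> # hypothetical shared, read-only cache that the module would provide
>>> os.environ["TRANSFORMERS_CACHE"] = "/cluster/shared/nlpl/transformers-cache"
>>>
>>> from transformers import AutoModel, AutoTokenizer
>>>
>>> # now resolved against the shared cache instead of the user's home
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")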
>>>
>>> oe
>>>
>>> On Mon, Nov 16, 2020 at 2:17 PM Andrey Kutuzov <andreku at ifi.uio.no> wrote:
>>>>
>>>> Should we indeed schedule a meeting focused on the topic of storage? :)
>>>>
>>>>
>>>> On 16.11.2020 11:32, Vinit Ravishankar wrote:
>>>>> Hi folks,
>>>>>
>>>>> Have any of you figured out a way to store libraries that doesn’t
>>>>> involve using Saga storage? I’ve cleared up most of my personal data, but
>>>>> my virtual environments and transformers cache add up to around 100 GiB.
>>>>> Can’t do much about the transformers cache either, because the library
>>>>> can’t download models on the fly when you’re running on GPU nodes.
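>>>>>
>>>>> (For reference, the pattern I’m stuck with, assuming a transformers
>>>>> version that has the local_files_only flag: the cache has to be
>>>>> populated on a login node first, then kept around for the GPU nodes.)
>>>>>
>>>>> from transformers import AutoModel, AutoTokenizer
>>>>>
>>>>> # on a login node, with network access: populate the cache once
>>>>> AutoTokenizer.from_pretrained("xlm-roberta-base")
>>>>> AutoModel.from_pretrained("xlm-roberta-base")
>>>>>
>>>>> # on a GPU node, without network access: read from the cache only
>>>>> tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base",
>>>>>                                           local_files_only=True)
>>>>> model = AutoModel.from_pretrained("xlm-roberta-base",
>>>>>                                   local_files_only=True)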
>>>>>
>>>>> – Vinit
>>>>>
>>>>
>>>>
>>>> --
>>>> Andrey
>>>> PhD Candidate at Language Technology Group (LTG)
>>>> University of Oslo
>>
>>
>> --
>> Andrey
>> PhD Candidate at Language Technology Group (LTG)
>> University of Oslo
>