[NLPL Infrastructure] Gensim loading pretrained word vectors

Andrey Kutuzov andreku at ifi.uio.no
Fri Jan 29 11:42:02 UTC 2021


Dear Santosh,

The words in this model are lemmatized and augmented with their part-of-speech
tags (for example, "king_NOUN"), so a bare word such as "king" is not in the
vocabulary and 'king' in w2v returns False.
This information can be found in the metadata file inside the archive
(meta.json) or in the global metadata file for all models
(http://vectors.nlpl.eu/repository/latest.json).
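A minimal sketch of how one might inspect the metadata and query such a model follows. It assumes that meta.json sits at the top level of the archive and that the tag suffixes come from the Universal POS tag set, which is what "king_NOUN" suggests. Please check meta.json for the actual tagging scheme. The helper names read_meta and tagged_forms are hypothetical, and the query must already be a lemma ("king", not "kings").

```python
import json
import zipfile

# ASSUMPTION: suffixes are Universal POS tags (UD v2), as "king_NOUN" suggests.
# Check meta.json for the tagging scheme actually used by your model.
UPOS_TAGS = (
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
)


def read_meta(archive_path):
    """Hypothetical helper: parse meta.json from a model archive without extracting it.

    archive_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive_path, "r") as archive:
        with archive.open("meta.json") as stream:
            return json.load(stream)


def tagged_forms(lemma, vocab, tags=UPOS_TAGS):
    """Hypothetical helper: return every '<lemma>_<TAG>' token present in vocab.

    vocab can be anything supporting `in`, e.g. a gensim KeyedVectors object
    or a plain set of strings. The lemma must already be lemmatized.
    """
    return [f"{lemma}_{tag}" for tag in tags if f"{lemma}_{tag}" in vocab]


if __name__ == "__main__":
    # Toy vocabulary standing in for w2v loaded from model.txt.
    vocab = {"king_NOUN", "king_PROPN", "queen_NOUN"}
    print("king" in vocab)              # False: bare words are not keys
    print(tagged_forms("king", vocab))  # ['king_NOUN', 'king_PROPN']
```

With the real model loaded as in the code quoted below, tagged_forms("king", w2v) should list the available tagged keys, and one of them (e.g. "king_NOUN") can then be passed to w2v.most_similar().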

On 29.01.2021 12:29, Stephan Oepen wrote:
> the NLPL infrastructure task force is the right contact address for
> questions of this kind (please see below).
> 
> best wishes, oe
> 
> On Fri, Jan 29, 2021 at 12:20 PM Santosh SRINIVAS
> <srinivas.hec at gmail.com> wrote:
>>
>> Dear NLPL Team, I am trying to use your pretrained Wikipedia vectors. I used the following code mentioned on your page to load the vectors:
>>
>>     import zipfile
>>     import gensim
>>
>>     f = '3.zip'
>>     with zipfile.ZipFile(f, "r") as archive:
>>         stream = archive.open("model.txt")
>>         w2v = gensim.models.KeyedVectors.load_word2vec_format(stream, binary=False, unicode_errors='replace')
>>
>> It loads fine, but every word lookup fails. For example, 'king' in w2v returns False.
>>
>> Not sure what I am doing wrong. Could you kindly help?
>>
>> Thanks and regards
>> Santosh


-- 
Andrey
Language Technology Group (LTG)
University of Oslo

