[norlm] NorLM updates
Andrey Kutuzov
andreku at ifi.uio.no
Wed Feb 17 14:24:53 UTC 2021
Dear NorLM subscribers,
The Norwegian Large-scale Language Models (NorLM) project has two
important updates about our models:
1) We have released version 1.1 of the NorBERT model. It fixes an issue
with duplicate entries in the model vocabulary, which in some rare cases
could lead to warnings and errors in existing code. The model weights
themselves are unchanged.
This version is assigned a new persistent identifier at the NLPL Vector
repository:
http://vectors.nlpl.eu/repository/20/216.zip
If you use NorBERT in your work in any way, we highly recommend
downloading the new version. The previous release, 1.0, is now deprecated.
NorBERT on the HuggingFace Model Hub has been updated accordingly, so you
can use it safely:
https://huggingface.co/ltgoslo/norbert
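For example, a minimal sketch of loading the updated NorBERT through the
HuggingFace transformers library (the model identifier comes from the Hub
URL above; which task head you attach is up to you) might look like this:

    # Minimal sketch: load NorBERT 1.1 from the HuggingFace Model Hub
    # with the transformers library (assumed to be installed).
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("ltgoslo/norbert")
    model = AutoModelForMaskedLM.from_pretrained("ltgoslo/norbert")

    # Encode a Norwegian sentence and run it through the model:
    inputs = tokenizer("Dette er en test.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.logits.shape)  # (batch, sequence length, vocabulary size)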
2) We have released two new ELMo models trained on the same Norwegian
corpus as NorBERT:
- NorELMo30: 30 000 words in the target vocabulary
(http://vectors.nlpl.eu/repository/20/217.zip)
- NorELMo100: 100 000 words in the target vocabulary
(http://vectors.nlpl.eu/repository/20/218.zip)
At the http://wiki.nlpl.eu/Vectors/norlm/norelmo page you can find all
the necessary information about these models, including the evaluation
results.
In particular, the NorELMo models outperform all Norwegian BERT models on
fine-grained sentiment analysis. On other tasks their results are
somewhat lower, but the time required to adapt the models to the task at
hand is orders of magnitude shorter than with BERT.
We encourage you to give NorELMo30 and NorELMo100 a try in your experiments.
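As a starting point, here is a minimal sketch of extracting
contextualized embeddings from one of the NorELMo models. It assumes the
simple_elmo package is installed (pip install simple_elmo) and that the
downloaded archive has been unpacked into the directory given below;
adjust the path to your own setup.

    # Minimal sketch (assumptions: simple_elmo installed, model unpacked
    # locally into the "norelmo30" directory).
    from simple_elmo import ElmoModel

    model = ElmoModel()
    model.load("norelmo30")  # directory with the unpacked model files

    # Sentences are passed as lists of tokens:
    sentences = [["Dette", "er", "en", "test", "."]]
    vectors = model.get_elmo_vectors(sentences)
    print(vectors.shape)  # (sentences, max sentence length, dimension)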
As usual, http://norlm.nlpl.eu is the main source of information about
our NorLM models.
Please feel free to ask anything on this mailing list as well.
--
Andrey
Language Technology Group (LTG)
University of Oslo