nbailab-base-ner-scandi
Property | Value |
---|---|
Model Size | 676 MB |
Processing Speed | 4.16 samples/second |
Average F1 Score | 89.08% |
Author | saattrupdan |
Model Hub | Hugging Face |
What is nbailab-base-ner-scandi?
nbailab-base-ner-scandi is a Named Entity Recognition (NER) model for Scandinavian languages. Fine-tuned from NbAiLab/nb-bert-base, it supports Danish, Norwegian (both Bokmål and Nynorsk), Swedish, Icelandic, and Faroese. The model identifies four types of entities: Person (PER), Location (LOC), Organization (ORG), and Miscellaneous (MISC).
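A minimal usage sketch, assuming the model is published on the Hugging Face Hub under the ID `saattrupdan/nbailab-base-ner-scandi` (inferred from the author and model names above) and that `transformers` and a backend such as PyTorch are installed. The `aggregation_strategy="first"` setting and the helper `group_entities` are illustrative choices, not something this summary specifies.

```python
from collections import defaultdict

# Assumed Hub ID: author name + model name from the table above.
MODEL_ID = "saattrupdan/nbailab-base-ner-scandi"


def load_ner(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="first" merges word pieces into whole-word entities.
    return pipeline("ner", model=model_id, aggregation_strategy="first")


def group_entities(entities):
    """Group aggregated pipeline output by label (PER, LOC, ORG, MISC)."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Example (requires network access to fetch the model):
#   ner = load_ner()
#   print(group_entities(ner("Hans Christian Andersen blev født i Odense.")))
```

Keeping the grouping step separate from model loading makes it easy to post-process results from batches or cached predictions.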
Implementation Details
The model was trained with a learning rate of 2e-05, a batch size of 32, and the Adam optimizer. It was fine-tuned on DaNE, NorNE, SUC 3.0, and WikiANN, and is reported to reach state-of-the-art performance across the supported languages.
- Training ran for 14 epochs with a linear learning rate schedule
- Gradients were accumulated over 4 steps
- Reaches an average F1 score of 89.08% at a model size of 676 MB, smaller than alternatives
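The linear learning rate schedule can be sketched in plain Python. This is an illustration of the general technique (optional linear warmup, then linear decay to zero), not the author's training code; `total_steps` and `warmup_steps` are hypothetical parameters, since this summary does not state them.

```python
PEAK_LR = 2e-5  # learning rate reported above


def linear_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup phase.
        return peak_lr * step / warmup_steps
    # Decay from peak_lr (at the end of warmup) to 0 (at total_steps).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run of 1,000 optimizer steps without warmup.
for step in (0, 250, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

With gradient accumulation, one optimizer step (and therefore one schedule step) happens only after every 4 forward/backward passes.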
Core Capabilities
- Multilingual support for Danish, Norwegian, Swedish, Icelandic, and Faroese in a single model
- F1 scores of 87.44% for Danish, 91.06% for Norwegian Bokmål, and 88.37% for Swedish
- Processing speed of 4.16 samples per second
- Reasonable performance on English text, because the pretrained base model was also trained on English data
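As a rough planning aid, the reported throughput can be turned into a processing-time estimate. This is a back-of-the-envelope sketch: the summary does not state the hardware or sample length behind the 4.16 samples/second figure, so real throughput may differ, and the workload of 10,000 samples is a hypothetical example.

```python
SAMPLES_PER_SECOND = 4.16  # throughput reported above


def estimated_minutes(num_samples, samples_per_second=SAMPLES_PER_SECOND):
    """Estimated wall-clock minutes to process `num_samples` at a fixed rate."""
    return num_samples / samples_per_second / 60


# Hypothetical workload of 10,000 samples.
print(f"{estimated_minutes(10_000):.1f} minutes")  # prints "40.1 minutes"
```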
Frequently Asked Questions
Q: What makes this model unique?
The model covers six Scandinavian language variants in a single checkpoint while reporting state-of-the-art accuracy. At 676 MB it is roughly a third the size of competitors like da_dacy_large_trf (2,090 MB), and it is also reported to be faster, which makes it more practical for production deployments.
Q: What are the recommended use cases?
The model suits applications that need named entity recognition in Scandinavian languages, such as information extraction, content analysis, and automated text processing. It is particularly useful for organizations working with multilingual Scandinavian content.