xlm-roberta-base-finetuned-nace

erst

A multilingual NACE code classifier based on XLM-RoBERTa, fine-tuned on 2.5M business descriptions in multiple European languages to classify business activities.

Property	Value
Base Model	XLM-RoBERTa Base
Task	NACE Code Classification
License	MIT
Author	erst

What is xlm-roberta-base-finetuned-nace?

This model is XLM-RoBERTa base, fine-tuned to classify business activity descriptions into standardized NACE Rev. 2 codes. It was trained on a dataset of 2.5 million descriptions from Norwegian and Danish businesses, augmented with machine translations into multiple European languages.
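To make the target label space concrete, here is a minimal sketch of how a numeric NACE Rev. 2 code breaks down into division (two digits), group (three digits), and class (four digits). It assumes codes written in the dotted form "62.01". This card does not state how the model formats its output labels, and parse_nace is a hypothetical helper, not part of the model or of transformers.

```python
import re
from typing import Dict

# Dotted numeric NACE Rev. 2 code: "62" (division), "62.0" (group), "62.01" (class).
_NACE_RE = re.compile(r"(\d{2})(?:\.(\d)(\d)?)?")


def parse_nace(code: str) -> Dict[str, str]:
    """Split a dotted NACE code into the hierarchy levels it contains."""
    match = _NACE_RE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a dotted numeric NACE code: {code!r}")
    division, group_digit, class_digit = match.groups()
    levels = {"division": division}
    if group_digit is not None:
        levels["group"] = f"{division}.{group_digit}"
    if class_digit is not None:
        levels["class"] = f"{division}.{group_digit}{class_digit}"
    return levels
```

A helper like this could be used to roll class-level predictions up to the division level for coarser reporting, provided the labels really follow this format.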

Implementation Details

The model is used through the transformers library and can be loaded with its pipeline interface as a text-classification model (transformers also exposes this pipeline under the alias sentiment-analysis, although the model predicts NACE codes, not sentiment). It is designed to process text descriptions in multiple languages, including English, German, Spanish, French, Finnish, and Polish.

Built on XLM-RoBERTa base architecture
Fine-tuned on multilingual business descriptions
Supports multiple European languages through translation-augmented training
Works with the standard transformers pipeline interface
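The points above can be sketched as follows. This is a minimal example, not the author's reference code: the Hub id "erst/xlm-roberta-base-finetuned-nace" is an assumption built from the author and model name on this card, and load_classifier and classify are hypothetical helper names. The pipeline call itself is the standard transformers text-classification pipeline, which returns one dict with "label" and "score" per input text.

```python
from typing import Callable, Dict, List

# Assumption: the Hub id is "<author>/<model name>" as listed on this card.
MODEL_ID = "erst/xlm-roberta-base-finetuned-nace"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id, tokenizer=model_id)


def classify(descriptions: List[str], classifier: Callable) -> List[Dict]:
    """Return one {"text", "label", "score"} record per description."""
    predictions = classifier(descriptions)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(descriptions, predictions)
    ]


# Example usage (requires `pip install transformers torch` and network access):
# clf = load_classifier()
# for row in classify(["The company develops and sells accounting software."], clf):
#     print(row)
```

Passing the classifier in as an argument keeps classify easy to test with a stub and lets one loaded pipeline be reused across many batches.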

Core Capabilities

Multilingual NACE code classification
Processing of business activity descriptions
Cross-lingual understanding and classification
Standardized business activity categorization

Frequently Asked Questions

Q: What makes this model unique?

The model combines specialized training on a large dataset of business descriptions with translation-augmented training data, which lets it handle multiple European languages and targets it specifically at NACE code classification.

Q: What are the recommended use cases?

The model is suited to automated business activity classification, regulatory compliance, economic research, and mapping business descriptions written in different European languages to NACE Rev. 2 codes.
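For compliance-style use cases it can help to accept only confident predictions automatically and send the rest to a human reviewer. The sketch below assumes prediction records with a "score" field, as returned by a transformers text-classification pipeline. The helper name split_by_confidence and the 0.8 threshold are illustrative assumptions, not values from this card, and a real threshold would need to be chosen on labeled validation data.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    records: List[Dict], threshold: float = 0.8
) -> Tuple[List[Dict], List[Dict]]:
    """Split predictions into (auto-accepted, needs manual review)."""
    accepted: List[Dict] = []
    review: List[Dict] = []
    for record in records:
        # Scores at or above the threshold are accepted without review.
        target = accepted if record["score"] >= threshold else review
        target.append(record)
    return accepted, review
```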