IndicNER

ai4bharat

A multilingual NER model covering 11 Indian languages, built on a BERT architecture and trained on data derived from the Samanantar corpus. MIT licensed, with 478K+ downloads.

Property             Value
License              MIT
Paper                View Paper
Downloads            478,117
Languages Supported  11 Indian Languages

What is IndicNER?

IndicNER is a Named Entity Recognition (NER) model for Indian languages. It is built on bert-base-multilingual-uncased and fine-tuned on the Naamapadam dataset, which is derived from the Samanantar corpus, to identify named entities in 11 Indian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.

Implementation Details

The model uses a transformer architecture and is implemented in PyTorch. It was trained on millions of sentences from the Samanantar corpus and has been benchmarked against human-annotated test sets and several public Indian NER datasets.

  • Base Architecture: BERT Multilingual Uncased
  • Training Dataset: Naamapadam (derived from Samanantar)
  • Framework: PyTorch
  • Evaluation: Human-annotated test sets
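
As a rough illustration of how a fine-tuned token-classification checkpoint like this is typically loaded, the sketch below uses the Hugging Face transformers pipeline API. The Hub id ai4bharat/IndicNER is an assumption inferred from the publisher and model name on this page, and the helper names build_ner_pipeline and summarize are hypothetical. It also assumes transformers and torch are installed and that the checkpoint can be downloaded.

```python
from typing import Any, Dict, List, Tuple

# Assumed Hugging Face Hub id (publisher + model name); not confirmed by this page.
MODEL_ID = "ai4bharat/IndicNER"


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline for the given checkpoint.

    transformers is imported lazily so that this module can be imported
    even where the library is not installed.
    """
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces that belong to the
    # same entity into a single result with an "entity_group" label.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize(entities: List[Dict[str, Any]]) -> List[Tuple[str, str, float]]:
    """Reduce pipeline output to (text, entity type, rounded score) tuples."""
    return [
        (e["word"], e["entity_group"], round(float(e["score"]), 3))
        for e in entities
    ]


# Example usage (downloads the model on first call):
#   ner = build_ner_pipeline()
#   print(summarize(ner("सचिन तेंदुलकर का जन्म मुंबई में हुआ था।")))
```

Because the base model is uncased, the surface text returned by the pipeline may be normalized relative to the input. If exact spans matter, the start and end character offsets in the pipeline output can be used to slice the original string instead.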

Core Capabilities

  • Multi-language NER processing across 11 Indian languages
  • Token classification for named entity identification
  • Efficient processing of large-scale text data
  • Integration capability with existing NLP pipelines
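
Token classification assigns one label to each token, and entities are then recovered by grouping adjacent labels. The sketch below shows that grouping step for BIO-style tags. The tag names (B-PER, I-PER, B-LOC, and so on) and the function name bio_to_spans are assumptions for illustration and are not taken from this page.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, entity type) spans.

    An "I-X" tag that does not continue an open span of type X is treated
    leniently as the start of a new span.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the open span, if there is one.
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            flush()
            current_tokens, current_type = [token], entity_type
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O" (outside) ends any open span.
            flush()
            current_tokens, current_type = [], None

    flush()
    return spans
```

For example, the tokens Sachin, Tendulkar, was, born, in, Mumbai with tags B-PER, I-PER, O, O, O, B-LOC are grouped into a PER span "Sachin Tendulkar" and a LOC span "Mumbai".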

Frequently Asked Questions

Q: What makes this model unique?

IndicNER stands out for covering 11 Indian languages in a single model and for its training data, which is derived from the large Samanantar corpus. This makes it well suited to production environments that need multilingual NER for Indian languages.

Q: What are the recommended use cases?

The model suits applications that need named entity recognition in Indian languages, such as information extraction, text analysis, and content classification. It is particularly useful for organizations working with multilingual Indian content.
