InCaseLawBERT

Maintained By
law-ai

  • Parameters: 110M
  • License: MIT
  • Paper: Pre-training Transformers on Indian Legal Text
  • Training Data: 5.4M Indian legal documents

What is InCaseLawBERT?

InCaseLawBERT is a BERT model specialized for Indian legal text processing. Built on CaseLawBERT, it has been further pre-trained on a corpus of 5.4 million Indian legal documents spanning 1950 to 2019, covering Civil, Criminal, Constitutional, and other legal domains.

Implementation Details

The model follows the bert-base-uncased architecture with 12 hidden layers, 768 hidden dimensions, and 12 attention heads. It was further trained for 300K steps using Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks on a 27GB corpus of legal text.

  • Architecture: BERT-base configuration
  • Training Corpus: 5.4M Indian legal documents
  • Training Tasks: MLM and NSP
  • Base Model: CaseLawBERT
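Since the model follows the standard BERT-base configuration, it can be loaded with the Hugging Face transformers library under the `law-ai/InCaseLawBERT` identifier. A minimal sketch; the chunking helper is illustrative, reflecting that BERT-base accepts at most 512 tokens while Indian judgments are typically far longer:

```python
def chunk_ids(token_ids, max_len=512):
    """Split a long token-id list into BERT-sized windows
    (BERT-base caps input at 512 tokens per forward pass)."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

def load_incaselawbert():
    """Load InCaseLawBERT from the Hugging Face Hub.
    Import is deferred so the chunking helper above works without transformers installed."""
    from transformers import AutoTokenizer, AutoModel
    tokenizer = AutoTokenizer.from_pretrained("law-ai/InCaseLawBERT")
    model = AutoModel.from_pretrained("law-ai/InCaseLawBERT")
    return tokenizer, model
```

After loading, each 512-token window fed through the model produces hidden states of shape (1, seq_len, 768), matching the 768 hidden dimensions noted above.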

Core Capabilities

  • Legal Statute Identification
  • Semantic Segmentation of Legal Documents
  • Court Judgment Prediction
  • Legal Text Embeddings Generation
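For the embedding-generation capability, one common recipe is mean pooling over the model's token vectors. A sketch under the assumption that transformers and torch are available; the pooling function is written in plain Python for clarity, and `embed` is a hypothetical helper, not part of the model's API:

```python
def mean_pool(hidden_states, attention_mask):
    """Average the token vectors where attention_mask == 1
    (pure-Python sketch of standard mean pooling)."""
    dim = len(hidden_states[0])
    totals, count = [0.0] * dim, 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for j, v in enumerate(vec):
                totals[j] += v
    return [t / max(count, 1) for t in totals]

def embed(text, tokenizer, model):
    """Return one 768-d vector for `text` using InCaseLawBERT
    (assumes the tokenizer/model pair loaded from law-ai/InCaseLawBERT)."""
    import torch
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**enc)
    hidden = out.last_hidden_state[0].tolist()   # (seq_len, 768)
    mask = enc["attention_mask"][0].tolist()     # (seq_len,)
    return mean_pool(hidden, mask)
```

The resulting vectors can then drive downstream tasks such as similar-judgment retrieval or clustering.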

Frequently Asked Questions

Q: What makes this model unique?

InCaseLawBERT is specifically optimized for Indian legal text processing, trained on one of the largest Indian legal corpora available. It maintains performance comparable to CaseLawBERT while being specifically adapted to Indian legal contexts.

Q: What are the recommended use cases?

The model excels in legal document analysis tasks including statute identification, semantic segmentation, and judgment prediction. It's particularly well-suited for applications involving Indian legal documents and can be fine-tuned for specific legal NLP tasks.
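As an example of fine-tuning for statute identification, one could attach a classification head via the transformers `AutoModelForSequenceClassification` API. A minimal sketch; the statute label names and both helper functions are hypothetical illustrations, not part of the released model:

```python
def build_label_maps(labels):
    """Map statute names to integer class ids (and back) for sequence classification."""
    label2id = {name: i for i, name in enumerate(sorted(set(labels)))}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label

def make_statute_classifier(num_labels):
    """Attach a fresh classification head to InCaseLawBERT
    (assumes transformers is installed; the head is randomly initialized)."""
    from transformers import AutoModelForSequenceClassification
    return AutoModelForSequenceClassification.from_pretrained(
        "law-ai/InCaseLawBERT", num_labels=num_labels
    )
```

The classifier would then be trained on labeled case facts, for instance with the transformers `Trainer` API.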
