InCaseLawBERT
| Property | Value |
|---|---|
| Parameters | 110M |
| License | MIT |
| Paper | Pre-training Transformers on Indian Legal Text |
| Training Data | 5.4M Indian legal documents |
What is InCaseLawBERT?
InCaseLawBERT is a BERT model specialized for Indian legal text processing. Initialized from CaseLawBERT, it was further pre-trained on a corpus of 5.4 million Indian legal documents spanning 1950 to 2019, covering Civil, Criminal, Constitutional, and other legal domains.
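A minimal loading sketch with the Hugging Face transformers library is shown below. The hub ID law-ai/InCaseLawBERT is assumed here; substitute the actual checkpoint path if it differs.

```python
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "law-ai/InCaseLawBERT"  # assumed hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

text = "The appellant was convicted under Section 302 of the Indian Penal Code."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```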
Implementation Details
The model follows the bert-base-uncased architecture, with 12 hidden layers, a hidden size of 768, and 12 attention heads (see the configuration sketch after the list below). Starting from the CaseLawBERT checkpoint, it was trained for a further 300K steps with the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) objectives on a 27GB corpus of legal text.
- Architecture: BERT-base configuration
- Training Corpus: 5.4M Indian legal documents
- Training Tasks: MLM and NSP
- Base Model: CaseLawBERT
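As a quick sanity check of the figures above, the architecture parameters can be read straight from the checkpoint's config. This is a sketch, again assuming the law-ai/InCaseLawBERT hub ID:

```python
from transformers import AutoConfig

# Assumed hub ID; adjust to the actual checkpoint path.
config = AutoConfig.from_pretrained("law-ai/InCaseLawBERT")

print(config.num_hidden_layers)    # expected: 12
print(config.hidden_size)          # expected: 768
print(config.num_attention_heads)  # expected: 12
```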
Core Capabilities
- Legal Statute Identification
- Semantic Segmentation of Legal Documents
- Court Judgment Prediction
- Legal Text Embeddings Generation (see the sketch below)
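For the embedding-generation capability, one common recipe is to mean-pool the final hidden states over non-padding tokens. The snippet below sketches that approach; the pooling strategy is a standard choice, not one prescribed by the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "law-ai/InCaseLawBERT"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

sentences = [
    "The writ petition is dismissed.",
    "Bail is granted to the petitioner.",
]
inputs = tokenizer(sentences, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, 768)

# Mean-pool over real tokens only, using the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```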
Frequently Asked Questions
Q: What makes this model unique?
InCaseLawBERT is optimized for Indian legal text processing and was trained on one of the largest Indian legal corpora available. It performs comparably to CaseLawBERT while being adapted to Indian legal contexts.
Q: What are the recommended use cases?
The model excels at legal document analysis tasks, including statute identification, semantic segmentation, and judgment prediction. It is particularly well suited to applications involving Indian legal documents and can be fine-tuned for specific legal NLP tasks.
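As an illustration of the fine-tuning path, the sketch below attaches a fresh two-class classification head (e.g. for a binary judgment-prediction setup) and runs one forward/backward pass. The task framing and the label used here are hypothetical.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "law-ai/InCaseLawBERT"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels=2 adds a randomly initialized classification head on top of
# the pre-trained encoder; it must be trained before real use.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

batch = tokenizer(["The appeal is allowed."], return_tensors="pt")
labels = torch.tensor([1])  # toy label: 1 = claim accepted (hypothetical)
loss = model(**batch, labels=labels).loss
loss.backward()  # gradients flow through the head and the full encoder
print(float(loss))
```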