InLegalBERT
| Property | Value |
|---|---|
| Parameters | 110M |
| License | MIT |
| Language | English |
| Paper | Pre-training Transformers on Indian Legal Text |
What is InLegalBERT?
InLegalBERT is a language model for Indian legal text processing, developed by researchers at IIT Kharagpur. Initialized from LEGAL-BERT-SC, it was further pre-trained on a corpus of about 5.4 million Indian legal documents spanning 1950 to 2019 and covering domains including Civil, Criminal, and Constitutional law.
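For orientation, here is a minimal loading sketch via the Hugging Face transformers library, assuming the checkpoint is published under the ID `law-ai/InLegalBERT`:

```python
from transformers import AutoTokenizer, AutoModel

# Assumed Hugging Face checkpoint ID for the published model.
MODEL_ID = "law-ai/InLegalBERT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

# Encode a sample passage of Indian legal text.
text = "The appellant was convicted under Section 302 of the Indian Penal Code."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```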
Implementation Details
The model follows the BERT-base architecture: 12 hidden layers, a hidden dimension of 768, and 12 attention heads. It was further pre-trained for 300,000 steps with the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) objectives on a 27 GB corpus of Indian legal text (a fill-mask sketch follows the feature list below).
- Pre-trained on Indian Supreme Court and High Court documents
- Compatible with the LEGAL-BERT tokenizer
- Optimized for legal domain-specific tasks
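Since MLM was one of the pre-training objectives, masked-token prediction makes a quick sanity check. A minimal sketch using the transformers fill-mask pipeline, again assuming the `law-ai/InLegalBERT` checkpoint ID:

```python
from transformers import pipeline

# Assumed checkpoint ID; the pipeline reuses the LEGAL-BERT-style tokenizer
# that ships with the model.
fill_mask = pipeline("fill-mask", model="law-ai/InLegalBERT")

# The model should rank legally plausible tokens highly for the masked slot.
for prediction in fill_mask("The accused was sentenced to rigorous [MASK] for two years."):
    print(f"{prediction['token_str']:>15}  score={prediction['score']:.3f}")
```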
Core Capabilities
- Legal Statute Identification from case facts
- Semantic Segmentation of legal documents into functional parts
- Court Judgment Prediction
- Generation of legal document embeddings
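One common way to turn the encoder into document embeddings is to mean-pool the final hidden states over non-padding tokens; the paper does not prescribe a pooling scheme, so the following is a sketch under that assumption:

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "law-ai/InLegalBERT"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed(texts):
    """Mean-pool the last hidden state over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state       # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)        # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1) # (B, 768)

vectors = embed(["The writ petition is dismissed.",
                 "Bail is granted subject to conditions."])
print(vectors.shape)  # torch.Size([2, 768])
```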
Frequently Asked Questions
Q: What makes this model unique?
InLegalBERT specializes in Indian legal text processing. In the accompanying paper it outperforms baseline models, including the LEGAL-BERT checkpoint it was initialized from, on the tasks listed under Core Capabilities: legal statute identification, semantic segmentation, and court judgment prediction. Its pre-training on a large corpus of Indian court documents makes it particularly effective for Indian legal applications.
Q: What are the recommended use cases?
The model is well suited to analysis of Indian legal texts: statute identification from case facts, court judgment prediction, semantic segmentation of judgments into functional parts, and producing document embeddings for downstream retrieval or classification. A minimal fine-tuning sketch for the classification-style tasks follows.
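The usual pattern for tasks like judgment prediction is to add a classification head on top of the encoder and fine-tune. A sketch of a single training step, with an illustrative binary label scheme (0 = dismissed, 1 = allowed) and the assumed `law-ai/InLegalBERT` checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "law-ai/InLegalBERT"  # assumed checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Binary head for judgment prediction; the label scheme is illustrative.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Tiny illustrative batch; a real setup would use a labelled case-facts dataset.
texts = ["The petitioner challenges the order of detention as arbitrary.",
         "The respondent failed to appear despite repeated summons."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, max_length=512,
                  return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss on the new head
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```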