British Library Books Genre Detector
| Property | Value |
|---|---|
| Parameter Count | 65.8M |
| Model Type | DistilBERT Text Classification |
| License | MIT |
| Languages Supported | 30 languages |
| F1 Score | 0.94 (weighted avg) |
What is bl-books-genre?
bl-books-genre is a text classification model, fine-tuned from DistilBERT-base-cased, that classifies historical book titles as either fiction or non-fiction. It was developed by the British Library and trained on titles from its Digitised printed books collection (18th-19th century), which comprises 49,455 digitized books in multiple languages.
Implementation Details
The model is a DistilBERT checkpoint fine-tuned with the blurr library. Its training labels came from two sources: annotations by expert cataloguers, collected via the Zooniverse platform, and programmatic labeling with Snorkel. The expert annotations anchor label quality, while the programmatic labels broaden coverage.
- Architecture: Fine-tuned DistilBERT-base-cased
- Training Process: Hybrid annotation approach combining expert knowledge and programmatic labeling
- Performance: Achieves 0.94 weighted average F1-score
- Multilingual Support: Primary focus on English with support for 29 additional languages
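To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It is not Snorkel's API and not the labeling functions used for this model: the keyword heuristics, the function names, and the majority-vote combiner are all hypothetical stand-ins (Snorkel itself fits a label model rather than taking a simple vote).

```python
from collections import Counter
from typing import Callable, List, Optional

FICTION = "fiction"
NON_FICTION = "non-fiction"
ABSTAIN = None  # a labeling function abstains when its heuristic does not apply


def lf_fiction_keywords(title: str) -> Optional[str]:
    """Hypothetical heuristic: subtitles such as 'a novel' suggest fiction."""
    lowered = title.lower()
    if any(key in lowered for key in ("a novel", "a tale", "a romance")):
        return FICTION
    return ABSTAIN


def lf_non_fiction_keywords(title: str) -> Optional[str]:
    """Hypothetical heuristic: words such as 'treatise' suggest non-fiction."""
    lowered = title.lower()
    if any(key in lowered for key in ("history of", "treatise", "report")):
        return NON_FICTION
    return ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], Optional[str]]] = [
    lf_fiction_keywords,
    lf_non_fiction_keywords,
]


def majority_vote(
    title: str, lfs: List[Callable[[str], Optional[str]]]
) -> Optional[str]:
    """Combine labeling-function votes; abstain on no votes or on a tie."""
    votes = [vote for vote in (lf(title) for lf in lfs) if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]
```

Titles on which every heuristic abstains, or on which the heuristics disagree, stay unlabeled in this sketch; those are the cases where expert annotation matters most.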
Core Capabilities
- Binary classification of book titles into fiction/non-fiction categories
- Multilingual processing with support for 30 languages
- Specialized handling of historical book titles (18th-19th century)
- Per-class precision of 0.88 for fiction and 0.98 for non-fiction
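For readers unfamiliar with the reported metrics, the sketch below shows how per-class precision and a weighted-average F1 score are typically computed from gold and predicted labels. The function names are illustrative, and any labels passed in are toy data, not the model's evaluation set.

```python
from typing import List, Tuple


def per_class_metrics(
    y_true: List[str], y_pred: List[str], label: str
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def weighted_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Average the per-class F1 scores, weighting each class by its support."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        support = y_true.count(label)  # number of gold examples of this class
        score += per_class_metrics(y_true, y_pred, label)[2] * support / total
    return score
```

Because the average is weighted by support, the larger class dominates the headline figure, which is why the per-class precision values are worth reading alongside it.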
Frequently Asked Questions
Q: What makes this model unique?
This model is designed specifically for historical book classification and was trained on authentic 18th-19th century titles from the British Library's collection. Its handling of historical cataloguing conventions and its multilingual support make it particularly valuable for digital humanities and library science applications.
Q: What are the recommended use cases?
The model is well suited to large-scale digitization projects, library collection analysis, and digital humanities research on historical texts. It is a particularly good fit for institutions whose historical book collections need automated genre classification from titles.
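For collection-scale use, a batch classification run might look like the following sketch, which uses the Hugging Face `transformers` text-classification pipeline. The Hub id in `MODEL_ID` is an assumption (it is not stated above), and the helper names and the default batch size are illustrative.

```python
from typing import Any, Dict, List, Tuple

# Assumption: the Hugging Face Hub id of the model; verify before use.
MODEL_ID = "TheBritishLibrary/bl-books-genre"


def pair_predictions(
    titles: List[str], predictions: List[Dict[str, Any]]
) -> List[Tuple[str, str, float]]:
    """Attach each predicted label and rounded score to its title."""
    return [
        (title, pred["label"], round(pred["score"], 3))
        for title, pred in zip(titles, predictions)
    ]


def classify_titles(
    titles: List[str], batch_size: int = 32
) -> List[Tuple[str, str, float]]:
    """Classify a list of book titles; downloads the model on first call."""
    # Imported lazily so pair_predictions works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    predictions = classifier(titles, batch_size=batch_size)
    return pair_predictions(titles, predictions)
```

Calling `classify_titles(["The Moonstone: a romance", "A Treatise on Optics"])` would return one `(title, label, score)` tuple per title. The label strings come from the model's own configuration, so inspect them before filtering on a particular value.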