British Library Books Genre Detector

Property	Value
Parameter Count	65.8M
Model Type	DistilBERT Text Classification
License	MIT
Languages Supported	30 languages
F1 Score	0.94 (weighted avg)

What is bl-books-genre?

bl-books-genre is a text classification model, fine-tuned from DistilBERT-base-cased, that classifies historical book titles as either fiction or non-fiction. It was developed by the British Library and trained on its Digitised printed books collection (18th-19th century), which comprises 49,455 digitised books across multiple languages.

Implementation Details

The model is a DistilBERT checkpoint fine-tuned with the blurr library. The training labels came from two sources: annotations by expert cataloguers collected through the Zooniverse platform, and programmatic labels generated with Snorkel. Combining the two gave broad coverage of the collection while keeping label quality anchored to expert judgement.
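The idea behind programmatic labeling can be illustrated with a minimal plain-Python sketch. It deliberately does not use Snorkel's API, and the keyword heuristics, label constants, and function names are hypothetical stand-ins for whatever labeling functions were actually written. Each heuristic votes fiction, votes non-fiction, or abstains, and a simple majority vote combines the votes (Snorkel can instead fit a label model that weights the functions).

```python
from collections import Counter

# Hypothetical label constants; ABSTAIN means a heuristic has no opinion.
FICTION, NON_FICTION, ABSTAIN = 1, 0, -1


def lf_narrative_keyword(title: str) -> int:
    """Vote fiction when the title names a narrative form (illustrative keywords)."""
    text = title.lower()
    keys = ("a novel", "a tale", "a romance")
    return FICTION if any(k in text for k in keys) else ABSTAIN


def lf_reference_keyword(title: str) -> int:
    """Vote non-fiction when the title looks like a factual work (illustrative keywords)."""
    text = title.lower()
    keys = ("report", "history of", "treatise", "dictionary")
    return NON_FICTION if any(k in text for k in keys) else ABSTAIN


LABELING_FUNCTIONS = (lf_narrative_keyword, lf_reference_keyword)


def majority_label(title: str, lfs=LABELING_FUNCTIONS) -> int:
    """Combine heuristic votes by majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(title) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics cancel out
    return ranked[0][0]
```

Titles on which every heuristic abstains or the heuristics disagree stay unlabeled in this sketch, which is where expert annotations would be most useful.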

Architecture: Fine-tuned DistilBERT-base-cased
Training Process: Hybrid annotation approach combining expert knowledge and programmatic labeling
Performance: Achieves 0.94 weighted average F1-score
Multilingual Support: Primary focus on English with support for 29 additional languages

Core Capabilities

Binary classification of book titles into fiction/non-fiction categories
Multilingual processing with support for 30 languages
Specialized handling of historical book titles (18th-19th century)
Per-class precision of 0.88 for fiction and 0.98 for non-fiction
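For readers unfamiliar with the weighted-average F1 quoted in this card, the sketch below shows how such a figure is derived from per-class precision, recall, and support. Only the two precision values (0.88 and 0.98) come from this card. The recall values and class counts are placeholders invented for illustration, so the result is not expected to reproduce the reported 0.94.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(per_class: dict) -> float:
    """Average per-class F1, weighting each class by its support.

    `per_class` maps a label to a (precision, recall, support) tuple.
    """
    total = sum(support for _, _, support in per_class.values())
    return sum(f1(p, r) * support for p, r, support in per_class.values()) / total


# Precision values are from the card; recall and support are PLACEHOLDERS.
example = {
    "fiction": (0.88, 0.90, 300),
    "non_fiction": (0.98, 0.97, 700),
}
print(round(weighted_f1(example), 3))
```

Because the weights are class counts, the larger class dominates the weighted figure.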

Frequently Asked Questions

Q: What makes this model unique?

This model is designed specifically for historical book classification and was trained on authentic 18th-19th century titles from the British Library's collection. Its handling of historical cataloguing practices and its multilingual support make it particularly valuable for digital humanities and library science applications.

Q: What are the recommended use cases?

The model is well suited to large-scale digitisation projects, library collection analysis, and digital humanities research on historical texts. It is particularly useful for institutions that need automated genre classification for historical book collections.
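As a starting point for classifying titles in bulk, the sketch below wraps a Hugging Face `transformers` text-classification pipeline. The Hub identifier `TheBritishLibrary/bl-books-genre` is an assumption and should be checked against the model's published page, as should the exact label strings the model returns. The helper names `load_classifier` and `classify_titles` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Assumed Hub identifier; verify it against the published model page.
MODEL_ID = "TheBritishLibrary/bl-books-genre"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so classify_titles can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_titles(titles: Iterable[str], classifier: Callable) -> List[Dict]:
    """Pair each title with its predicted label and score.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts, which is the shape a transformers
    text-classification pipeline returns for list input.
    """
    titles = list(titles)
    predictions = classifier(titles)
    return [
        {"title": title, "label": pred["label"], "score": pred["score"]}
        for title, pred in zip(titles, predictions)
    ]
```

Calling `classify_titles(my_titles, load_classifier())` returns one row per title. Keeping the classifier as a parameter makes it possible to swap in a stub for testing or a differently configured pipeline for large batches.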
