bl-books-genre

Maintained By: TheBritishLibrary

British Library Books Genre Detector

Property             Value
Parameter Count      65.8M
Model Type           DistilBERT Text Classification
License              MIT
Languages Supported  30 languages
F1 Score             0.94 (weighted avg)

What is bl-books-genre?

bl-books-genre is a text classification model, fine-tuned from DistilBERT-base-cased, that classifies historical book titles as either fiction or non-fiction. It was developed by The British Library and trained on titles from the Library's Digitised printed books collection (18th-19th century), which comprises 49,455 digitised books in multiple languages.
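A minimal sketch of how title classification might look with the Hugging Face transformers library. The Hub identifier `TheBritishLibrary/bl-books-genre` is an assumption assembled from the maintainer and model names on this page, and `load_classifier` and `classify_titles` are hypothetical helper names. The label strings returned depend on the model's own configuration and are not assumed here.

```python
from typing import Callable, Dict, List

# Assumed Hub id: maintainer name + model name as they appear on this page.
MODEL_ID = "TheBritishLibrary/bl-books-genre"


def load_classifier(model_id: str = MODEL_ID) -> Callable[[List[str]], List[Dict]]:
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the rest of the module works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_titles(
    titles: List[str], classifier: Callable[[List[str]], List[Dict]]
) -> List[Dict]:
    """Pair each title with the label and score the classifier returns for it."""
    predictions = classifier(titles)
    return [
        {"title": title, "label": pred["label"], "score": float(pred["score"])}
        for title, pred in zip(titles, predictions)
    ]
```

With transformers installed, `classify_titles(["A treatise on the steam engine"], load_classifier())` would return one dictionary per title. Because the input is a bare title, results on full text or modern titles may differ from the reported scores.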

Implementation Details

The model is a DistilBERT model fine-tuned with the blurr library. Its training labels came from two sources: annotations by expert cataloguers collected through the Zooniverse platform, and programmatic labels generated with Snorkel. Combining the two was intended to broaden coverage while maintaining label quality.
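To illustrate the idea behind programmatic labeling, here is a small pure-Python sketch of labeling functions combined by majority vote. It does not use Snorkel's API, and the keyword rules are hypothetical examples, not the rules used to build this model's training data. Snorkel itself combines labeling functions with a learned label model; the majority vote below is a simpler stand-in.

```python
import re
from collections import Counter
from typing import Callable, List

FICTION, NON_FICTION, ABSTAIN = 1, 0, -1


def words(title: str) -> set:
    """Lower-cased alphabetic tokens of a title."""
    return set(re.findall(r"[a-z]+", title.lower()))


# Hypothetical keyword rules, for illustration only.
def lf_fiction_keywords(title: str) -> int:
    return FICTION if words(title) & {"novel", "tale", "tales", "romance"} else ABSTAIN


def lf_non_fiction_keywords(title: str) -> int:
    hits = words(title) & {"history", "treatise", "report", "dictionary"}
    return NON_FICTION if hits else ABSTAIN


def majority_label(title: str, lfs: List[Callable[[str], int]]) -> int:
    """Combine labeling-function votes; abstain when there are no votes or a tie."""
    votes = [v for v in (lf(title) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


LFS = [lf_fiction_keywords, lf_non_fiction_keywords]
```

Titles on which the rules abstain or disagree are natural candidates for human annotation, which is one way expert labels and programmatic labels can complement each other.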

  • Architecture: Fine-tuned DistilBERT-base-cased
  • Training Process: Hybrid annotation approach combining expert knowledge and programmatic labeling
  • Performance: Achieves 0.94 weighted average F1-score
  • Multilingual Support: Primary focus on English with support for 29 additional languages

Core Capabilities

  • Binary classification of book titles into fiction/non-fiction categories
  • Multilingual processing with support for 30 languages
  • Specialized handling of historical book titles (18th-19th century)
  • Per-class precision of 0.88 for fiction and 0.98 for non-fiction
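For readers unfamiliar with the metric, the sketch below shows how a weighted-average F1 score is computed from per-class precision, recall, and support. The numbers in the example are toy values chosen for easy arithmetic; they are not this model's evaluation data, and the page does not report per-class recall or support.

```python
from typing import Dict, Tuple


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(per_class: Dict[str, Tuple[float, float, int]]) -> float:
    """Average per-class F1, weighting each class by its support.

    per_class maps a label to (precision, recall, support).
    """
    total_support = sum(support for _, _, support in per_class.values())
    return sum(f1(p, r) * s for p, r, s in per_class.values()) / total_support


# Toy values only: class "a" has F1 0.5 on 100 items, class "b" has F1 1.0 on 300.
toy = {"a": (0.5, 0.5, 100), "b": (1.0, 1.0, 300)}
toy_score = weighted_f1(toy)  # (0.5 * 100 + 1.0 * 300) / 400
```

Because each class is weighted by its support, the larger class dominates the average, so a weighted F1 can look strong even when the smaller class performs noticeably worse.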

Frequently Asked Questions

Q: What makes this model unique?

This model is designed specifically for historical book classification and was trained on authentic 18th-19th century titles from the British Library's collection. Its handling of historical cataloguing practices and its multilingual support make it particularly valuable for digital humanities and library science applications.

Q: What are the recommended use cases?

The model is ideal for large-scale digitization projects, library collection analysis, and digital humanities research focusing on historical texts. It's particularly suited for institutions working with historical book collections needing automated genre classification.
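For large collections, one possible workflow is to classify catalogue records in batches and route low-confidence predictions to a cataloguer. The sketch below assumes each record is a dictionary with a "title" key and that the classifier returns a "label" and "score" per title, as a transformers text-classification pipeline does. The `triage` helper and the 0.9 threshold are hypothetical choices, not recommendations from the model's authors.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def batched(items: List[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def triage(
    records: List[Dict],
    classifier: Callable[[List[str]], List[Dict]],
    threshold: float = 0.9,  # hypothetical cut-off; tune on your own data
    batch_size: int = 32,
) -> Tuple[List[Dict], List[Dict]]:
    """Split records into auto-accepted predictions and ones needing review."""
    accepted, needs_review = [], []
    for batch in batched(records, batch_size):
        predictions = classifier([record["title"] for record in batch])
        for record, pred in zip(batch, predictions):
            row = {**record, "genre": pred["label"], "score": pred["score"]}
            if pred["score"] >= threshold:
                accepted.append(row)
            else:
                needs_review.append(row)
    return accepted, needs_review
```

Keeping the original record fields in each output row makes it straightforward to write the predicted genre back to the catalogue alongside its identifier.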
