bl-books-genre

TheBritishLibrary

A specialized text classification model (65.8M parameters) that detects fiction versus non-fiction in historical book titles and supports 30 languages.

Property             Value
Parameter Count      65.8M
Model Type           DistilBERT Text Classification
License              MIT
Languages Supported  30 languages
F1 Score             0.94 (weighted avg)

What is bl-books-genre?

bl-books-genre is a text classification model fine-tuned from DistilBERT-base-cased to classify historical book titles as either fiction or non-fiction. Developed by The British Library, it was trained on the library's Digitised printed books collection (18th-19th century), which comprises 49,455 digitised books across multiple languages.

Implementation Details

The model leverages a fine-tuned DistilBERT architecture trained using the blurr library. The training data was curated through a combination of expert cataloguer annotations via the Zooniverse platform and programmatic labeling using Snorkel, ensuring broad coverage while maintaining quality.
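The programmatic labeling idea can be pictured with a small sketch. This is not the British Library's actual Snorkel pipeline: it is a plain-Python stand-in that combines rule votes by simple majority where Snorkel would fit a label model, and the keyword rules, function names, and label constants are all hypothetical.

```python
from collections import Counter

# Hypothetical label constants; -1 for "abstain" follows Snorkel's convention.
ABSTAIN, NON_FICTION, FICTION = -1, 0, 1


def lf_novel_keyword(title: str) -> int:
    """Vote FICTION when the title advertises itself as a novel, tale, or romance."""
    lowered = title.lower()
    hit = any(w in lowered for w in ("a novel", "a tale", "a romance"))
    return FICTION if hit else ABSTAIN


def lf_reference_keyword(title: str) -> int:
    """Vote NON_FICTION for titles that look like histories, reports, or dictionaries."""
    lowered = title.lower()
    hit = any(w in lowered for w in ("history of", "report", "dictionary"))
    return NON_FICTION if hit else ABSTAIN


LABELING_FUNCTIONS = [lf_novel_keyword, lf_reference_keyword]


def weak_label(title: str) -> int:
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(title) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]
```

Titles that no rule covers, or on which the rules disagree, stay unlabeled in this sketch; in a hybrid setup those are the natural candidates for expert annotation.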

  • Architecture: Fine-tuned DistilBERT-base-cased
  • Training Process: Hybrid annotation approach combining expert knowledge and programmatic labeling
  • Performance: Achieves 0.94 weighted average F1-score
  • Multilingual Support: Primary focus on English with support for 29 additional languages
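For readers unfamiliar with the metric, a weighted average F1-score is the per-class F1 averaged with each class weighted by its number of evaluation examples (its support). The sketch below shows the arithmetic only; the per-class precision, recall, and support values are hypothetical and are not the model's reported results.

```python
from __future__ import annotations


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(per_class: dict[str, tuple[float, int]]) -> float:
    """Average per-class F1 scores, weighting each class by its support."""
    total = sum(support for _, support in per_class.values())
    return sum(score * support for score, support in per_class.values()) / total


# Hypothetical per-class figures, for illustration only.
example = {
    "fiction": (f1(0.80, 0.90), 100),      # (F1, support)
    "non-fiction": (f1(0.95, 0.90), 300),  # larger class dominates the average
}
print(round(weighted_f1(example), 3))
```

Because the larger class carries more weight, a weighted average can sit closer to the majority-class score than to the minority-class one, which is worth keeping in mind when classes are imbalanced.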

Core Capabilities

  • Binary classification of book titles into fiction/non-fiction categories
  • Multilingual processing with support for 30 languages
  • Specialized handling of historical book titles (18th-19th century)
  • Precision of 0.88 for fiction and 0.98 for non-fiction
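A minimal inference sketch using the Hugging Face `transformers` text-classification pipeline might look like the following. The Hub id is an assumption built from the organisation and model names on this page, and the helper names are hypothetical; the label strings the model returns are not stated here, so the code passes them through unchanged.

```python
from typing import Callable, Dict, List, Tuple

# Assumed Hugging Face Hub id (organisation name + model name from this page).
MODEL_ID = "TheBritishLibrary/bl-books-genre"


def load_classifier():
    """Build a text-classification pipeline; downloads the model on first use."""
    # Imported lazily so the rest of the module works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def classify_titles(
    titles: List[str],
    classifier: Callable[[List[str]], List[Dict]],
) -> List[Tuple[str, str, float]]:
    """Return (title, label, score) for each title, using the classifier's top label."""
    predictions = classifier(titles)
    return [(t, p["label"], p["score"]) for t, p in zip(titles, predictions)]


# Usage (needs network access and `pip install transformers torch`):
#   clf = load_classifier()
#   for title, label, score in classify_titles(["<a book title>"], clf):
#       print(label, round(score, 3), title)
```

Passing the classifier in as an argument keeps `classify_titles` easy to exercise with a stub, which is handy when the model cannot be downloaded.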

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically designed for historical book classification and was trained on authentic 18th-19th century titles from the British Library's collection. Its handling of historical cataloguing practices, together with its multilingual support, makes it particularly valuable for digital humanities and library science applications.

Q: What are the recommended use cases?

The model is ideal for large-scale digitization projects, library collection analysis, and digital humanities research focusing on historical texts. It's particularly suited for institutions working with historical book collections needing automated genre classification.
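For collection-level analysis, one possible pattern is to count only confident predictions per genre and route the rest to a cataloguer. The sketch below assumes predictions arrive as (title, label, score) tuples; the `summarise` helper, the example labels, and the 0.9 threshold are hypothetical choices, not recommendations from the model's authors.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (title, predicted label, confidence score)
Prediction = Tuple[str, str, float]


def summarise(
    predictions: Iterable[Prediction],
    min_score: float = 0.9,  # hypothetical cut-off; tune on a held-out sample
) -> Tuple[Counter, List[str]]:
    """Count confident predictions per label and collect the rest for manual review."""
    counts: Counter = Counter()
    review: List[str] = []
    for title, label, score in predictions:
        if score >= min_score:
            counts[label] += 1
        else:
            review.append(title)
    return counts, review
```

The threshold trades coverage for reliability: raising it sends more titles to manual review but makes the automated counts more trustworthy.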
