keyphrase-extraction-distilbert-inspec

Property	Value
License	MIT
Paper	Research Paper
F1 Score (Seqeval)	0.509
Base Architecture	DistilBERT

What is keyphrase-extraction-distilbert-inspec?

This is a transformer-based model for automatically extracting keyphrases from scientific text. Built on the DistilBERT architecture and fine-tuned on the Inspec dataset, it treats keyphrase extraction as a token classification problem: each token is labeled as the beginning of a keyphrase (B-KEY), inside a keyphrase (I-KEY), or outside any keyphrase (O).
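To make the labeling scheme concrete, here is a minimal sketch of how B-KEY / I-KEY / O labels can be grouped back into keyphrases. The function name `decode_bio`, the example sentence, and its labels are hypothetical illustrations, not output from the model or code from its authors.

```python
def decode_bio(tokens, labels):
    """Group tokens into keyphrases from B-KEY / I-KEY / O labels."""
    phrases = []
    current = []
    for token, label in zip(tokens, labels):
        if label == "B-KEY":
            # A new keyphrase starts; flush the one in progress, if any.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif label == "I-KEY" and current:
            # Continue the keyphrase that is currently open.
            current.append(token)
        else:
            # "O" ends the open phrase, if any. A stray I-KEY with no
            # open phrase is ignored (an assumption of this sketch).
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical tokens and labels, chosen by hand for illustration.
tokens = ["Keyphrase", "extraction", "is", "a", "technique", "in", "text", "analysis"]
labels = ["B-KEY", "I-KEY", "O", "O", "O", "O", "B-KEY", "I-KEY"]
keyphrases = decode_bio(tokens, labels)
print(keyphrases)
```

In practice the model predicts labels for subword tokens, so a real postprocessing step would also need to merge subwords back into words before grouping.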

Implementation Details

The model places a token classification head on top of DistilBERT's contextual embeddings. It was trained with a learning rate of 1e-4 for up to 50 epochs, with an early stopping patience of 3 epochs. Inputs are limited to 512 tokens, so longer documents need to be truncated or split before processing. Text is tokenized before classification, and the predicted labels are merged back into phrases afterwards.

Token classification architecture based on DistilBERT
Automatic keyphrase boundary detection
Fine-tuned for scientific text
Preprocessing and postprocessing pipeline for tokenization and phrase reconstruction
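The training setup described above (up to 50 epochs, early stopping with a patience of 3) can be illustrated with a small simulation. This is a sketch under assumptions: the source does not say which metric was monitored, so validation loss is assumed here, and the loss values and the function name `train_with_early_stopping` are hypothetical.

```python
def train_with_early_stopping(val_losses, max_epochs=50, patience=3):
    """Return (stop_epoch, best_epoch) given one validation loss per epoch.

    Epochs are 1-indexed. Training halts once `patience` consecutive
    epochs pass without a new best validation loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    stop_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return stop_epoch, best_epoch


# Hypothetical loss curve: improves for three epochs, then plateaus.
losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.39]
stop_epoch, best_epoch = train_with_early_stopping(losses)
print(stop_epoch, best_epoch)
```

With this curve, training stops at epoch 6 and the best checkpoint is from epoch 3. The later improvement at epoch 7 is never reached, which is the usual trade-off of a short patience window.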

Core Capabilities

Extracts meaningful keyphrases from scientific papers
Achieves an F1@M score of 0.49 on test data
Handles complex scientific terminology
Fast inference from the compact DistilBERT base, suitable for near-real-time extraction
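As a rough illustration of the F1@M metric mentioned above: it is commonly understood as the F1 score computed over all keyphrases the model predicts for a document (M being however many it outputs), compared against the gold keyphrases. The sketch below assumes lowercase exact matching. Benchmark implementations often stem phrases first, so this is not necessarily the procedure behind the reported score. The function name `f1_at_m` and the example phrases are hypothetical.

```python
def f1_at_m(predicted, gold):
    """Return (precision, recall, f1) over all predicted keyphrases."""
    # Normalize by case and surrounding whitespace (an assumption of
    # this sketch; real evaluations often apply stemming as well).
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    correct = len(pred & ref)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical predictions and gold labels for a single document.
predicted = ["keyphrase extraction", "text analysis", "neural network"]
gold = ["Keyphrase extraction", "text analysis", "token classification", "DistilBERT"]
precision, recall, f1 = f1_at_m(predicted, gold)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here two of three predictions are correct (precision 2/3) and two of four gold phrases are found (recall 1/2), giving an F1 of 4/7. A dataset-level score would typically average such per-document values.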

Frequently Asked Questions

Q: What makes this model unique?

This model is tuned specifically for scientific text: its DistilBERT base is fine-tuned on Inspec so that it recognizes academic terminology in context. It is particularly effective at identifying keyphrases in research abstracts and technical documents.

Q: What are the recommended use cases?

The model is best suited to scientific papers, particularly in the Computer Science and Information Technology domains. It fits automatic indexing, content summarization, and research paper analysis. Because it is optimized for English-language academic content, results on other languages or domains may be weaker.
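For use cases like indexing, a minimal sketch of running the model with the Hugging Face `transformers` token-classification pipeline might look like the following. The hub ID `ml6team/keyphrase-extraction-distilbert-inspec` is an assumption, since this page gives only the model name without an organization. The helper names `unique_keyphrases` and `extract_keyphrases` are hypothetical.

```python
def unique_keyphrases(entities):
    """Deduplicate pipeline output while keeping first-seen order.

    `entities` is a list of dicts with a "word" key, the shape returned
    by a token-classification pipeline with aggregation enabled.
    """
    seen = set()
    result = []
    for entity in entities:
        phrase = entity["word"].strip()
        key = phrase.lower()
        if phrase and key not in seen:
            seen.add(key)
            result.append(phrase)
    return result


def extract_keyphrases(text, model_id="ml6team/keyphrase-extraction-distilbert-inspec"):
    """Run the model on `text` and return its distinct keyphrases.

    The default `model_id` is an assumed hub location, not confirmed here.
    """
    # Imported lazily so unique_keyphrases works without transformers installed.
    from transformers import pipeline

    extractor = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge B-KEY/I-KEY tokens into spans
    )
    return unique_keyphrases(extractor(text))


# Usage (downloads the model on first call):
# extract_keyphrases("Keyphrase extraction is a technique in text analysis.")
```

Building the pipeline once and reusing it across documents would avoid reloading the model on every call; it is rebuilt per call here only to keep the sketch short.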