keyphrase-extraction-distilbert-inspec
| Property | Value |
|---|---|
| License | MIT |
| Paper | Research Paper |
| F1 Score (Seqeval) | 0.509 |
| Base Architecture | DistilBERT |
What is keyphrase-extraction-distilbert-inspec?
This transformer-based model automatically extracts keyphrases from scientific text. Built on the DistilBERT architecture and fine-tuned on the Inspec dataset, it treats keyphrase extraction as a token classification problem: each word is labeled as the beginning of a keyphrase (B-KEY), inside a keyphrase (I-KEY), or outside any keyphrase (O).
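To make the labeling scheme concrete, here is a minimal sketch of how B-KEY / I-KEY / O tags could be turned back into keyphrase strings. The function name `decode_bio` and the example sentence are illustrative assumptions, not part of the model's own code, and the handling of a stray I-KEY tag (it is ignored) is one possible choice among several.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Group B-KEY / I-KEY tagged words into keyphrase strings.

    Hypothetical helper for illustration; the model's real postprocessing
    may differ.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    phrases: List[str] = []
    current: List[str] = []  # words of the keyphrase currently being built

    for token, tag in zip(tokens, tags):
        if tag == "B-KEY":
            # A new keyphrase starts; close the previous one if still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-KEY" and current:
            # Continue the open keyphrase.
            current.append(token)
        else:
            # "O", or an I-KEY with no open keyphrase (ignored here).
            if current:
                phrases.append(" ".join(current))
            current = []

    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = ["Keyphrase", "extraction", "helps", "index", "scientific", "papers", "."]
tags = ["B-KEY", "I-KEY", "O", "O", "B-KEY", "I-KEY", "O"]
print(decode_bio(tokens, tags))  # ['Keyphrase extraction', 'scientific papers']
```

Two adjacent B-KEY tags yield two separate keyphrases, which is what distinguishes this scheme from a plain "key / not key" binary label.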
Implementation Details
The model runs a token classification pipeline on top of DistilBERT's contextual representations. It was trained with a learning rate of 1e-4 for up to 50 epochs, with early stopping at a patience of 3 epochs. Input documents are limited to 512 tokens. Text is tokenized before classification, and the predicted labels are postprocessed into keyphrases afterward.
- Token classification architecture based on DistilBERT
- Automatic keyphrase boundary detection through B-KEY and I-KEY labels
- Fine-tuned for scientific text
- Preprocessing and postprocessing steps around the classifier
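Because the input limit is 512 tokens, longer documents need to be split before they reach the model. The sketch below shows one common workaround, overlapping windows followed by de-duplication of the extracted phrases. It is an assumption about how a caller might handle long inputs, not something the model does itself; `split_into_windows`, `merge_keyphrases`, and the `stride` default are hypothetical names and values chosen for illustration.

```python
from typing import List


def split_into_windows(
    token_ids: List[int], max_len: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split a token sequence into windows of at most max_len tokens.

    Consecutive windows overlap by `stride` tokens so that a keyphrase cut
    at one window's edge can still appear whole in the next window.
    Note: a real tokenizer also adds special tokens, so the usable budget
    per window is slightly smaller than the model limit.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    if len(token_ids) <= max_len:
        return [list(token_ids)]

    step = max_len - stride
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
    return windows


def merge_keyphrases(per_window: List[List[str]]) -> List[str]:
    """Merge per-window results, dropping case-insensitive duplicates.

    The first spelling encountered is kept, and order of first appearance
    is preserved.
    """
    seen = set()
    merged: List[str] = []
    for phrases in per_window:
        for phrase in phrases:
            cleaned = phrase.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


windows = split_into_windows(list(range(1000)))
print([len(w) for w in windows])  # [512, 512, 232]
```

A larger `stride` lowers the chance of splitting a keyphrase at a boundary but increases the number of windows, and therefore the number of model calls.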
Core Capabilities
- Extracts meaningful keyphrases from scientific papers
- Achieves an F1@M score of 0.49 on test data (a different metric from the Seqeval F1 of 0.509 listed in the table)
- Handles complex scientific terminology
- Fast inference suited to near-real-time extraction, since DistilBERT is a smaller, distilled variant of BERT
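For readers unfamiliar with the F1@M figure, the following is a minimal sketch of how such a score could be computed for a single document. It assumes that F1@M means F1 over all keyphrases the model predicts, with no top-k cutoff, and it matches phrases by lowercased exact match only. Published evaluations often also apply stemming, which is omitted here. The function names and the example phrases are hypothetical.

```python
from typing import Iterable, Set


def _normalize(phrases: Iterable[str]) -> Set[str]:
    """Lowercase, collapse whitespace, and de-duplicate phrases."""
    return {" ".join(p.lower().split()) for p in phrases if p.strip()}


def f1_at_m(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 between all predicted keyphrases and the reference keyphrases.

    Simplified sketch: exact match after lowercasing, no stemming.
    """
    pred = _normalize(predicted)
    ref = _normalize(gold)
    if not pred or not ref:
        return 0.0

    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


predicted = ["neural network", "Keyphrase Extraction", "graph"]
gold = ["keyphrase extraction", "neural network", "transformer", "inspec"]
# 2 of 3 predictions are correct (precision 2/3); 2 of 4 references are
# found (recall 1/2); the harmonic mean is 4/7.
print(round(f1_at_m(predicted, gold), 4))  # 0.5714
```

A corpus-level score would typically average this per-document value over the test set.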
Frequently Asked Questions
Q: What makes this model unique?
The model is fine-tuned specifically for scientific text, so its DistilBERT backbone is adapted to academic context and terminology. It is particularly effective at identifying keyphrases in research abstracts and technical documents.
Q: What are the recommended use cases?
The model is best suited to scientific papers, particularly in the Computer Science and Information Technology domains. Typical uses include automatic indexing, content summarization, and research paper analysis. It is optimized for English-language academic content, so results on other languages or domains may be weaker.