keyphrase-extraction-distilbert-inspec

ml6team

DistilBERT-based keyphrase extraction model fine-tuned on the Inspec dataset, achieving a seqeval F1 score of 0.509. Specialized for scientific papers and abstracts.

Property              Value
License               MIT
Paper                 Research Paper
F1 Score (Seqeval)    0.509
Base Architecture     DistilBERT

What is keyphrase-extraction-distilbert-inspec?

This is a transformer-based model for automatically extracting keyphrases from scientific texts. Built on the DistilBERT architecture and fine-tuned on the Inspec dataset, it treats keyphrase extraction as a token classification problem: each word is labeled as the beginning of a keyphrase (B-KEY), a continuation of one (I-KEY), or outside any keyphrase (O).
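To make the labeling scheme concrete, here is a minimal sketch of how a sequence of B-KEY / I-KEY / O labels can be turned back into keyphrase strings. The function name `decode_keyphrases` and the sample tokens are illustrative assumptions, and the sketch works on whole words rather than the subword pieces a real tokenizer would produce.

```python
def decode_keyphrases(tokens, labels):
    """Group tokens tagged B-KEY / I-KEY into keyphrase strings."""
    phrases, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B-KEY":
            # A new keyphrase starts; close the previous one if still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif label == "I-KEY" and current:
            # Continue the keyphrase that is currently open.
            current.append(token)
        else:
            # "O", or a stray I-KEY with no open phrase (dropped here by assumption).
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = ["Keyphrase", "extraction", "is", "a", "token", "classification", "task"]
labels = ["B-KEY", "I-KEY", "O", "O", "B-KEY", "I-KEY", "O"]
print(decode_keyphrases(tokens, labels))
# ['Keyphrase extraction', 'token classification']
```

Dropping an I-KEY that has no preceding B-KEY is one possible policy; treating it as the start of a new phrase would be an equally reasonable choice.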

Implementation Details

The model is a token classification pipeline built on DistilBERT's contextual representations. It was trained with a learning rate of 1e-4 for up to 50 epochs, with an early stopping patience of 3 epochs. Inputs are limited to 512 tokens, and text is tokenized before classification.

  • Advanced token classification architecture
  • Automatic keyphrase boundary detection
  • Specialized scientific text processing
  • Efficient preprocessing and postprocessing pipeline
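Because inputs are limited to 512 tokens, longer documents have to be split before they reach the model. The sketch below shows one common way to do that with overlapping windows. The function name `split_into_windows`, the default of 510 (512 minus the two special tokens DistilBERT adds, an assumption about how the limit is counted), and the overlap size are illustrative choices, not something the model card specifies.

```python
def split_into_windows(tokens, max_len=510, overlap=64):
    """Split a token list into overlapping windows of at most max_len tokens.

    max_len=510 assumes two positions are reserved for [CLS] and [SEP];
    overlap is a hypothetical setting so that a keyphrase cut by one window
    boundary can still appear whole in the next window.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_len])
        # Stop once the current window already reaches the end of the input.
        if start + max_len >= len(tokens):
            break
    return windows


windows = split_into_windows(list(range(1000)))
print([len(w) for w in windows])
# [510, 510, 108]
```

Keyphrases found in the overlapping regions will be reported twice, so the per-window results would need de-duplication before use.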

Core Capabilities

  • Extracts meaningful keyphrases from scientific papers
  • Achieves an F1@M score of 0.49 on test data
  • Handles complex scientific terminology
  • Real-time keyphrase extraction
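The page does not describe how the F1@M score is computed. As a rough illustration, the sketch below scores a set of predicted keyphrases against a gold set using exact matching after lowercasing, taking all predictions into account. The function name `keyphrase_f1`, the matching rule, and the sample phrases are assumptions; published evaluations often also apply stemming, which is omitted here.

```python
def keyphrase_f1(predicted, gold):
    """Return (precision, recall, f1) over de-duplicated, lowercased phrases."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    matches = len(pred & ref)
    precision = matches / len(pred)
    recall = matches / len(ref)
    if matches == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


predicted = ["token classification", "DistilBERT", "early stopping"]
gold = ["distilbert", "token classification", "keyphrase extraction", "Inspec"]
precision, recall, f1 = keyphrase_f1(predicted, gold)
print(round(precision, 3), round(recall, 3), round(f1, 3))
# 0.667 0.5 0.571
```

This phrase-level score differs from the seqeval F1 listed above, which is computed on the predicted label sequences, so the two numbers are not directly comparable.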

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically optimized for scientific paper analysis, utilizing a fine-tuned DistilBERT architecture that understands academic context and terminology. It's particularly effective at identifying keyphrases in research abstracts and technical documents.

Q: What are the recommended use cases?

The model is best suited for processing scientific papers, particularly in the Computer Science and Information Technology domains. Typical uses include automatic indexing, content summarization, and research paper analysis. It is optimized for English-language academic content, so results on other languages or genres may be weaker.
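For a use case such as automatic indexing, a minimal sketch with the Hugging Face `transformers` token classification pipeline might look like the following. The model identifier is assumed from the author and model name shown on this page, `unique_phrases` and `extract_keyphrases` are hypothetical helper names, and the first call needs the `transformers` package plus network access to download the weights.

```python
def unique_phrases(entities):
    """Collapse pipeline output dicts into an order-preserving, de-duplicated list."""
    seen, phrases = set(), []
    for entity in entities:
        phrase = entity["word"].strip()
        key = phrase.lower()
        if phrase and key not in seen:
            seen.add(key)
            phrases.append(phrase)
    return phrases


def extract_keyphrases(text, model_name="ml6team/keyphrase-extraction-distilbert-inspec"):
    """Run the model on text and return its keyphrases (model_name is an assumption)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges B-KEY / I-KEY word pieces into spans.
    extractor = pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",
    )
    return unique_phrases(extractor(text))
```

Calling `extract_keyphrases(abstract_text)` on an English abstract should return a list of strings. The sketch rebuilds the pipeline on every call for simplicity; when processing many documents, creating it once and reusing it would be the better design.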
