keyphrase-extraction-kbir-openkp

ml6team

Transformer-based keyphrase extraction model that fine-tunes KBIR on the OpenKP dataset. It identifies keyphrases in English text documents by tagging the tokens that belong to them.

| Property | Value |
|---|---|
| Model Type | Token Classification |
| Training Dataset | OpenKP (148,124 web documents) |
| Learning Rate | 1e-4 |
| Training Epochs | 50 |
| Early Stopping | Patience of 3 epochs |
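To make the early-stopping setting concrete, the sketch below shows what a patience of 3 epochs means inside a 50-epoch budget. It assumes the monitored metric is a validation loss where lower is better; the source does not say which metric was tracked, and `epochs_until_stop` is a hypothetical helper, not part of the model's training code.

```python
# Hypothetical sketch: early stopping with a patience of 3 epochs inside a
# 50-epoch budget. `val_losses` stands in for per-epoch validation losses
# (assumed metric; lower is better).

def epochs_until_stop(val_losses, max_epochs=50, patience=3):
    """Return how many epochs run before training stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # `patience` consecutive epochs without a new best: stop here.
                return epoch
    # No early stop: training ends at the epoch budget or when data runs out.
    return min(len(val_losses), max_epochs)
```

For example, with losses `[1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6]` training stops after epoch 6, because epochs 4 to 6 never beat the best value of 0.7; the later improvement at epoch 7 is never seen.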

What is keyphrase-extraction-kbir-openkp?

This is a transformer model for automatic keyphrase extraction from text documents. It starts from KBIR (Keyphrase Boundary Infilling with Replacement), a pre-trained encoder, and is fine-tuned on the OpenKP dataset. KBIR was pre-trained with a multi-task objective that combines Masked Language Modeling (MLM), Keyphrase Boundary Infilling (KBI), and Keyphrase Replacement Classification (KRC), which gives the encoder representations suited to finding the most relevant keyphrases in a text.
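One way to picture a multi-task pre-training objective is as a weighted sum of the per-task losses. This minimal sketch assumes exactly that, with placeholder weights (`alpha`, `beta`, `gamma` are hypothetical, not KBIR's published values), and assumes KRC is framed as a binary "replaced or original" decision per keyphrase. It illustrates the idea only; it is not KBIR's training code.

```python
import math

# Hypothetical sketch of a multi-task objective: the total loss is a weighted
# sum of the MLM, KBI and KRC losses. Weights are placeholders.

def kbir_style_loss(mlm_loss, kbi_loss, krc_loss, alpha=1.0, beta=1.0, gamma=1.0):
    """Combine the three task losses into one scalar to minimize."""
    return alpha * mlm_loss + beta * kbi_loss + gamma * krc_loss


def krc_binary_cross_entropy(p_replaced, is_replaced):
    """Assumed KRC loss for one keyphrase: was it replaced (True) or original (False)?"""
    eps = 1e-12
    p = min(max(p_replaced, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if is_replaced else -math.log(1.0 - p)
```

Under this framing, the encoder is pushed to represent masked keyphrase spans well enough to rebuild them (KBI) and to notice when a keyphrase does not fit its context (KRC), alongside ordinary token-level language modeling (MLM).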

Implementation Details

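The model works as a token classifier: each token is labeled as the beginning of a keyphrase (B-KEY), inside a keyphrase (I-KEY), or outside any keyphrase (O), and runs of B-KEY/I-KEY tokens are joined into keyphrases. On the OpenKP test set it reports P@5 of 0.13, R@5 of 0.38, and F1@M of 0.39. The low P@5 partly reflects the metric itself: when five predictions are scored and a document has only two gold keyphrases, P@5 can be at most 0.4.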
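The path from tags to scores can be sketched in a few functions. This is a minimal illustration, assuming word-level tags, exact case-insensitive string matching against gold keyphrases (real evaluations often also apply stemming), and P@k computed by dividing hits by k. The function names and example data are hypothetical.

```python
# Sketch: decode B-KEY / I-KEY / O word tags into keyphrases, then score them
# with precision/recall at k and F1@M. Matching is exact, lowercased strings.

def decode_keyphrases(words, tags):
    """Join each B-KEY word and the I-KEY words after it into one keyphrase."""
    phrases, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B-KEY":
            if current:
                phrases.append(" ".join(current))
            current = [word]
        elif tag == "I-KEY" and current:
            current.append(word)
        else:  # "O", or a stray I-KEY with no open phrase (treated as O)
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    # Drop duplicates case-insensitively, keeping first-occurrence order.
    seen, unique = set(), []
    for phrase in phrases:
        if phrase.lower() not in seen:
            seen.add(phrase.lower())
            unique.append(phrase)
    return unique


def precision_recall_at_k(predicted, gold, k):
    """Score only the first k predictions; precision divides by k."""
    gold_set = {g.lower() for g in gold}
    hits = sum(1 for p in predicted[:k] if p.lower() in gold_set)
    recall = hits / len(gold_set) if gold_set else 0.0
    return hits / k, recall


def f1_at_m(predicted, gold):
    """F1@M: score every predicted keyphrase (M = however many were output)."""
    pred_set = {p.lower() for p in predicted}
    gold_set = {g.lower() for g in gold}
    hits = len(pred_set & gold_set)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(pred_set), hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


words = ["Transformers", "improve", "keyphrase", "extraction", "on", "web", "documents"]
tags = ["B-KEY", "O", "B-KEY", "I-KEY", "O", "B-KEY", "I-KEY"]
gold = ["keyphrase extraction", "web documents"]
predicted = decode_keyphrases(words, tags)
print(predicted)
print(precision_recall_at_k(predicted, gold, k=5))
print(f1_at_m(predicted, gold))
```

In this toy example both gold keyphrases are found, so R@5 is 1.0, yet P@5 is only 0.4 because two hits are divided by five. This is the effect that keeps P@5 low on datasets with few gold keyphrases per document.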

  • KBIR encoder pre-trained with a multi-task objective combining MLM, KBI, and KRC
  • Fine-tuned on OpenKP, a collection of 148,124 real-world web documents
  • Used through a token classification pipeline that merges tagged tokens into keyphrases
  • Supports batch processing of multiple documents
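Two pieces of such a pipeline can be sketched without the model itself. First, a subword tokenizer may split one word into several tokens, so predictions need to be mapped back to words; the sketch keeps the label of each word's first subword, similar in spirit to the `first` aggregation strategy in the Hugging Face `transformers` token-classification pipeline. Second, documents can be grouped into fixed-size batches. Both helpers are hypothetical, and `word_ids` is assumed to mimic a fast tokenizer's word-index list, with `None` for special tokens.

```python
# Hypothetical sketches: (1) word-level labels from subword predictions using
# the first-subword rule, (2) grouping documents into fixed-size batches.

def word_level_labels(word_ids, token_labels):
    """Keep the label of each word's first subword; skip special tokens (None)."""
    labels = []
    previous = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None or word_id == previous:
            continue  # special token, or a non-initial subword of the same word
        labels.append(label)
        previous = word_id
    return labels


def batches(documents, batch_size):
    """Yield consecutive slices of at most batch_size documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]
```

For example, token labels `["O", "B-KEY", "I-KEY", "I-KEY", "O", "O"]` with word IDs `[None, 0, 1, 1, 2, None]` reduce to the word labels `["B-KEY", "I-KEY", "O"]`, which the B-KEY/I-KEY decoding step can then turn into keyphrases.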

Core Capabilities

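  • Automatic extraction of relevant keyphrases from English text
  • Use of the surrounding document context when deciding whether a span is a keyphrase
  • Keyphrase boundary detection through B-KEY / I-KEY tagging
  • Processing of long documents by splitting them into chunks that fit the encoder's maximum input length
  • Support for both single-word and multi-word keyphrases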
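Long inputs can be handled with overlapping windows. The following is a minimal sketch, assuming a caller-supplied `extract` function that runs the model on one chunk and returns keyphrases; `windows`, `extract_long`, and the default `max_len`/`stride` values are hypothetical. A real setup would measure length in tokenizer subwords rather than words, against the encoder's actual limit.

```python
from typing import Callable, Iterator, List

# Hypothetical sketch: overlapping windows over a long word sequence, with
# per-window keyphrases merged in order and deduplicated case-insensitively.

def windows(words: List[str], max_len: int, stride: int) -> Iterator[List[str]]:
    """Yield windows of up to max_len words, each starting `stride` words later."""
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("need max_len > 0 and 0 < stride <= max_len")
    start = 0
    while True:
        yield words[start:start + max_len]
        if start + max_len >= len(words):
            break  # this window already reaches the end of the document
        start += stride


def extract_long(
    words: List[str],
    extract: Callable[[List[str]], List[str]],
    max_len: int = 400,
    stride: int = 300,
) -> List[str]:
    """Run `extract` on every window and merge the results without duplicates."""
    merged: List[str] = []
    seen = set()
    for chunk in windows(words, max_len, stride):
        for phrase in extract(chunk):
            if phrase.lower() not in seen:
                seen.add(phrase.lower())
                merged.append(phrase)
    return merged
```

The overlap (`max_len - stride` words, 100 with the defaults) is the design choice that matters here. Any keyphrase shorter than the overlap that is cut at one window's edge still appears whole in the next window.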

Frequently Asked Questions

Q: What makes this model unique?

The model combines KBIR's keyphrase-oriented pre-training with fine-tuning on a large collection of web documents. Because it reads each token in the context of the whole input, it can pick up semantic dependencies and contextual relationships between words and phrases that traditional statistical methods, which rely mainly on word frequencies, tend to miss.

Q: What are the recommended use cases?

The model suits automatic document indexing, content summarization, and metadata generation, particularly for large volumes of text where manual keyphrase extraction would be too time-consuming. Two limitations apply: it only works on English documents, and it tends to predict a limited number of keyphrases per document.
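For document indexing, the extracted keyphrases can feed a simple inverted index. This is a minimal sketch, assuming the keyphrases for each document have already been extracted; `build_keyphrase_index` and the document IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: map each (lowercased) keyphrase to the sorted list of
# document IDs it was extracted from.

def build_keyphrase_index(doc_keyphrases):
    """doc_keyphrases maps a document ID to the keyphrases extracted from it."""
    index = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc_id)
    return {phrase: sorted(ids) for phrase, ids in index.items()}


index = build_keyphrase_index({
    "doc-1": ["Keyphrase extraction", "KBIR"],
    "doc-2": ["keyphrase extraction", "web documents"],
})
print(index["keyphrase extraction"])
```

Lowercasing merges surface variants such as "Keyphrase extraction" and "keyphrase extraction" into one index entry. Given the limitation above, each document contributes only a few entries, so such an index works best as a complement to full-text search.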
