aspire-contextualsentence-singlem-compsci

Maintained By
allenai

Property: Value
Author: Allen AI
Paper: Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity
GitHub: allenai/aspire
Performance (MAP): 41.33 on CSFCube

What is aspire-contextualsentence-singlem-compsci?

aspire-contextualsentence-singlem-compsci is a BERT-based encoder for fine-grained similarity matching between computer science papers. It represents each document as a set of contextual sentence vectors, each computed by averaging the token representations of one sentence, while the encoder still attends across the full title and abstract. The model was trained on 1.2 million pairs of co-cited computer science papers, with co-citation contexts used to align sentences across each pair.
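As a minimal sketch of the averaging step described above, the following pure-Python snippet pools token vectors into one vector per sentence. The function name `mean_pool_sentences`, the toy token values, and the sentence spans are all assumptions made for illustration; the released model's own code may organize this differently.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool_sentences(
    token_vectors: Sequence[Sequence[float]],
    sentence_spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Average the token vectors inside each [start, end) span,
    giving one vector per sentence."""
    sentence_vectors: List[Vector] = []
    for start, end in sentence_spans:
        if not 0 <= start < end <= len(token_vectors):
            raise ValueError(f"invalid span ({start}, {end})")
        span = token_vectors[start:end]
        dim = len(span[0])
        sentence_vectors.append(
            [sum(tok[d] for tok in span) / len(span) for d in range(dim)]
        )
    return sentence_vectors


# Toy contextual token vectors (hypothetical values, dimension 2).
tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0], [4.0, 0.0]]
# Hypothetical spans: sentence 0 covers tokens 0-1, sentence 1 covers tokens 2-4.
spans = [(0, 2), (2, 5)]
sentence_vecs = mean_pool_sentences(tokens, spans)
```

In a real pipeline the token vectors would come from the encoder after it has processed the whole title and abstract, so each sentence vector already reflects its surrounding context.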

Implementation Details

The model was trained with the Adam optimizer at a learning rate of 2e-5, using 1000 warm-up steps followed by linear decay. It takes a paper's title and abstract as input and produces sentence-level embeddings; two documents are then compared at a fine-grained level by computing L2 distances between their sentence vectors.
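The warm-up and decay schedule can be sketched as a plain function of the training step. The peak rate (2e-5) and warm-up length (1000) come from the description above; the `total_steps` argument, the linear shape of the warm-up, and the decay reaching exactly zero at the end are assumptions, since the source does not state them.

```python
def learning_rate(
    step: int,
    total_steps: int,
    peak_lr: float = 2e-5,
    warmup_steps: int = 1000,
) -> float:
    """Linear warm-up to peak_lr, then linear decay to zero at total_steps."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Warm-up phase: ramp from 0 up to peak_lr (assumed linear).
        return peak_lr * step / warmup_steps
    # Decay phase: ramp from peak_lr down to 0 (assumed end point).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Hypothetical run length of 11,000 steps, chosen only for illustration.
schedule = [learning_rate(s, total_steps=11_000) for s in (0, 500, 1000, 6000, 11_000)]
```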

  • Trained on co-cited paper pairs with sentence alignment
  • Uses contrastive learning with in-batch negatives
  • Implements cross-attention in the encoder block
  • Scores document pairs by the minimum L2 distance between their sentence vectors
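The last point can be illustrated with a short sketch that compares two documents by the smallest L2 distance over all pairs of sentence vectors. The function name `min_sentence_distance` and the toy vectors are hypothetical; this shows the scoring idea only, not the repository's evaluation code.

```python
import math
from typing import Sequence, Tuple


def min_sentence_distance(
    query_sentences: Sequence[Sequence[float]],
    candidate_sentences: Sequence[Sequence[float]],
) -> Tuple[float, int, int]:
    """Return (distance, query_index, candidate_index) for the closest
    pair of sentence vectors under L2 distance."""
    if not query_sentences or not candidate_sentences:
        raise ValueError("both documents need at least one sentence vector")
    best = None
    for i, q in enumerate(query_sentences):
        for j, c in enumerate(candidate_sentences):
            d = math.dist(q, c)  # Euclidean (L2) distance
            if best is None or d < best[0]:
                best = (d, i, j)
    return best


# Toy sentence vectors (hypothetical values, dimension 2).
query_doc = [[0.0, 0.0], [3.0, 4.0]]
candidate_doc = [[3.0, 3.0], [10.0, 10.0]]
distance, query_idx, cand_idx = min_sentence_distance(query_doc, candidate_doc)
```

A smaller distance means a closer match, and the returned indices identify which pair of sentences produced it.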

Core Capabilities

  • Fine-grained document similarity analysis
  • Aspect-conditional document retrieval
  • Sentence-to-sentence similarity matching
  • Computer science domain expertise
  • Document classification (with fine-tuning)

Frequently Asked Questions

Q: What makes this model unique?

The model represents each document with multiple vectors, one per sentence, and is trained using co-citation contexts. This lets documents be compared sentence by sentence rather than only through a single document-level vector.

Q: What are the recommended use cases?

The model is best suited to similarity tasks over computer science papers, particularly when specific aspects or sentences need to be matched. A typical case is retrieving papers that match a chosen sentence or concept rather than the document as a whole.
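To make the sentence-conditioned use case concrete, here is a rough sketch that ranks candidate papers using only the query sentences a user has selected. The function name `rank_by_selected_sentences`, the paper ids, and the toy vectors are hypothetical, and real sentence vectors would come from the model rather than being written by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_by_selected_sentences(
    query_sentences: Sequence[Sequence[float]],
    selected: Sequence[int],
    candidates: Dict[str, Sequence[Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Rank candidates by the smallest L2 distance from any selected
    query sentence to any sentence of the candidate (closest first)."""
    scored = []
    for doc_id, cand_sentences in candidates.items():
        best = min(
            math.dist(query_sentences[i], c)
            for i in selected
            for c in cand_sentences
        )
        scored.append((best, doc_id))
    scored.sort()
    return [(doc_id, dist) for dist, doc_id in scored]


# Toy sentence vectors (hypothetical values, dimension 2).
query = [[0.0, 0.0], [3.0, 4.0]]
papers = {
    "paper_a": [[3.0, 3.0]],
    "paper_b": [[0.0, 1.0], [9.0, 9.0]],
}
ranking_for_sentence_0 = rank_by_selected_sentences(query, [0], papers)
ranking_for_sentence_1 = rank_by_selected_sentences(query, [1], papers)
```

With these toy values, selecting sentence 0 places `paper_b` first, while selecting sentence 1 places `paper_a` first, which is the behavior that distinguishes sentence-conditioned retrieval from whole-document matching.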
