ELECTRA Large Discriminator SQuAD2.0

Property	Value
Author	ahotrod
Framework	PyTorch, TensorFlow
Task	Question Answering
Downloads	39,990

What is electra_large_discriminator_squad2_512?

This is a fine-tuned version of the ELECTRA large discriminator model, optimized for extractive question answering on the SQuAD2.0 dataset. It scores 87.1% exact match (EM) and about 90% F1 (89.98%). Because SQuAD2.0 mixes answerable questions with questions whose answer is not in the passage, these scores reflect both finding answer spans and recognizing when there is no answer.
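For reference, SQuAD-style EM and F1 compare a predicted answer string with the reference answers after light normalization. The sketch below is a minimal re-implementation that assumes the normalization rules of the official SQuAD2.0 evaluation script (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace). The function names are hypothetical. Corpus-level EM and F1 are the averages of these per-question scores, multiplied by 100.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Unanswerable questions have an empty gold answer: full credit only
    # when the model also abstains with an empty prediction.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction: str, golds: list[str]) -> float:
    """Score against every reference answer and keep the best match.

    An empty list of references means the question is unanswerable.
    """
    golds = [g for g in golds if normalize_answer(g)] or [""]
    return max(metric(prediction, g) for g in golds)


# Example: partial credit against the closer of two references.
print(best_score(f1_score, "in Paris, France", ["Paris", "Paris, France"]))  # roughly 0.8
```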

Implementation Details

The model is available for both PyTorch and TensorFlow. Fine-tuning used a learning rate of 3e-5, weight decay of 0.01, and a maximum sequence length of 512 tokens, and ran for 3 epochs with mixed-precision (FP16) training to reduce memory use and speed up training.
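The hyperparameters above can be collected in a small config. The sketch also shows what 16-step gradient accumulation means: gradients from 16 micro-batches are averaged before a single optimizer update, which emulates a larger batch. The card does not state the per-device batch size in this section, so `per_device_batch_size` is a hypothetical input. The toy linear-regression gradient is only an illustration of the idea, not the model's actual training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values as listed on the model card; the class itself is hypothetical.
    learning_rate: float = 3e-5
    weight_decay: float = 0.01
    num_train_epochs: int = 3
    max_seq_length: int = 512
    doc_stride: int = 128
    gradient_accumulation_steps: int = 16
    fp16: bool = True

    def effective_batch_size(self, per_device_batch_size: int, num_devices: int = 1) -> int:
        # Number of examples that contribute to one optimizer update.
        return per_device_batch_size * num_devices * self.gradient_accumulation_steps


def mse_grad(w: float, batch: list[tuple[float, float]]) -> float:
    # Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, micro_batches: list[list[tuple[float, float]]]) -> float:
    # Average the micro-batch gradients before "stepping". With equal-sized
    # micro-batches this equals the gradient of one large batch.
    total = 0.0
    for mb in micro_batches:
        total += mse_grad(w, mb)
    return total / len(micro_batches)


cfg = FineTuneConfig()
print(cfg.effective_batch_size(per_device_batch_size=2))  # 32
```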

Trained on the SQuAD2.0 dataset, which includes both answerable and unanswerable questions
Maximum sequence length of 512 tokens, with a 128-token document stride for longer contexts
Gradient accumulation over 16 steps
FP16 mixed precision to reduce memory use and speed up training
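A 512-token limit means long passages have to be split into several windows. The sketch below computes overlapping context windows, assuming the convention of the original BERT SQuAD script: the document stride is the step between window starts, and each window reserves room for the question plus three special tokens ([CLS], [SEP], [SEP]). Some other tools define the stride as the overlap between windows instead, so this is an illustration rather than this model's exact preprocessing. The function name `doc_spans` is hypothetical.

```python
def doc_spans(num_doc_tokens: int, num_query_tokens: int,
              max_seq_length: int = 512, doc_stride: int = 128) -> list[tuple[int, int]]:
    """Return (start, length) pairs of context windows over the document tokens."""
    # Room left for context after [CLS] question [SEP] ... [SEP].
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("question is too long for max_seq_length")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # this window reaches the end of the document
        start += min(length, doc_stride)  # advance by the stride
    return spans


# A 1,000-token passage with a 20-token question:
print(doc_spans(1000, 20))
# [(0, 489), (128, 489), (256, 489), (384, 489), (512, 488)]
```

Every context token falls into at least one window, and most fall into several, so an answer near a window boundary still appears whole in some window.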

Core Capabilities

84.7% exact match on answerable questions
89.5% exact match on unanswerable questions (correctly predicting that there is no answer)
89.98% F1 overall, across both question types
Supports batched inference
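On SQuAD2.0 a model must also decide when to abstain. A common approach for BERT-style models, used by the original BERT SQuAD2.0 script, scores a "no answer" prediction at the [CLS] position and abstains when that null score beats the best span score by more than a threshold. The card does not say which decision rule or threshold this model uses. In the sketch, `predict_span`, `max_answer_length=30`, and `null_threshold=0.0` are assumptions, and every non-[CLS] position is treated as a candidate context token for simplicity.

```python
def predict_span(start_logits: list[float], end_logits: list[float],
                 max_answer_length: int = 30, null_threshold: float = 0.0):
    """Return (start, end) token indices of the best span, or None to abstain.

    Index 0 is assumed to be the [CLS] token, whose logits score "no answer".
    """
    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = float("-inf"), None
    n = len(start_logits)
    for i in range(1, n):
        # Only consider ends at or after the start, within the length limit.
        for j in range(i, min(n, i + max_answer_length)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    # Abstain if no span exists or the null score wins by more than the threshold.
    if best_span is None or null_score - best_score > null_threshold:
        return None
    return best_span


print(predict_span([0.0, 2.0, 1.0], [0.0, 1.0, 3.0]))  # (1, 2)
print(predict_span([5.0, 1.0, 3.0, 0.0], [5.0, 0.0, 1.0, 4.0]))  # None
```

Raising `null_threshold` makes the model answer more often; lowering it makes it abstain more often. In practice such a threshold is usually tuned on held-out data.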

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its balanced performance on answerable and unanswerable SQuAD2.0 questions (84.7% and 89.5% exact match, respectively). That balance matters in real-world applications, where the given context does not always contain an answer.

Q: What are the recommended use cases?

The model suits question answering systems, document analysis, and information extraction tasks where accuracy matters and where the system must decide whether an answer is present at all. Each input window holds up to 512 tokens (question plus context); longer documents are split into overlapping windows using the 128-token document stride.
