papuGaPT2

Maintained by: flax-community

Property               Value
Language               Polish
Training Data          OSCAR corpus (Polish subset)
Evaluation Perplexity  21.79
Author                 flax-community

What is papuGaPT2?

papuGaPT2 is a Polish-language GPT-2 model intended to bring text generation capabilities to the Polish NLP community. It uses the standard GPT-2 architecture and was trained with a causal language modeling objective on the Polish subset of the multilingual OSCAR corpus. The model reaches an evaluation perplexity of 21.79.

Implementation Details

The model tokenizes text with byte-level Byte Pair Encoding (BPE) and a vocabulary of 50,257 tokens. Training ran on a TPUv3 VM in three phases, each with its own learning rate and batch size. The final phase ended with an evaluation loss of 3.082.
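The evaluation loss and the evaluation perplexity quoted for this model are two views of the same quantity. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models but is not stated in the source; the function name `perplexity_from_loss` is hypothetical.

```python
import math


def perplexity_from_loss(mean_nll: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_nll)


# Evaluation loss reported for the final training phase.
eval_loss = 3.082

# Assumption: the loss is a mean negative log-likelihood in nats.
ppl = perplexity_from_loss(eval_loss)
print(f"perplexity = {ppl:.2f}")
```

Under that assumption the result is about 21.8, which agrees with the reported perplexity of 21.79 up to rounding of the loss.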

  • Tokenization: byte-level BPE with a vocabulary of 50,257 tokens
  • Input sequences: 512 consecutive tokens
  • Training infrastructure: TPUv3 VM
  • Training phases: 3 phases with different learning rates and batch sizes
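To make "512 consecutive tokens" concrete, here is a minimal sketch of how a long stream of token ids can be cut into fixed-length training sequences. The helper `group_into_blocks` is hypothetical, the token ids are placeholders, and dropping the short trailing remainder is an assumption; the source does not describe how the remainder was handled.

```python
from typing import Iterable, List

BLOCK_SIZE = 512  # input sequence length given in the list above


def group_into_blocks(
    token_ids: Iterable[int], block_size: int = BLOCK_SIZE
) -> List[List[int]]:
    """Split one long token stream into consecutive fixed-size blocks.

    Assumption: a trailing remainder shorter than block_size is dropped.
    """
    ids = list(token_ids)
    usable = (len(ids) // block_size) * block_size
    return [ids[i:i + block_size] for i in range(0, usable, block_size)]


# Placeholder ids standing in for a tokenized corpus.
stream = list(range(1300))
blocks = group_into_blocks(stream)
print(len(blocks), len(blocks[0]))
```

For causal language modeling, each block serves as both input and target, with the target shifted by one position inside the model.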

Core Capabilities

  • Text generation with multiple decoding methods (greedy, beam search, sampling)
  • Support for top-k and top-p sampling
  • Zero-shot and few-shot learning capabilities
  • Bad words filtering functionality
  • Context-aware text completion
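The decoding options listed above differ only in how the next token is picked from the model's probability distribution. The following is a self-contained sketch of greedy choice, top-k filtering, top-p (nucleus) filtering, and a simplified single-token bad-word filter. It does not call the model: the scores in `logits` are made up for illustration, and all function names are hypothetical. In the Hugging Face `transformers` library, the corresponding `generate` arguments are `top_k`, `top_p`, and `bad_words_ids`.

```python
import math
import random
from typing import Dict, Optional, Set


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def normalize(pairs):
    """Rescale (token, probability) pairs so the probabilities sum to 1."""
    total = sum(p for _, p in pairs)
    return [(t, p / total) for t, p in pairs]


def filter_probs(
    probs: Dict[str, float],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    banned: Optional[Set[str]] = None,
) -> Dict[str, float]:
    """Apply bad-word, top-k, then top-p filtering and renormalize."""
    banned = banned or set()
    # Simplification: ban single tokens rather than token sequences.
    ranked = [(t, p) for t, p in probs.items() if t not in banned]
    ranked = normalize(sorted(ranked, key=lambda kv: kv[1], reverse=True))
    if top_k is not None:
        ranked = normalize(ranked[:top_k])
    if top_p is not None:
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:  # smallest prefix reaching top_p
                break
        ranked = normalize(kept)
    return dict(ranked)


def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Made-up next-token scores for illustration only.
logits = {"kot": 2.0, "pies": 1.5, "dom": 0.5, "rower": -1.0, "zzz": -3.0}
probs = softmax(logits)

greedy = max(probs, key=probs.get)              # greedy decoding
top2 = filter_probs(probs, top_k=2)             # top-k sampling pool
nucleus = filter_probs(probs, top_p=0.9)        # top-p sampling pool
no_kot = filter_probs(probs, top_k=1, banned={"kot"})
print(greedy, sorted(top2), sorted(nucleus), sorted(no_kot))
print(sample(nucleus, random.Random(0)))
```

Beam search is left out because it keeps several candidate sequences across steps and cannot be shown with a single-step distribution.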

Frequently Asked Questions

Q: What makes this model unique?

papuGaPT2 is one of the first strong text generation models trained specifically for Polish, filling a gap in Polish NLP research. It supports several decoding methods and can be fine-tuned for downstream Polish language tasks.

Q: What are the recommended use cases?

The model is primarily recommended for research purposes due to potential biases in the training data. It can be used for text generation, feature extraction, or fine-tuning for downstream tasks. However, users should be aware of and account for potential biases, particularly regarding gender and ethnicity.
