fasttext-bg-vectors

Maintained By
facebook


Property        Value
License         CC-BY-SA 3.0
Language        Bulgarian
Framework       fastText
Training Data   Common Crawl & Wikipedia

What is fasttext-bg-vectors?

fasttext-bg-vectors is a set of pretrained Bulgarian word vectors released by Facebook and trained with the fastText library. It provides word representations for Bulgarian, which can also serve as input features for text classification with fastText. The vectors were trained on a combination of Common Crawl and Wikipedia text, using CBOW with position-weights, 300-dimensional vectors, and character n-grams of length 5.
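To make the "character n-grams of length 5" setting concrete, here is a minimal sketch of how a word can be split into subword units. It assumes the usual fastText convention of wrapping the word in `<` and `>` boundary markers; the function name `char_ngrams` is hypothetical, and the real library additionally hashes each n-gram into a fixed number of buckets, which this sketch leaves out.

```python
from typing import List


def char_ngrams(word: str, n: int = 5) -> List[str]:
    """Return the character n-grams of `word` after adding boundary markers.

    Simplified illustration only: fastText also keeps the whole word as its
    own unit and hashes n-grams into buckets, which is not shown here.
    """
    marked = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


# "котка" (cat) becomes "<котка>", which contains three 5-grams.
print(char_ngrams("котка"))  # ['<котк', 'котка', 'отка>']
```

A word's vector is then built from the vectors of these units, which is why a model with subword information can still produce a vector for a word it never saw during training.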

Implementation Details

The model uses fastText to learn word embeddings, which can also be used as features for text classification tasks. Training used a context window of size 5 and 10 negative samples. The main settings are:

  • 300-dimensional word vectors
  • Character n-grams of length 5
  • Position-weighted CBOW training
  • Supports efficient nearest neighbor queries
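Position-weighted CBOW differs from plain CBOW in how the context is combined: each context word vector is multiplied element-wise by a learned vector for its relative position before the results are added up. The sketch below illustrates that combination step on toy 2-dimensional vectors. The function name and all numbers are made up for illustration, and it shows only the context computation, not the training loop or negative sampling.

```python
from typing import Dict, List


def position_weighted_context(
    context_vectors: Dict[int, List[float]],
    position_weights: Dict[int, List[float]],
) -> List[float]:
    """Sum context word vectors, each scaled element-wise by its position vector.

    Keys are relative offsets from the target word (e.g. -1 = previous word).
    """
    dim = len(next(iter(context_vectors.values())))
    combined = [0.0] * dim
    for offset, vector in context_vectors.items():
        weights = position_weights[offset]
        for i in range(dim):
            combined[i] += weights[i] * vector[i]
    return combined


# Toy values (hypothetical): one word before and one word after the target.
context = {-1: [1.0, 2.0], 1: [3.0, 4.0]}
weights = {-1: [0.5, 0.5], 1: [1.0, 0.0]}
print(position_weighted_context(context, weights))  # [3.5, 1.0]
```

With a window size of 5, the real model would have one such position vector for each offset from -5 to 5 (excluding 0), each with 300 dimensions.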

Core Capabilities

  • Word vector representation for Bulgarian text
  • Fast and efficient text classification
  • Nearest neighbor word queries
  • Subword (character n-gram) information, which allows vectors for out-of-vocabulary words
  • Runs on standard CPU hardware; no GPU is required
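Nearest neighbor word queries are typically based on cosine similarity between word vectors. The following is a minimal sketch of that idea using a brute-force search over a tiny, invented vocabulary. The 2-dimensional vectors in `TOY_VECTORS` are hypothetical and do not come from the model, and the helper names are illustrative rather than part of the fastText API.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: str, vectors: Dict[str, List[float]], k: int = 2
) -> List[Tuple[float, str]]:
    """Return the k words most similar to `query`, as (score, word) pairs."""
    scored = [
        (cosine_similarity(vectors[query], vector), word)
        for word, vector in vectors.items()
        if word != query  # a word is trivially its own nearest neighbor
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy vectors: "котка" (cat), "куче" (dog), "маса" (table).
TOY_VECTORS = {
    "котка": [1.0, 0.0],
    "куче": [0.8, 0.6],
    "маса": [0.0, 1.0],
}
print([word for _, word in nearest_neighbors("котка", TOY_VECTORS)])
```

In this toy data "куче" ranks ahead of "маса" for the query "котка". With the real 300-dimensional vectors the same ranking principle applies, only over a much larger vocabulary.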

Frequently Asked Questions

Q: What makes this model unique?

The model combines fastText's efficient, CPU-friendly training and inference with broad Bulgarian language coverage drawn from Common Crawl and Wikipedia. Its word embeddings can be used directly or as input features for text classification, and its use of subword information lets it handle rare and unseen word forms.

Q: What are the recommended use cases?

The model is ideal for Bulgarian text classification tasks, semantic analysis, word similarity calculations, and natural language processing applications requiring word embeddings. It's particularly useful for applications needing efficient text processing without specialized hardware.
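For word similarity use cases, a typical workflow is to load the binary model and query it. The sketch below assumes the `fasttext` and `huggingface_hub` Python packages are installed, that the model is hosted under the repository id `facebook/fasttext-bg-vectors`, and that the weights file is named `model.bin`; the repository id and file name are assumptions and should be checked against the hosting page. The helper names `load_bulgarian_vectors` and `describe` are hypothetical.

```python
REPO_ID = "facebook/fasttext-bg-vectors"  # assumed repository id
MODEL_FILENAME = "model.bin"              # assumed file name, verify before use


def load_bulgarian_vectors():
    """Download (or reuse a cached copy of) the model file and load it."""
    # Imported lazily so this module can be imported without the packages.
    import fasttext
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(repo_id=REPO_ID, filename=MODEL_FILENAME)
    return fasttext.load_model(model_path)


def describe(model, word: str, k: int = 5):
    """Return the vector dimension and the k nearest neighbors of `word`."""
    vector = model.get_word_vector(word)
    neighbors = model.get_nearest_neighbors(word, k=k)  # list of (score, word)
    return len(vector), neighbors
```

Calling `describe(load_bulgarian_vectors(), "котка")` should return 300 as the dimension along with the five closest words. The model file is large, so the first call may take a while and needs enough free disk space and memory.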
