ChemGPT-4.7M

Maintained By: ncfrey

Framework: PyTorch
Model Type: GPT-Neo-based Transformer
Training Data: PubChem10M
Paper: Neural Scaling of Deep Chemical Models

What is ChemGPT-4.7M?

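ChemGPT-4.7M is a transformer model for generative molecular modeling. It is based on the GPT-Neo architecture and was pretrained on the PubChem10M dataset, a 10-million-molecule subset of PubChem. It learns to model and generate chemical structures written as SELFIES strings, which were converted from the original SMILES. It belongs to the ChemGPT family of models introduced in the paper Neural Scaling of Deep Chemical Models, which studies how chemical language models behave as model and dataset size grow. The "4.7M" in the name refers to its parameter count.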

Implementation Details

The model is implemented with the Hugging Face transformers library. During preprocessing, SMILES strings were converted to SELFIES using version 1.0.4 of the SELFIES library. Unlike SMILES, any sequence of SELFIES tokens decodes to a valid molecule, so strings sampled from the model do not fail to parse the way invalid SMILES can.

  • Built on GPT-Neo architecture
  • Trained on PubChem10M dataset
  • Uses SELFIES representation for chemical structures
  • Implemented in PyTorch
  • Available through the Hugging Face transformers library
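Because the model is distributed through the transformers library, sampling molecules can plausibly be done with the standard `AutoTokenizer` / `AutoModelForCausalLM` / `generate` workflow. The sketch below rests on several assumptions:

* the checkpoint lives on the Hub as `ncfrey/ChemGPT-4.7M`;
* a bare SELFIES prefix such as `[C]` is a reasonable prompt;
* the tokenizer may insert spaces between SELFIES tokens when decoding.

`sample_molecules` and `join_selfies_tokens` are hypothetical helpers, not part of any published API.

```python
# Assumption: the checkpoint is published on the Hugging Face Hub under this ID.
MODEL_ID = "ncfrey/ChemGPT-4.7M"


def join_selfies_tokens(decoded: str) -> str:
    """Drop any whitespace the tokenizer inserts between SELFIES tokens."""
    return "".join(decoded.split())


def sample_molecules(prompt: str = "[C]", n: int = 5, max_new_tokens: int = 32) -> list[str]:
    """Hypothetical helper: sample n continuations of a SELFIES prompt, return SMILES."""
    # Heavy dependencies are imported lazily so the helper above works without them.
    import selfies as sf
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    # Assumption: the prompt is a bare prefix to continue, so no special tokens are added.
    enc = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        pad_id = tokenizer.eos_token_id

    outputs = model.generate(
        input_ids=enc["input_ids"],
        attention_mask=enc.get("attention_mask"),
        do_sample=True,  # stochastic sampling so the n sequences differ
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=pad_id,
    )

    smiles = []
    for seq in outputs:
        text = tokenizer.decode(seq, skip_special_tokens=True)
        smiles.append(sf.decoder(join_selfies_tokens(text)))
    return smiles
```

Calling `sample_molecules(n=3)` downloads the weights on first use and returns three SMILES strings. Decoding through SELFIES rather than reading SMILES directly from the model is what keeps every sample parseable.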

Core Capabilities

  • Generation of novel molecular structures
  • Chemical structure representation learning
  • Starting point for fine-tuning on downstream tasks such as molecular property prediction
  • Investigation of how pre-training affects performance on downstream chemical datasets
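To use the pretrained model for property prediction, one common approach is to mean-pool the backbone's hidden states over non-padding tokens and add a linear regression head. This approach is an assumption here, not a method taken from the ChemGPT paper. The PyTorch sketch below is hypothetical (`MeanPooledRegressor` is an illustrative name). It only assumes a backbone whose output has a `last_hidden_state` tensor, as Hugging Face `AutoModel` backbones do.

```python
import torch
from torch import nn


class MeanPooledRegressor(nn.Module):
    """Hypothetical regression head on top of a pretrained transformer backbone.

    The backbone's forward(input_ids=..., attention_mask=...) must return an object
    with `last_hidden_state` of shape (batch, seq_len, hidden_size).
    """

    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(hidden_size, 1)  # one scalar property per molecule

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Average token embeddings, ignoring padding positions (mask == 0).
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.head(pooled).squeeze(-1)  # shape: (batch,)
```

With the real checkpoint, the backbone would come from `transformers.AutoModel.from_pretrained(...)`, and `backbone.config.hidden_size` would be passed as `hidden_size`. You would then train the model against labeled property values with a loss such as `nn.MSELoss()`. Masked mean pooling is used instead of taking the last token because padded batches otherwise need per-sequence index bookkeeping.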

Frequently Asked Questions

Q: What makes this model unique?

ChemGPT-4.7M focuses specifically on generating chemical structures and builds on the GPT-Neo architecture. Because it works with SELFIES, token sequences it generates always decode to valid molecules, which makes it well suited to molecular generation tasks. As one of several ChemGPT models of different sizes, it is also useful for comparing behavior across model scales.

Q: What are the recommended use cases?

The model is intended mainly for research, in particular for investigating how pre-training and fine-tuning affect performance on downstream chemical datasets. It can be used to generate molecules, but because it was trained only on a subset of PubChem, it is better suited to academic and exploratory studies in chemical modeling than to production use.
