LLM4Decompile 6.7B v1.5
Property | Value |
---|---|
Parameter Count | 6.74B |
License | MIT |
Tensor Type | BF16 |
Training Data | 15B tokens |
Context Length | 4,096 tokens |
What is llm4decompile-6.7b-v1.5?
LLM4Decompile is a language model specialized for decompilation: it translates x86 assembly into C source code. Version 1.5 reports up to a 100% improvement over the previous release on decompilation benchmarks.
Implementation Details
The model is built on the LLaMA architecture and trained specifically for binary decompilation. It takes the assembly of a compiled function as input, covers GCC optimization levels O0 through O3, and generates C source code for that function.
- Supports multiple GCC optimization levels (O0-O3)
- Processes complete assembly functions with context
- Handles complex binary transformations
- Implements efficient tokenization for assembly code
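As an illustration of how a function's assembly might be prepared at each optimization level, the sketch below builds a GCC command line and trims `objdump -d` output down to the instructions of a single function. The helper names (`gcc_command`, `extract_function`) and the cleanup rules are assumptions made for this example. They are not the project's own preprocessing script.

```python
import re

# Optimization levels the model card lists as supported.
OPT_LEVELS = ("O0", "O1", "O2", "O3")


def gcc_command(source_path: str, output_path: str, opt: str) -> list[str]:
    """Return a GCC command that compiles one C file to an object file."""
    if opt not in OPT_LEVELS:
        raise ValueError(f"unsupported optimization level: {opt}")
    return ["gcc", "-c", f"-{opt}", source_path, "-o", output_path]


def extract_function(objdump_text: str, func_name: str) -> str:
    """Keep only the instructions of one function from `objdump -d` output.

    Addresses and raw instruction bytes are dropped (hypothetical cleanup).
    """
    header = re.compile(r"^[0-9a-f]+ <(.+)>:$")
    kept = []
    inside = False
    for raw in objdump_text.splitlines():
        match = header.match(raw.strip())
        if match:
            # A new symbol starts; track whether it is the one we want.
            inside = match.group(1) == func_name
            if inside:
                kept.append(f"<{func_name}>:")
            continue
        if not inside or not raw.strip():
            continue
        # Instruction lines look like: "   4:\t48 89 e5 \tmov %rsp,%rbp"
        parts = raw.split("\t")
        if len(parts) >= 3:
            instruction = parts[2].split("#")[0]
            kept.append(" ".join(instruction.split()))
    return "\n".join(kept)
```

In practice one would run the command from `gcc_command` for each entry in `OPT_LEVELS`, disassemble the resulting object file with `objdump -d`, and pass the text through `extract_function` before building a model prompt.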
Core Capabilities
- Assembly to C code conversion with high accuracy
- Support for different compiler optimization levels
- 68.05% re-executability on the HumanEval-Decompile benchmark at O0
- Effective handling of ExeBench test cases
- Reported to outperform GPT-4 and DeepSeek-Coder on decompilation benchmarks
Frequently Asked Questions
Q: What makes this model unique?
The model is trained specifically for binary decompilation and reports results that surpass both smaller and larger models, including GPT-4, on decompilation benchmarks. That narrow focus on assembly-to-C conversion is what makes it useful for reverse engineering.
Q: What are the recommended use cases?
The model is suited to reverse engineering compiled binaries, malware analysis, legacy code recovery, and software security research. It is aimed at x86 assembly produced by compiling C code at any of the optimization levels O0 through O3.
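For workflows like these, a minimal inference sketch with the Hugging Face `transformers` library might look like the following. The repository id in `MODEL_ID` and the prompt template in `build_prompt` are assumptions based on the upstream project's published usage and should be checked against its README. The helper names are hypothetical.

```python
MODEL_ID = "LLM4Binary/llm4decompile-6.7b-v1.5"  # assumed Hugging Face repo id
MAX_CONTEXT = 4096  # context length from the table above


def build_prompt(asm: str) -> str:
    """Wrap a function's assembly in the (assumed) decompilation prompt."""
    return f"# This is the assembly code:\n{asm.strip()}\n# What is the source code?\n"


def decompile(asm: str, max_new_tokens: int = 512) -> str:
    """Generate C source for one assembly function (downloads the weights)."""
    # Heavy imports stay local so build_prompt works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(build_prompt(asm), return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    # Refuse inputs that cannot fit together with the generation budget.
    if prompt_len + max_new_tokens > MAX_CONTEXT:
        raise ValueError("prompt plus generation budget exceeds the context window")

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the generated continuation.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `decompile("<add>:\npush %rbp\nmov %rsp,%rbp\n...")` would return the model's proposed C source as a string. The output is a candidate reconstruction and should be compiled and tested before it is trusted.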