LLM4Decompile 6.7B v1.5

Property	Value
Parameter Count	6.74B
License	MIT
Tensor Type	BF16
Training Data	15B tokens
Context Length	4,096 tokens
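
As a rough sizing aid, the parameter count and tensor type in the table give a lower bound on the memory needed just to hold the weights. The sketch below is back-of-the-envelope arithmetic only: it assumes every parameter is stored in the listed tensor type and ignores activations, the KV cache, and framework overhead. The helper name is illustrative.

```python
# Rough weight-memory estimate from the property table.
# Assumption: all parameters are stored in one dtype; activations,
# KV cache, and framework overhead are NOT included.

BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4}


def weight_memory_gib(num_params: float, tensor_type: str = "BF16") -> float:
    """Return the approximate size of the weights in GiB."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 2**30


if __name__ == "__main__":
    # 6.74B parameters in BF16, as listed in the table
    print(f"{weight_memory_gib(6.74e9):.1f} GiB")  # prints "12.6 GiB"
```

In practice the device needs noticeably more than this figure, since generation over a 4,096-token context also allocates memory for the attention cache.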

What is llm4decompile-6.7b-v1.5?

LLM4Decompile is a specialized language model that translates x86 assembly into C source code. Version 1.5, trained on 15B tokens with a 4,096-token context, is reported to improve decompilation performance by up to 100% over earlier versions.

Implementation Details

The model uses the LLaMA architecture and is trained specifically for binary decompilation. It takes the assembly of a function compiled at any GCC optimization level from O0 to O3 and generates C source code intended to reproduce the original function's behavior.

Supports multiple GCC optimization levels (O0-O3)
Processes complete assembly functions with context
Handles complex binary transformations
Implements efficient tokenization for assembly code
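
To make the points above concrete, here is a minimal sketch of how a single function could be pulled out of `objdump -d` text, stripped of addresses and raw bytes, and wrapped in a prompt. The helper names, the cleanup rules, and the prompt wording are assumptions for illustration; check the model's official repository for the exact input format it was trained on.

```python
import re


def extract_function(objdump_text: str, func_name: str) -> str:
    """Return cleaned assembly for one function from `objdump -d` output.

    Assumes GNU objdump's layout: a header line such as
    `0000000000001129 <func0>:` followed by tab-separated
    `address:<TAB>raw bytes<TAB>instruction` lines, ending at a blank line.
    """
    header = re.compile(r"^[0-9a-f]+ <" + re.escape(func_name) + r">:\s*$")
    lines, inside = [], False
    for line in objdump_text.splitlines():
        if not inside:
            if header.match(line):
                inside = True
                lines.append(f"<{func_name}>:")
            continue
        if not line.strip():              # blank line closes the function
            break
        parts = line.split("\t", 2)
        if len(parts) < 3:                # raw-byte continuation, no instruction
            continue
        instr = parts[2].split("#")[0].strip()  # drop objdump's trailing comments
        if instr:
            lines.append(instr)
    return "\n".join(lines)


def build_prompt(asm: str) -> str:
    """Wrap the assembly in a question-style prompt (wording is an assumption)."""
    return f"# This is the assembly code:\n{asm}\n# What is the source code?\n"
```

The input text would typically come from compiling with something like `gcc -O2 -c file.c -o file.o` and then running `objdump -d file.o`. Removing addresses and byte encodings keeps the prompt short, which matters given the 4,096-token context.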

Core Capabilities

Assembly to C code conversion with high accuracy
Support for different compiler optimization levels
68.05% re-executability on the HumanEval-Decompile benchmark at O0
Effective handling of ExeBench test cases
Outperforms GPT-4 and DeepSeek-Coder in decompilation tasks
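
The HumanEval-Decompile figure above is a pass rate over test functions. The sketch below shows how such a rate could be computed, assuming each decompiled function is scored on whether it recompiles and whether the recompiled code passes the original test assertions. The record type and field names are hypothetical, and the official evaluation harness may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class DecompileResult:
    recompiled: bool     # decompiled C compiled without errors
    tests_passed: bool   # recompiled code passed the original assertions


def rates(results: list[DecompileResult]) -> tuple[float, float]:
    """Return (re-compilability, re-executability) as fractions in [0, 1]."""
    if not results:
        return 0.0, 0.0
    n = len(results)
    compiled = sum(r.recompiled for r in results)
    # A function only counts as re-executable if it also recompiled.
    executed = sum(r.recompiled and r.tests_passed for r in results)
    return compiled / n, executed / n
```

Under this scoring, a reported 68.05% would mean that roughly two out of three decompiled functions both compile and behave like the original on its tests.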

Frequently Asked Questions

Q: What makes this model unique?

The model is trained specifically for binary decompilation and is reported to surpass general-purpose models on this task, including much larger ones such as GPT-4. Because it is specialized for assembly-to-C conversion, it is well suited to reverse engineering work.

Q: What are the recommended use cases?

The model is suited to reverse engineering compiled binaries, malware analysis, legacy code recovery, and software security research. It targets x86 assembly compiled at GCC optimization levels O0 through O3.
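
For any of these use cases, inference could look roughly like the sketch below, which uses the Hugging Face `transformers` library. The repository id is an assumption and should be verified before use. The prompt is passed in as-is, since its exact format depends on how the assembly was prepared. The context check uses the 4,096-token limit listed in the property table.

```python
MODEL_ID = "LLM4Binary/llm4decompile-6.7b-v1.5"  # assumed Hugging Face repo id
MAX_CONTEXT = 4096                                # context length from the table


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """True if the prompt plus the generation budget fits in the context window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT


def decompile(prompt: str, max_new_tokens: int = 1024) -> str:
    """Generate C code for an already formatted assembly prompt."""
    # Heavy imports are kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    if not fits_context(prompt_len, max_new_tokens):
        raise ValueError(f"prompt of {prompt_len} tokens leaves no room to generate")

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Generated C should be treated as a starting point for analysis: recompiling it and comparing its behavior with the original binary is the only way to confirm it is faithful.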