Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
Paper • 2411.02462
Fine-tuned Qwen2.5-Coder-7B-Instruct with LoRA for comprehensive unit test generation.
This model is intended to generate a comprehensive set of test cases, including edge cases, for a given source code input. It is based on the paper "Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study" (arXiv:2411.02462).
| Component | Details |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Method | LoRA (rank=16, alpha=32) |
| Dataset | andstor/methods2test fm+fc+t+tc (46K+ samples) |
| Training | 3 epochs, lr=1e-4, cosine schedule, effective batch=32 |
| Hardware | A10G-large (24GB VRAM) |
| Framework | TRL SFTTrainer + PEFT LoRA |
LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

```bash
# 1. Build the training dataset
pip install datasets huggingface_hub
python scripts/data_pipeline.py --repo YOUR_ORG/testgen-data --max-samples 50000

# 2. (Optional) Add your company's code+test files
python scripts/data_pipeline.py --repo YOUR_ORG/testgen-data --custom-dirs /path/to/your/code

# 3. Run training
pip install transformers trl torch datasets trackio accelerate peft bitsandbytes
python scripts/train.py

# Or via HF Jobs (recommended):
# Hardware: a10g-large, Timeout: 5h
```
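As a rough illustration of how the LoRA settings and effective batch size listed above fit together, here is a minimal sketch. It is not the contents of `scripts/train.py`. The rank, alpha, and module names come from this card; `task_type`, the per-device batch size of 2, and the helper names `lora_scale`, `build_lora_config`, and `grad_accum_steps` are assumptions made for illustration only.

```python
# Values for r, lora_alpha, and target_modules are taken from this model card.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}


def lora_scale(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return lora_alpha / r


def grad_accum_steps(effective_batch: int, per_device_batch: int,
                     num_devices: int = 1) -> int:
    """Gradient accumulation steps needed to reach the effective batch size."""
    per_step = per_device_batch * num_devices
    if effective_batch % per_step != 0:
        raise ValueError("effective batch must be a multiple of the per-step batch")
    return effective_batch // per_step


def build_lora_config():
    """Build a PEFT LoraConfig from the values above (requires `peft`)."""
    from peft import LoraConfig  # imported lazily so the helpers run without peft
    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    print("LoRA scale:", lora_scale(LORA_KWARGS["r"], LORA_KWARGS["lora_alpha"]))
    # Hypothetical per-device batch size of 2 on a single GPU.
    print("Accumulation steps:", grad_accum_steps(32, per_device_batch=2))
```

With rank 16 and alpha 32, the low-rank update is scaled by 2. On a single 24GB GPU, a small per-device batch combined with gradient accumulation is one plausible way to reach an effective batch of 32; the actual split used for this model is not stated here.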
Your raw training data should be organized as code files paired with test files:
```
your-project/
├── src/
│   ├── calculator.py
│   ├── utils.py
│   └── auth.py
└── tests/
    ├── test_calculator.py
    ├── test_utils.py
    └── test_auth.py
```
The data pipeline auto-discovers pairs using naming conventions:
- calculator.py → test_calculator.py
- Calculator.java → CalculatorTest.java
- calculator.js → calculator.test.js

Try it: 🧪 AI Test Case Generator Space
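The naming conventions listed above can be sketched in a few lines of Python. This is an illustrative approximation, not the code in `scripts/data_pipeline.py`; the function names `candidate_test_names` and `discover_pairs` are hypothetical, and the sketch assumes file names are unique across the project.

```python
from pathlib import Path


def candidate_test_names(source_name: str) -> list[str]:
    """Return the test file names the conventions above pair with a source file."""
    path = Path(source_name)
    stem, suffix = path.stem, path.suffix
    if suffix == ".py":
        return [f"test_{stem}.py"]      # calculator.py -> test_calculator.py
    if suffix == ".java":
        return [f"{stem}Test.java"]     # Calculator.java -> CalculatorTest.java
    if suffix == ".js":
        return [f"{stem}.test.js"]      # calculator.js -> calculator.test.js
    return []


def discover_pairs(root) -> list[tuple[Path, Path]]:
    """Walk `root` and return (source, test) path pairs matched by file name."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    # Simplification: if two files share a name, the last one seen wins.
    by_name = {p.name: p for p in files}
    pairs = []
    for src in sorted(files):
        for test_name in candidate_test_names(src.name):
            test = by_name.get(test_name)
            if test is not None:
                pairs.append((src, test))
    return pairs


if __name__ == "__main__":
    for name in ["calculator.py", "Calculator.java", "calculator.js"]:
        print(name, "->", candidate_test_names(name))
```

Source files without a matching test file are skipped in this sketch, so only complete code and test pairs would reach the training set.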