# TestGen: AI Test Case Generator (Qwen2.5-Coder-7B + LoRA)
Fine-tuned Qwen2.5-Coder-7B-Instruct with LoRA for comprehensive unit test generation.
## Overview
This model is trained to generate comprehensive unit tests, including edge cases, for a given piece of source code. The training approach is based on the paper "Parameter-Efficient Fine-Tuning of LLMs for Unit Test Generation" (arXiv:2411.02462).
## Training Recipe
| Component | Details |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Method | LoRA (rank=16, alpha=32) |
| Dataset | andstor/methods2test fm+fc+t+tc (46K+ samples) |
| Training | 3 epochs, lr=1e-4, cosine schedule, effective batch=32 |
| Hardware | A10G-large (24GB VRAM) |
| Framework | TRL SFTTrainer + PEFT LoRA |
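
As a rough sanity check on the recipe, the sketch below estimates the number of optimizer steps and traces the cosine learning-rate curve. It assumes exactly 46,000 samples (the table says 46K+), an effective batch size of 32 (per-device batch × gradient-accumulation steps × number of GPUs), and no warmup, since the card does not state one. The helper names are illustrative and are not taken from `scripts/train.py`.

```python
import math

# Assumptions: exactly 46,000 samples (the card says "46K+") and no warmup steps.
NUM_SAMPLES = 46_000
EFFECTIVE_BATCH = 32  # per-device batch x gradient accumulation x number of GPUs
EPOCHS = 3
PEAK_LR = 1e-4


def total_optimizer_steps(num_samples: int, effective_batch: int, epochs: int) -> int:
    """One optimizer step per effective batch; a partial last batch counts as a step."""
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup, no restarts)."""
    progress = min(step / total_steps, 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = total_optimizer_steps(NUM_SAMPLES, EFFECTIVE_BATCH, EPOCHS)
print(total)  # 4314
for step in (0, total // 4, total // 2, total):
    print(step, f"{cosine_lr(step, total, PEAK_LR):.2e}")
```

The decay has the same shape as `transformers.get_cosine_schedule_with_warmup` with zero warmup steps. The step count reported by the Trainer may differ slightly, depending on how partial batches are handled.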
## LoRA Target Modules
- Attention: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- MLP: `gate_proj`, `up_proj`, `down_proj`
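
In PEFT terms, these settings correspond to `LoraConfig(r=16, lora_alpha=32, target_modules=[...])` with the seven modules above, and LoRA scales each adapter's output by `lora_alpha / r = 2`. The following sketch estimates how many trainable parameters the adapter adds. The model dimensions are assumptions taken from the public Qwen2.5-7B configuration (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of size 128); check them against the base model's `config.json`.

```python
# Assumed Qwen2.5-Coder-7B dimensions (from the public Qwen2.5-7B config;
# verify against the model's config.json before relying on them).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
NUM_KV_HEADS = 4
HEAD_DIM = 128
RANK = 16


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) beside a frozen linear layer."""
    return rank * (in_features + out_features)


# Grouped-query attention makes k_proj/v_proj narrower than q_proj.
kv_dim = NUM_KV_HEADS * HEAD_DIM

# (in_features, out_features) for each targeted module in one decoder layer.
modules = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, kv_dim),
    "v_proj": (HIDDEN, kv_dim),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in modules.values())
total = per_layer * NUM_LAYERS
print(per_layer, total)  # 1441792 40370176
```

Under these assumptions the adapter trains roughly 40M parameters, about half a percent of a 7B-parameter model, while the base weights stay frozen.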
## How to Run Training
```bash
# 1. Build the training dataset
pip install datasets huggingface_hub
python scripts/data_pipeline.py --repo YOUR_ORG/testgen-data --max-samples 50000

# 2. (Optional) Add your company's code+test files
python scripts/data_pipeline.py --repo YOUR_ORG/testgen-data --custom-dirs /path/to/your/code

# 3. Run training
pip install transformers trl torch datasets trackio accelerate peft bitsandbytes
python scripts/train.py

# Or run via HF Jobs (recommended): hardware a10g-large, timeout 5h
```
## How to Add Your Company's Data
Your raw training data should be organized as code files paired with test files:
```text
your-project/
├── src/
│   ├── calculator.py
│   ├── utils.py
│   └── auth.py
└── tests/
    ├── test_calculator.py
    ├── test_utils.py
    └── test_auth.py
```
The data pipeline auto-discovers pairs using naming conventions:
- Python: `calculator.py` → `test_calculator.py`
- Java: `Calculator.java` → `CalculatorTest.java`
- JS/TS: `calculator.js` → `calculator.test.js`
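
The pairing rules above can be sketched as a file-name lookup. This is a hypothetical illustration, not the code in `scripts/data_pipeline.py`; in particular, mapping TypeScript files to `*.test.ts` is an assumption extrapolated from the JS/TS rule.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Optional


def expected_test_name(source: Path) -> Optional[str]:
    """Return the test file name that would pair with `source`, or None."""
    stem, suffix = source.stem, source.suffix
    if suffix == ".py":
        return f"test_{stem}.py"        # calculator.py -> test_calculator.py
    if suffix == ".java":
        return f"{stem}Test.java"       # Calculator.java -> CalculatorTest.java
    if suffix in (".js", ".ts"):
        return f"{stem}.test{suffix}"   # calculator.js -> calculator.test.js (.ts assumed)
    return None


def find_test_pairs(root: Path) -> list[tuple[Path, Path]]:
    """Pair each source file under `root` with test files named by convention."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    # Index files by base name so a test in tests/ can match a source in src/.
    by_name: dict[str, list[Path]] = {}
    for path in files:
        by_name.setdefault(path.name, []).append(path)
    pairs = []
    for path in files:
        test_name = expected_test_name(path)
        if test_name is None:
            continue
        for test_path in by_name.get(test_name, []):
            pairs.append((path, test_path))
    return pairs


def demo_pairs() -> list[tuple[str, str]]:
    """Build a throwaway project like the layout above and pair its files."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ("src/calculator.py", "src/auth.py", "tests/test_calculator.py"):
            (root / rel).parent.mkdir(parents=True, exist_ok=True)
            (root / rel).write_text("")
        return [(s.name, t.name) for s, t in find_test_pairs(root)]


print(demo_pairs())  # [('calculator.py', 'test_calculator.py')]
```

Indexing files by base name lets tests live in a separate `tests/` directory rather than next to the source file; a source file without a matching test (here `auth.py`) is simply skipped.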
## Live Demo
Try it in the 🧪 AI Test Case Generator Space.
## Resources
- Training Dataset: Navyatha2006/testgen-sft-data
- Live App: Navyatha2006/ai-test-case-generator
- Paper: Parameter-Efficient Fine-Tuning of LLMs for Unit Test Generation
- Base Dataset: andstor/methods2test