# TestGen: AI Test Case Generator (Qwen2.5-Coder-7B + LoRA)

Qwen2.5-Coder-7B-Instruct fine-tuned with LoRA for unit test generation.

## Overview

Given a piece of source code, this model generates a suite of unit tests that covers both typical behavior and **edge cases**. The training approach is based on the paper *"Parameter-Efficient Fine-Tuning of LLMs for Unit Test Generation"* (arXiv:2411.02462).

## Training Recipe

| Component | Details |
|-----------|---------|
| **Base Model** | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) |
| **Method** | LoRA (rank=16, alpha=32) |
| **Dataset** | [andstor/methods2test](https://huggingface.co/datasets/andstor/methods2test), `fm+fc+t+tc` config (46K+ samples) |
| **Training** | 3 epochs, lr=1e-4, cosine schedule, effective batch size 32 |
| **Hardware** | A10G-large (24 GB VRAM) |
| **Framework** | TRL `SFTTrainer` + PEFT LoRA |

### LoRA Target Modules

- Attention: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- MLP: `gate_proj`, `up_proj`, `down_proj`

## How to Run Training

```bash
# 1. Build the training dataset
pip install datasets huggingface_hub
python scripts/data_pipeline.py --repo YOUR_ORG/testgen-data --max-samples 50000

# 2. (Optional) Add your company's code+test files
python scripts/data_pipeline.py --repo YOUR_ORG/testgen-data --custom-dirs /path/to/your/code

# 3. Run training
pip install transformers trl torch datasets trackio accelerate peft bitsandbytes
python scripts/train.py

# Or run on Hugging Face Jobs (recommended):
# Hardware: a10g-large, Timeout: 5h
```

## How to Add Your Company's Data

Organize your raw training data as source files paired with their test files:

```
your-project/
├── src/
│   ├── calculator.py
│   ├── utils.py
│   └── auth.py
└── tests/
    ├── test_calculator.py
    ├── test_utils.py
    └── test_auth.py
```

The data pipeline discovers code/test pairs automatically using these naming conventions:

- Python: `calculator.py` ↔ `test_calculator.py`
- Java: `Calculator.java` ↔ `CalculatorTest.java`
- JS/TS: `calculator.js` ↔ `calculator.test.js`

## Live Demo

Try it: [🧪 AI Test Case Generator Space](https://huggingface.co/spaces/Navyatha2006/ai-test-case-generator)

## Resources

- **Training Dataset:** [Navyatha2006/testgen-sft-data](https://huggingface.co/datasets/Navyatha2006/testgen-sft-data)
- **Live App:** [Navyatha2006/ai-test-case-generator](https://huggingface.co/spaces/Navyatha2006/ai-test-case-generator)
- **Paper:** [Parameter-Efficient Fine-Tuning of LLMs for Unit Test Generation](https://arxiv.org/abs/2411.02462)
- **Base Dataset:** [andstor/methods2test](https://huggingface.co/datasets/andstor/methods2test)
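
## Example: Step Count and Cosine Schedule

The **Training** row of the Training Recipe table (3 epochs, lr=1e-4, cosine schedule, effective batch size 32) determines roughly how many optimizer steps a run takes and how the learning rate decays over those steps. The minimal sketch below works through that arithmetic in plain Python. It rests on several assumptions. It uses 46,000 samples, because the table only says 46K+. It applies no warmup unless you pass one. It splits the effective batch into a hypothetical per-device batch of 4 with 8 gradient-accumulation steps. `scripts/train.py` may round steps differently or use warmup.

```python
import math


def effective_batch(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def optimizer_steps(num_samples: int, batch: int, epochs: int) -> int:
    """Approximate optimizer steps; a partial final batch is counted as one step."""
    return math.ceil(num_samples / batch) * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-4, warmup_steps: int = 0) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical split: per-device batch 4 x 8 accumulation steps on one A10G = 32.
batch = effective_batch(per_device=4, grad_accum=8)
# Assumed sample count: the recipe only states "46K+".
total = optimizer_steps(num_samples=46_000, batch=batch, epochs=3)
for step in (0, total // 2, total):
    print(f"step {step:>5}: lr = {cosine_lr(step, total):.2e}")
```

Under these assumptions the run is about 4,314 optimizer steps (1,438 per epoch). The learning rate starts at 1e-4, falls to 5e-5 halfway through, and reaches zero at the end. Hugging Face Transformers produces the same shape (linear warmup followed by half-cosine decay) for `lr_scheduler_type="cosine"`.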
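
## Example: LoRA Trainable Parameter Count

The LoRA settings in the Training Recipe (rank 16, alpha 32) and the projections listed under **LoRA Target Modules** fix how many weights are actually trained. Each adapted linear layer with `in` inputs and `out` outputs gains two low-rank matrices, A (r × in) and B (out × r), so it adds r × (in + out) parameters. The low-rank update is scaled by alpha / r. The sketch below assumes the Qwen2.5-7B architecture values: hidden size 3584, MLP size 18944, 28 layers, and 4 key/value heads of dimension 128. Check these against the base model's `config.json` before relying on the result. The sketch ignores bias terms, which PEFT's LoRA leaves untrained by default.

```python
# Assumed Qwen2.5-7B config values; verify against the base model's config.json.
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # num_key_value_heads * head_dim (grouped-query attention)

# (in_features, out_features) of each targeted linear layer in one decoder block
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, shapes=TARGET_SHAPES, num_layers: int = NUM_LAYERS) -> int:
    """LoRA adds A (rank x in) and B (out x rank) per adapted layer: rank * (in + out)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard (non-rsLoRA) scaling applied to the update B @ A."""
    return alpha / rank


print(f"trainable LoRA params: {lora_params(16):,}")
print(f"update scaling alpha/r: {lora_scaling(16, 32)}")
```

Under these assumptions the adapter has 40,370,176 trainable parameters (about 40.4M), and the alpha/r scaling factor is 2.0. If you wrap the model with PEFT, calling `print_trainable_parameters()` on the resulting `PeftModel` reports the actual count, so you can compare it with this estimate.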
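
## Example: Discovering Code/Test Pairs

The naming conventions under **How to Add Your Company's Data** can be turned into a simple matcher. The sketch below is a hypothetical re-implementation, not the code in `scripts/data_pipeline.py`. It indexes files by name and pairs each source file with the test file its convention predicts, wherever that test lives under the project root. It assumes TypeScript follows the same `.test.` pattern as JavaScript. When two files share a name, it simply keeps the first one in sorted order.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple


def expected_test_name(source: Path) -> Optional[str]:
    """Return the test file name that the naming convention predicts for a source file."""
    stem, suffix = source.stem, source.suffix
    if suffix == ".py":
        return f"test_{stem}.py"        # calculator.py -> test_calculator.py
    if suffix == ".java":
        return f"{stem}Test.java"       # Calculator.java -> CalculatorTest.java
    if suffix in (".js", ".ts"):
        return f"{stem}.test{suffix}"   # calculator.js -> calculator.test.js
    return None


def is_test_file(path: Path) -> bool:
    """Recognize files that are themselves tests, so they are not treated as sources."""
    name = path.name
    return (
        (name.startswith("test_") and name.endswith(".py"))
        or name.endswith("Test.java")
        or ".test." in name
    )


def pair_files(paths: Iterable[Path]) -> List[Tuple[Path, Path]]:
    """Pair source files with their tests, given a collection of file paths."""
    files = sorted(paths)
    by_name: Dict[str, Path] = {}
    for p in files:
        by_name.setdefault(p.name, p)  # first match wins on name collisions
    pairs = []
    for src in files:
        if is_test_file(src):
            continue
        test_name = expected_test_name(src)
        if test_name is not None and test_name in by_name:
            pairs.append((src, by_name[test_name]))
    return pairs


def find_pairs(root: Path) -> List[Tuple[Path, Path]]:
    """Walk a project directory and pair every source file with its test file."""
    return pair_files(p for p in root.rglob("*") if p.is_file())
```

On the example `your-project/` layout, `find_pairs(Path("your-project"))` would return three pairs, one each for `auth.py`, `calculator.py`, and `utils.py`, each matched with its `test_*.py` counterpart under `tests/`. Keeping `pair_files` separate from the directory walk makes the matching logic easy to test without touching the filesystem.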