# TestGen: AI Test Case Generator (Qwen2.5-Coder-7B + LoRA)

Fine-tuned Qwen2.5-Coder-7B-Instruct with LoRA for comprehensive unit test generation.

## Overview

This model is fine-tuned to generate a **comprehensive set of unit tests**, including edge cases, for source code supplied as input. It is based on the paper *"Parameter-Efficient Fine-Tuning of LLMs for Unit Test Generation"* ([arXiv:2411.02462](https://arxiv.org/abs/2411.02462)).

## Training Recipe

| Component | Details |
|-----------|---------|
| **Base Model** | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) |
| **Method** | LoRA (rank=16, alpha=32) |
| **Dataset** | [andstor/methods2test](https://huggingface.co/datasets/andstor/methods2test) fm+fc+t+tc (46K+ samples) |
| **Training** | 3 epochs, lr=1e-4, cosine schedule, effective batch=32 |
| **Hardware** | A10G-large (24GB VRAM) |
| **Framework** | TRL SFTTrainer + PEFT LoRA |

### LoRA Target Modules
- Attention: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- MLP: `gate_proj`, `up_proj`, `down_proj`
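
To get a feel for what targeting all seven projection layers costs, the sketch below estimates the number of trainable LoRA parameters at rank 16. The layer shapes are assumptions based on the published Qwen2.5-7B architecture (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128); check them against the base model's `config.json` before relying on the result.

```python
# Settings taken from the recipe above.
LORA_RANK = 16
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",   # attention
    "gate_proj", "up_proj", "down_proj",      # MLP
]

# Assumptions: Qwen2.5-7B layer shapes; verify against the model's config.json.
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads x head_dim 128 (grouped-query attention)

# (in_features, out_features) of each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = LORA_RANK) -> int:
    """LoRA adds A (rank x in) and B (out x rank) beside each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(*SHAPES[name]) for name in TARGET_MODULES)
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")
```

Under these assumed shapes the adapter has roughly 40 million trainable parameters. Note that `alpha` does not change the count; it only scales the adapter's output.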

## How to Run Training

```bash
# 1. Build the training dataset
pip install datasets huggingface_hub
python scripts/data_pipeline.py --repo YOUR_ORG/testgen-data --max-samples 50000

# 2. (Optional) Add your company's code+test files
python scripts/data_pipeline.py --repo YOUR_ORG/testgen-data --custom-dirs /path/to/your/code

# 3. Run training
pip install transformers trl torch datasets trackio accelerate peft bitsandbytes
python scripts/train.py

# Or via HF Jobs (recommended):
# Hardware: a10g-large, Timeout: 5h
```

## How to Add Your Company's Data

Your raw training data should be organized as code files paired with test files:

```
your-project/
├── src/
│   ├── calculator.py
│   ├── utils.py
│   └── auth.py
└── tests/
    ├── test_calculator.py
    ├── test_utils.py
    └── test_auth.py
```

The data pipeline auto-discovers pairs using naming conventions:
- Python: `calculator.py` ↔ `test_calculator.py`
- Java: `Calculator.java` ↔ `CalculatorTest.java`
- JS/TS: `calculator.js` ↔ `calculator.test.js`
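
The naming conventions above can be sketched as a small matching function. This is a hypothetical illustration, not the code in `scripts/data_pipeline.py`: the function names are invented for the example, matching is by file name only (directories are ignored), and the `.ts` rule is assumed to mirror the `.js` one.

```python
from pathlib import PurePosixPath


def expected_test_name(source_name: str):
    """Return the conventional test file name for a source file, or None."""
    path = PurePosixPath(source_name)
    stem, ext = path.stem, path.suffix
    if ext == ".py":
        return f"test_{stem}.py"          # calculator.py -> test_calculator.py
    if ext == ".java":
        return f"{stem}Test.java"         # Calculator.java -> CalculatorTest.java
    if ext in {".js", ".ts"}:
        return f"{stem}.test{ext}"        # calculator.js -> calculator.test.js
    return None


def pair_files(paths):
    """Pair each source file with its test file when both appear in `paths`."""
    # Index every file by its bare name; the first occurrence wins on duplicates.
    by_name = {}
    for path in paths:
        by_name.setdefault(PurePosixPath(path).name, path)

    pairs = []
    for path in paths:
        candidate = expected_test_name(PurePosixPath(path).name)
        if candidate is not None and candidate in by_name:
            pairs.append((path, by_name[candidate]))
    return pairs


example = [
    "src/calculator.py", "src/utils.py", "src/auth.py",
    "tests/test_calculator.py", "tests/test_utils.py", "tests/test_auth.py",
]
for source, test in pair_files(example):
    print(f"{source} <-> {test}")
```

Files without a matching counterpart are simply skipped, so a project that uses a different convention would yield no pairs until its files are renamed or an extra rule is added.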

## Live Demo

Try it: [🧪 AI Test Case Generator Space](https://huggingface.co/spaces/Navyatha2006/ai-test-case-generator)

## Resources

- **Training Dataset:** [Navyatha2006/testgen-sft-data](https://huggingface.co/datasets/Navyatha2006/testgen-sft-data)
- **Live App:** [Navyatha2006/ai-test-case-generator](https://huggingface.co/spaces/Navyatha2006/ai-test-case-generator)
- **Paper:** [Parameter-Efficient Fine-Tuning of LLMs for Unit Test Generation](https://arxiv.org/abs/2411.02462)
- **Base Dataset:** [andstor/methods2test](https://huggingface.co/datasets/andstor/methods2test)