Qwen3-VL-4B SpreadsheetBench QLoRA

This repository contains a PEFT/QLoRA adapter trained for spreadsheet manipulation code generation on top of:

unsloth/Qwen3-VL-4B-Thinking-unsloth-bnb-4bit

The adapter was trained to generate executable Python, primarily openpyxl, for SpreadsheetBench-style workbook manipulation tasks.

Runtime helpers and vLLM examples live at:

https://github.com/DrStrangel0ve/spreadsheetbench-qwen3vl-qlora

Important note

The best benchmark result reported below uses this adapter together with a tightened SpreadsheetBench inference/runtime layer that enforces code-only output, output-path correctness, workbook saves, target-change checks, and deterministic recovery templates for common spreadsheet failure patterns.

Adapter-only performance improved modestly. Adapter plus the tightened runtime produced the largest practical gain.

Results

On the 200-case SpreadsheetBench slice used during development:

System Tests passed Soft avg Hard avg Full-pass cases Output workbooks
Original base GGUF 126/600 0.2100 0.1800 36/200 600
Base GGUF + tightened templates 143/600 0.2383 0.2200 44/200 600
Initial Kaggle/template QLoRA 122/600 0.2033 0.1750 35/200 583
Failure-algorithmic QLoRA v2 135/600 0.2250 0.1950 39/200 593
Failure-algorithmic QLoRA v2 + tightened templates 157/600 0.2617 0.2300 46/200 593

Training

The selected adapter is outputs/qlora_failure_algorithmic_v2.

Training configuration:

  • Base model: unsloth/Qwen3-VL-4B-Thinking-unsloth-bnb-4bit
  • Method: QLoRA / PEFT LoRA
  • Target modules: q_proj, k_proj, v_proj, o_proj
  • LoRA rank: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • Max examples: 1800
  • Epochs: 1
  • Max sequence length: 896
  • Learning rate: 7e-5
  • Warmup ratio: 0.04
  • Gradient accumulation: 4
  • Weight decay: 0.0
  • Max grad norm: 0.25

The adapter was initialized from an earlier Kaggle/template adapter trained with learning rate 3e-4. A failure-focused adapter at 1.5e-4 and a later v3 continuation at 5e-5 were tested but not promoted.

Data

The training mix included:

  • Kaggle-derived synthetic spreadsheet tasks.
  • Spreadsheet template tasks.
  • Failure-archetype tasks derived from benchmark failure analysis.

The Kaggle synthetic set was built locally from downloaded CSV/XLSX files. Candidate solvers were executed to create gold output workbooks before examples were accepted. The Kaggle generation accepted 278 examples and rejected 18.

Loading with Transformers and PEFT

from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoTokenizer, BitsAndBytesConfig
import torch

base_model = "unsloth/Qwen3-VL-4B-Thinking-unsloth-bnb-4bit"
adapter = "DrStrangel0ve/Qwen3-VL-4B-SpreadsheetBench-QLoRA"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(adapter, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_model,
    trust_remote_code=True,
    device_map="auto",
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

Serving with vLLM

vLLM supports serving LoRA adapters with --enable-lora and --lora-modules name=path_or_repo. See the official vLLM LoRA documentation: https://docs.vllm.ai/en/stable/features/lora.html

Example:

vllm serve unsloth/Qwen3-VL-4B-Thinking-unsloth-bnb-4bit \
  --enable-lora \
  --lora-modules spreadsheet=DrStrangel0ve/Qwen3-VL-4B-SpreadsheetBench-QLoRA \
  --max-model-len 4096

Depending on your vLLM version and GPU, you may need additional quantization flags for the Unsloth 4-bit base model.

Limitations

  • This is a QLoRA adapter, not a fully merged standalone model.
  • SpreadsheetBench scores depend strongly on the execution harness and postprocessing.
  • The best reported score includes deterministic runtime templates, not just raw model generation.
  • The adapter is specialized for code-generation style spreadsheet tasks and should not be treated as a general-purpose finance or spreadsheet reasoning model.
Downloads last month
28
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DrStrangel0ve/Qwen3-VL-4B-SpreadsheetBench-QLoRA