How to convert a single safetensors file to PEFT format

I’m struggling to get adapter_model.safetensors and adapter_config.json from this LoRA weights file

No conversion needed. Use it directly with Diffusers like this:

https://huggingface.co/lightx2v/Qwen-Image-Lightning :

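import torch
from diffusers import DiffusionPipeline

# `scheduler` is the FlowMatchEulerDiscreteScheduler built from the config on the model card
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", scheduler=scheduler, torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning", weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors"
)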

Thanks for the reply, but vLLM-Omni requires the LoRA to be in PEFT format:

Oh…


Key point: that .safetensors is a Diffusers/ComfyUI LoRA, not a Transformers “PEFT adapter folder”

The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is published to be loaded directly via Diffusers (pipe.load_lora_weights(...)) on top of the base model Qwen/Qwen-Image, or used in ComfyUI. The repo’s model card shows exactly that usage pattern. (Hugging Face)

By contrast, a Transformers/PEFT adapter typically lives in a directory containing adapter_config.json + adapter_model.safetensors. (Hugging Face)
Those files are not “extractable” from an arbitrary LoRA .safetensors unless you (re)construct the adapter configuration (target modules, rank, alpha, etc.) in a real model and then re-save it.


What vLLM-Omni expects

vLLM-Omni’s diffusion LoRA endpoint requires a PEFT adapter folder like: lora_adapter/adapter_config.json + lora_adapter/adapter_model.safetensors. (vLLM)

The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is a single-file Diffusers LoRA weight (meant to be loaded with pipe.load_lora_weights(...)), not a PEFT adapter folder. (Hugging Face)

So you need to load it into the base model once, then re-save it via Diffusers’ PEFT adapter API (save_lora_adapter), which generates the adapter_config.json and a safetensors weight file. (Hugging Face)


Conversion script (Diffusers → PEFT adapter folder)

Notes:

  • The Qwen-Image-Lightning model card explicitly recommends installing Diffusers from main. (Hugging Face)
  • This produces the exact folder structure vLLM-Omni documents. (vLLM)
import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# 1) Create the base pipeline (same pattern as the model card)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
).to("cuda")

# 2) Load the single safetensors LoRA file into the pipeline
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning",
    weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    adapter_name="lightning_v2",  # give it a name so we can save it explicitly
)

# 3) Re-save as a PEFT adapter folder (adapter_config.json + adapter_model.safetensors)
#    save_lora_adapter() is a PEFT adapter API on the *underlying model component*.
#    For Qwen/Qwen-Image, LoRA is typically on the diffusion "transformer" component.
pipe.transformer.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)

print("Wrote PEFT adapter to ./lora_adapter")

save_lora_adapter(...) is documented to serialize the adapter (and supports weight_name + safetensors). (Hugging Face)


Use the output with vLLM-Omni

Point vLLM-Omni at the created folder:

  • --lora-path /path/to/lora_adapter (must be readable by the server) (vLLM)

  • Folder must contain:

    • adapter_config.json
    • adapter_model.safetensors (vLLM)
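Before pointing the server at the folder, a quick local check can catch the obvious problems. This is only a minimal sketch: `check_adapter_dir` is a hypothetical helper (not part of vLLM-Omni or PEFT), and the list of config fields it looks for is an assumption based on the fields usually present in a PEFT LoRA config.

```python
import json
from pathlib import Path

# Files the PEFT adapter folder is expected to contain
REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")
# Config fields assumed to matter for a LoRA adapter (assumption, adjust as needed)
REQUIRED_FIELDS = ("r", "lora_alpha", "target_modules")


def check_adapter_dir(path):
    """Return a list of problems; an empty list means the folder looks plausible."""
    root = Path(path)
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (root / name).is_file()]

    cfg_path = root / "adapter_config.json"
    if cfg_path.is_file():
        try:
            cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("adapter_config.json is not valid JSON")
        else:
            problems += [
                f"adapter_config.json lacks '{field}'"
                for field in REQUIRED_FIELDS
                if cfg.get(field) is None
            ]
    return problems
```

Calling `check_adapter_dir("lora_adapter")` returns the list of issues. It does not validate the tensor keys themselves, so passing this check does not guarantee that the adapter loads.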

Troubleshooting

1) AttributeError: '...Pipeline' object has no attribute 'transformer'

Some pipelines use unet instead of transformer. In that case, save from pipe.unet:

pipe.unet.save_lora_adapter("lora_adapter", adapter_name="lightning_v2",
                           safe_serialization=True, weight_name="adapter_model.safetensors")

2) The LoRA loads in Diffusers but fails in PEFT save

Prefer the PEFT “model-level” path: load the adapter onto the component, then save it. Diffusers documents load_lora_adapter(...) + save_lora_adapter(...) as the direct model-level workflow. (Hugging Face)

3) You’re tempted to hand-write adapter_config.json

Don’t, unless you know the exact target modules / ranks / alphas expected by the model. vLLM-Omni (and Transformers PEFT loaders) assume a valid adapter_config.json alongside the weights. (vLLM)

Edit: this doesn’t work in practice…

Hi, I ran your script, but it only produced adapter_model.safetensors and no adapter_config.json. I generated the config with the following code:

pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")

I then passed the folder (./lora_adapter) to vLLM-Omni, and it raised an error saying the “state_dict” keys do not match…

Sorry… The implementation of the diffusion model’s inference code itself seems to differ quite a bit between Diffusers, ComfyUI, and vLLM-Omni. :scream:

In this case, forcing the state_dict key names to match might make it work, but it’s unclear if it would function correctly. (Depends on the code of that version of vLLM-Omni)

Merging it first would definitely work, I think… but it wouldn’t be a conversion.


To use Qwen-Image-Lightning LoRA on vLLM-Omni

Option A (recommended): merge the LoRA into the base model, then serve it as a normal model

This avoids the entire “PEFT adapter keys don’t match” problem.

Why this works: vLLM-Omni’s diffusion LoRA path is strict about module name alignment (see Option B). If you “bake” the LoRA deltas into the base weights, vLLM-Omni just loads a single checkpoint and there is no adapter to validate.
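To make "baking in" concrete: in the standard LoRA formulation the merged weight is W' = W + scale * (alpha / r) * (B @ A). The toy sketch below uses plain Python lists and made-up 2x2 numbers; it is not what Diffusers runs internally, only an illustration that the merged layer gives the same output as base plus adapter.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, scale=1.0):
    """Return W + scale * (alpha / r) * (B @ A), with A of shape r x in and B of shape out x r."""
    rank = len(lora_a)
    factor = scale * alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + factor * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy layer: 2 inputs, 2 outputs, rank-1 adapter (all values are made up)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # r x in
B = [[0.5], [0.25]]     # out x r
merged = merge_lora(W, A, B, alpha=1.0)

# The merged layer reproduces "base output + adapter output" for any input x
x = [[2.0], [3.0]]
base_plus_adapter = [
    [b[0] + a[0]] for b, a in zip(matmul(W, x), matmul(B, matmul(A, x)))
]
print(merged, matmul(merged, x), base_plus_adapter)
```

After the merge there is no separate A/B pair left, so there are no adapter keys for the server to validate.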

Steps

  1. Load the base Qwen-Image model (same base that the Lightning LoRA was trained for).

  2. Load the Lightning LoRA safetensors into that pipeline (Diffusers or the Qwen-Image reference loader).

  3. Merge/fuse LoRA into the base weights (so the model weights become the adapted weights).

  4. Save the merged model directory.

  5. Serve the merged directory with vLLM-Omni:

    • vLLM-Omni serves a single diffusion model per server instance. (vLLM)
  6. Use 8 inference steps when requesting images (because this LoRA is “8steps”). vLLM-Omni exposes num_inference_steps in the request body. (vLLM)

Why I’d pick this first: vLLM-Omni diffusion LoRA support is PEFT-compatible, but it’s new and keyed to vLLM’s internal module naming/packing behavior. (GitHub)


Why your current “PEFT folder” fails in vLLM-Omni

You already discovered:

  • You can produce adapter_model.safetensors

  • You can produce adapter_config.json via:

    pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")
    

…but vLLM-Omni rejects it with “state_dict keys not match”.

That error is expected if the adapter’s target module names (and therefore the saved weight keys) don’t align with what vLLM-Omni believes are “supported/expected LoRA modules” for that diffusion pipeline.

What vLLM-Omni is doing internally

vLLM-Omni’s DiffusionLoRAManager:

  • Computes supported module suffixes from the pipeline using get_supported_lora_modules()
  • Builds/uses a packed_modules_mapping so it can handle fused projections (e.g., packed QKV) and accept LoRAs trained on logical sub-projections
  • Expands an _expected_lora_modules set
  • Loads the adapter via LoRAModel.from_local_checkpoint(... expected_lora_modules=...)
  • Critically: it passes weights_mapper=None (so there is no automatic renaming of keys) (vLLM)

So if Diffusers/ComfyUI used names like to_q, to_k, to_v, to_out, etc., but vLLM-Omni’s Qwen-Image transformer uses different names (and often packed/fused linears), your adapter keys won’t validate.

This is also why “same repository / same model” can still differ: vLLM-Omni re-implements diffusion transformer components with vLLM-style layers and packed projections for performance/parallelism, so module naming/structure can differ from Diffusers.
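A rough way to see this kind of mismatch offline is to reduce every adapter key to its module suffix and compare against an expected set. The sketch below is a simplified stand-in for the real validation, not vLLM-Omni’s code; the `expected` set and the sample keys are made up for illustration.

```python
import re

# Matches the trailing LoRA part of a weight key (PEFT or ComfyUI naming)
LORA_TAIL = re.compile(r"\.(lora_A|lora_B|lora_down|lora_up)\.weight$")


def module_name(key):
    """Strip the LoRA tail and return the last path component (the module suffix)."""
    path = LORA_TAIL.sub("", key)
    return path.rsplit(".", 1)[-1]


def unexpected_modules(keys, expected):
    """Module suffixes present in the adapter keys but absent from the expected set."""
    return sorted({module_name(k) for k in keys} - set(expected))


# Hypothetical inputs: three adapter keys and an assumed set of supported suffixes
keys = [
    "transformer_blocks.0.attn.to_q.lora_A.weight",
    "transformer_blocks.0.attn.to_out.0.lora_A.weight",
    "transformer_blocks.0.attn.add_k_proj.alpha",
]
expected = {"to_q", "to_k", "to_v", "to_out"}
print(unexpected_modules(keys, expected))
```

With these sample inputs the numeric component of `to_out.0` and the stray `alpha` tensor both show up as unexpected, which is the sort of thing a strict loader would reject.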


Option B: make a real vLLM-Omni-compatible PEFT LoRA (harder, but possible)

vLLM-Omni expects a PEFT folder like: (vLLM)

lora_adapter/
├── adapter_config.json
└── adapter_model.safetensors

But the content must match vLLM-Omni’s expected module names.

B1) First, extract what vLLM-Omni expects (target module suffixes)

Your goal: get the set that DiffusionLoRAManager calls _expected_lora_modules. (vLLM)

Practical ways:

  • Enable debug logging and trigger adapter load; it logs the supported/expected modules. (vLLM)

  • Or write a small script that instantiates the same pipeline/module objects and prints:

    • get_supported_lora_modules(pipeline)
    • any packed_modules_mapping found on modules
    • expanded expected modules (same function the manager uses)

B2) Inspect your Lightning safetensors keys (what you currently have)

Run something like:

import re

from safetensors.torch import load_file

# Load the raw LoRA tensors and look at how the keys are named
sd = load_file("Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors")
keys = list(sd.keys())
print("num_keys:", len(keys))
print("sample:", keys[:50])

# Quick “module suffix” feel: collect the module-name tokens that appear in the keys
# (tweak the pattern depending on the actual key style you see)
pattern = re.compile(r"\.(to_[qkv]|to_out|q_proj|k_proj|v_proj|o_proj|proj|fc1|fc2)\b")
mods = set()
for k in keys:
    m = pattern.search(k)
    if m:
        mods.add(m.group(1))
print("matched module-ish tokens:", sorted(mods))

This tells you whether the file is closer to:

  • Diffusers attention naming (to_q, to_k, to_v, to_out)
  • HF transformer naming (q_proj, k_proj, v_proj, o_proj)
  • Something ComfyUI-specific

B3) Build a mapping: Diffusers/ComfyUI module names → vLLM-Omni module names

Typical mismatch patterns (examples):

  • to_q, to_k, to_v vs packed qkv projections
  • to_out.0 vs proj / o_proj
  • MLP: fc1/fc2 vs gate_up_proj/down_proj-style

vLLM-Omni is explicitly designed to handle packed projections by:

  • discovering packed_modules_mapping on the model
  • treating QKVParallelLinear as 3-slice packed (["q","k","v"]) (vLLM)

So if (and only if) the Qwen-Image vLLM-Omni implementation exposes a compatible mapping, you may be able to rename your adapter keys to match the “slice names” it will accept.
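As a sketch of how such an expansion could work (this is not vLLM-Omni’s actual implementation): start from the supported module suffixes and, for every packed module, also accept its logical sub-projections. The mapping contents below are an assumption for illustration only; the real mapping has to be read from the model class.

```python
def expand_expected_modules(supported, packed_modules_mapping):
    """Add the logical sub-projection names for every supported packed module."""
    expected = set(supported)
    for packed_name, parts in packed_modules_mapping.items():
        if packed_name in expected:
            expected.update(parts)
    return expected


# Hypothetical values, for illustration only
supported = {"to_qkv", "add_kv_proj", "to_out"}
packed_modules_mapping = {
    "to_qkv": ["to_q", "to_k", "to_v"],
    "add_kv_proj": ["add_q_proj", "add_k_proj", "add_v_proj"],
}

expected = expand_expected_modules(supported, packed_modules_mapping)
print(sorted(expected))
```

Under this assumed mapping, an adapter that targets `to_q` would be accepted even though the model only has a fused `to_qkv` layer, while a name such as `fc1` would still be rejected.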

B4) Rewrite the adapter weights and config

You may need to:

  • Rewrite state_dict key paths (the important part)
  • Ensure adapter_config.json includes target_modules that match what vLLM-Omni expects and what your rewritten keys implement (it logs target_modules when loading). (vLLM)

A template for renaming keys:

from safetensors.torch import load_file, save_file

src = load_file("adapter_model.safetensors")

RENAMES = [
    (".to_q.", ".q."),      # example only
    (".to_k.", ".k."),
    (".to_v.", ".v."),
    (".to_out.0.", ".proj."),
]

dst = {}
for k, v in src.items():
    nk = k
    for a, b in RENAMES:
        nk = nk.replace(a, b)
    dst[nk] = v

save_file(dst, "adapter_model_vllm.safetensors")
print("done. keys:", len(dst))

Then point adapter_config.json to target_modules matching the suffixes vLLM-Omni expects.
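One way to keep target_modules in sync with the rewritten weights is to derive the list from the keys themselves. A minimal sketch, assuming PEFT-style `lora_A`/`lora_B` key names; the sample keys, the rank/alpha values, and the helper names are illustrative, not taken from a real adapter.

```python
import json
import re


def target_modules_from_keys(keys):
    """Collect the module name that sits directly before .lora_A/.lora_B in each key."""
    mods = set()
    for key in keys:
        m = re.search(r"([^.]+)\.lora_[AB]\.weight$", key)
        if m:
            mods.add(m.group(1))
    return sorted(mods)


def build_adapter_config(keys, rank, alpha, base_model):
    """Assemble a minimal PEFT-style LoRA config dict (fields assumed, not exhaustive)."""
    return {
        "peft_type": "LORA",
        "r": rank,
        "lora_alpha": alpha,
        "target_modules": target_modules_from_keys(keys),
        "base_model_name_or_path": base_model,
    }


# Illustrative keys as they might look after the renaming step
example_keys = [
    "base_model.model.transformer.transformer_blocks.0.attn.q.lora_A.weight",
    "base_model.model.transformer.transformer_blocks.0.attn.q.lora_B.weight",
    "base_model.model.transformer.transformer_blocks.0.attn.proj.lora_A.weight",
]
config = build_adapter_config(example_keys, rank=64, alpha=64, base_model="Qwen/Qwen-Image")
print(json.dumps(config, indent=2))
```

Whether the derived names are actually accepted still depends on what the vLLM-Omni build expects for the pipeline.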

B5) Reality check: you may need to patch vLLM-Omni

Because diffusion LoRA loading currently uses weights_mapper=None, there is no built-in key translation hook. (vLLM)
If the required mapping is non-trivial (common), the clean solution is:

  • add a weights_mapper for diffusion adapters (or a model-specific mapper for Qwen-Image)
  • or ensure the model exposes packed_modules_mapping that matches popular training tool outputs

How other users effectively use “ComfyUI LoRA” with vLLM-Omni (practically)

Most people who succeed quickly do one of:

  1. Merge LoRA into base weights and serve the merged model (Option A)
  2. Use LoRAs that were trained/exported in PEFT format against a module naming scheme that vLLM/vLLM-Omni accepts (often not ComfyUI-native single-file LoRAs)

Given your current error and vLLM-Omni’s strict loader, Option A is the most reliable path.


Reading list (relevant, practical)

  • vLLM-Omni diffusion LoRA online serving example and required folder format (vLLM)
  • vLLM-Omni DiffusionLoRAManager internals (why key mismatches happen; packed modules mapping; no weights_mapper) (vLLM)
  • vLLM-Omni release notes highlighting “Diffusion LoRA Adapter Support (PEFT-compatible)” (feature maturity context) (GitHub)
  • vLLM LoRA adapters documentation (general vLLM LoRA expectations and serving patterns) (vLLM)

To merge/fuse Lightning into the base model — step-by-step

0) What you will produce

A new local model directory that contains the base Qwen-Image weights with Lightning already applied, so vLLM-Omni loads it as a normal diffusion model (no LoRA at runtime). vLLM-Omni serves diffusion models via /v1/images/generations. (docs.vllm.ai)


1) Prepare environment (Diffusers “main”)

The Lightning model card explicitly says to install Diffusers from main. (Hugging Face)

pip install -U "torch" "transformers" "accelerate" "safetensors"
pip install -U "git+https://github.com/huggingface/diffusers.git"

2) Fuse the V2.0 bf16 LoRA into Qwen/Qwen-Image

Create a script fuse_qwen_image_lightning_v2.py:

import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# Scheduler config used by Qwen-Image-Lightning authors (shift=3 distillation)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}

def main():
    device = "cuda"
    dtype = torch.bfloat16

    scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

    # 1) Load base model
    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image",
        scheduler=scheduler,
        torch_dtype=dtype,
    ).to(device)

    # 2) Load Lightning LoRA (your file)
    pipe.load_lora_weights(
        "lightx2v/Qwen-Image-Lightning",
        weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    )

    # 3) Fuse LoRA into base weights, then unload adapter tensors
    #    (Diffusers recommends unload after fuse, then save_pretrained)
    pipe.fuse_lora(lora_scale=1.0)
    pipe.unload_lora_weights()

    # 4) Save the fused pipeline locally
    out_dir = "./Qwen-Image-Lightning-8steps-V2.0-fused"
    pipe.save_pretrained(out_dir, safe_serialization=True)

    print(f"Saved fused model to: {out_dir}")

if __name__ == "__main__":
    main()

Why these exact pieces:

  • The scheduler config and the “8 steps / true_cfg_scale=1.0” recipe are from the Lightning model card (they use a FlowMatchEulerDiscreteScheduler config that encodes shift=3 as math.log(3), and call the pipeline with 8 steps). (Hugging Face)
  • The fuse workflow is Diffusers’ documented pattern: fuse_lora() → unload_lora_weights() → save_pretrained(). (Hugging Face)

Run it:

python fuse_qwen_image_lightning_v2.py

3) Sanity-check the fused directory (optional but recommended)

After fusion, the model should work without load_lora_weights():

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "./Qwen-Image-Lightning-8steps-V2.0-fused",
    torch_dtype=torch.bfloat16,
).to("cuda")

img = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    negative_prompt=" ",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=torch.manual_seed(0),
).images[0]

img.save("fused_test.png")

The “8 steps” + true_cfg_scale=1.0 matches the Lightning authors’ recommended inference settings. (Hugging Face)


4) Serve the fused model with vLLM-Omni

vLLM-Omni serves diffusion models with:

vllm serve /ABS/PATH/Qwen-Image-Lightning-8steps-V2.0-fused --omni --port 8000
  • vLLM-Omni uses /v1/images/generations for diffusion models. (docs.vllm.ai)
  • vLLM supports serving a local model path. (vLLM Forums)

If you get OOM during serving, the Qwen text-to-image example notes you can enable VAE slicing/tiling flags to reduce memory. (docs.vllm.ai)


5) Call the API using Lightning-like parameters

vLLM-Omni’s Image Generation API supports num_inference_steps, negative_prompt, and true_cfg_scale. (docs.vllm.ai)

curl -X POST http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    "negative_prompt": " ",
    "size": "1024x1024",
    "num_inference_steps": 8,
    "true_cfg_scale": 1.0,
    "seed": 0
  }' | jq -r ".data[0].b64_json" | base64 -d > out.png
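The same request can be sent from Python with only the standard library. This is a sketch that mirrors the curl call above: the field names come from that request body, while the helper names (`build_payload`, `generate`, `decode_first_image`) are made up here.

```python
import base64
import json
import urllib.request


def build_payload(prompt, steps=8, true_cfg_scale=1.0, size="1024x1024", seed=0):
    """Request body with the Lightning-style settings used in the curl example."""
    return {
        "prompt": prompt,
        "negative_prompt": " ",
        "size": size,
        "num_inference_steps": steps,
        "true_cfg_scale": true_cfg_scale,
        "seed": seed,
    }


def generate(url, payload):
    """POST the JSON payload and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def decode_first_image(response_json):
    """Return the raw image bytes from data[0].b64_json."""
    return base64.b64decode(response_json["data"][0]["b64_json"])
```

Calling `generate("http://localhost:8000/v1/images/generations", build_payload("a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K"))` and writing `decode_first_image(...)` of the result to out.png should reproduce the curl pipeline, assuming the server is running on that port.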

Thank you again for your answer. I tried your method, but it still didn’t work. vLLM-Omni raises the error “transformer_blocks.0.attn.add_k_proj.alpha is unsupported LoRA weight”. I think we can only hope for support in a new version… :sad_but_relieved_face:

Yeah. Or maybe it’d be faster to save the merged LoRA weights, upload them to Hugging Face, and use those…:thinking:
If we just use the entire model repository instead of LoRA, the differences in LoRA implementation won’t matter.

To make LoRAs for Diffusers/ComfyUI usable with vLLM-Omni, quite a few implementation changes would be needed on the vLLM-Omni side… Still, there seems to be demand (since there are many existing LoRAs), so the chance of it being implemented might not be zero…

I’ve had luck using this script to convert ComfyUI-formatted plain safetensors LoRAs into a format that’s accepted by vllm-omni: comfyui-to-vllm-omni.py · GitHub

Oh! I’ve rewritten it for Qwen-Image. From what I’ve tested so far, it seems that tensors with keys other than mlp* can be converted. However, it’s unclear whether LoRA will actually work with the converter below…


Qwen-Image support is mostly a naming/prefix problem.

vLLM-Omni diffusion LoRAs must be a PEFT adapter directory (adapter_config.json + adapter_model.safetensors). (vLLM)
vLLM is strict about module-name suffixes and PEFT key naming, and it breaks on *.to_out.0.* unless you normalize it to *.to_out.*. (GitHub)
For Qwen-Image specifically, the pipeline loads transformer weights under a transformer. prefix, and the pipeline has a self.transformer = QwenImageTransformer2DModel(...). (GitHub)
The Qwen-Image transformer also exposes packed projection shard mappings and normalizes .to_out.0. → .to_out. when loading weights. (GitHub)

Below is a rewritten version of the gist that adds a Qwen-Image converter for ComfyUI-style keys like:

transformer_blocks.N.attn.to_q.lora_down.weight

It converts them into PEFT keys like:

base_model.model.transformer.transformer_blocks.N.attn.to_q.lora_A.weight

Rewritten script (drop-in, supports Qwen-Image)

#!/usr/bin/env python3
"""
comfyui-to-vllm-omni-qwenimage.py

Convert ComfyUI-style Qwen-Image LoRA safetensors (lora_down/lora_up) into a PEFT
adapter folder accepted by vLLM-Omni diffusion LoRA loader.

Why this works:
- vLLM-Omni requires PEFT adapter directory format. (adapter_config.json + adapter_model.safetensors)
  https://docs.vllm.ai/projects/vllm-omni/en/latest/user_guide/diffusion/lora/
- vLLM expects lora_A/lora_B naming; ComfyUI uses lora_down/lora_up.
- vLLM has a known failure for ModuleList/Sequential numeric indices like "to_out.0".
  Fix by rewriting to "to_out". https://github.com/vllm-project/vllm/issues/35734
- Qwen-Image pipeline loads transformer weights with prefix "transformer." and defines self.transformer.
  https://raw.githubusercontent.com/vllm-project/vllm-omni/main/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py
- Qwen-Image transformer exposes packed shard mapping and normalizes ".to_out.0." -> ".to_out." in load_weights.
  https://raw.githubusercontent.com/vllm-project/vllm-omni/main/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py
"""

import argparse
import json
import re
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file, save_file


# -------------------------
# Qwen-Image settings
# -------------------------

# vLLM strips "base_model.model." internally, and Qwen-Image modules live under "transformer.*"
# (pipeline uses prefix="transformer." and assigns self.transformer=QwenImageTransformer2DModel)
PREFIX_QWEN = "base_model.model.transformer."

# Attention-only by default (recommended). You can optionally include MLP keys with --include-mlp.
ALLOWED_QWEN_PREFIXES_ATTN = (
    "attn.to_q",
    "attn.to_k",
    "attn.to_v",
    "attn.to_out",
    "attn.add_q_proj",
    "attn.add_k_proj",
    "attn.add_v_proj",
    "attn.to_add_out",  # present in Qwen-Image-Lightning
)

# Optional MLP keys observed in Qwen-Image-Lightning (ComfyUI-style)
ALLOWED_QWEN_PREFIXES_MLP = (
    "img_mlp.net.0.proj",
    "img_mlp.net.2",
    "txt_mlp.net.0.proj",
    "txt_mlp.net.2",
)

# PEFT config fields vLLM-Omni documents as important: r, lora_alpha, target_modules, base_model_name_or_path
# https://docs.vllm.ai/projects/vllm-omni/en/latest/user_guide/diffusion/lora/
QWEN_TARGET_MODULES_ATTN = [
    "to_q", "to_k", "to_v", "to_out",
    "add_q_proj", "add_k_proj", "add_v_proj",
    "to_add_out",
    # packed names are fine to include even if unused:
    "to_qkv", "add_kv_proj",
]

# If you include MLP keys, vLLM will validate suffixes against expected modules.
# net.2 can be tricky; keep it optional.
QWEN_TARGET_MODULES_MLP = [
    "proj",
    # caution: module suffix may be "2" for net.2; only enable if your vLLM-Omni build expects it
    "2",
]

ADAPTER_CONFIG_TEMPLATE = {
    "peft_type": "LORA",
    "bias": "none",
    "inference_mode": True,
    "lora_dropout": 0.0,
    "r": None,
    "lora_alpha": None,
    "target_modules": None,
    "base_model_name_or_path": None,
}


# -------------------------
# Helpers
# -------------------------

def _remap_direction(direction: str) -> str:
    """lora_down -> lora_A, lora_up -> lora_B"""
    if direction == "lora_down":
        return "lora_A"
    if direction == "lora_up":
        return "lora_B"
    return direction


def _normalize_modulelist_indices(frag: str) -> str:
    """
    Fix vLLM numeric-index issue:
      attn.to_out.0 -> attn.to_out
    Similar normalization exists in Qwen-Image transformer's load_weights. (see qwen_image_transformer.py)
    """
    frag = frag.replace("attn.to_out.0", "attn.to_out")
    frag = frag.replace("attn.to_add_out.0", "attn.to_add_out")
    return frag


def detect_format(keys: list[str]) -> str:
    sample = [k for k in keys if not k.endswith(".alpha")][:50]
    # Qwen-Image-Lightning (ComfyUI style) looks like:
    # transformer_blocks.N.attn.to_q.lora_down.weight
    if any(re.match(r"^transformer_blocks\.\d+\..+\.(lora_down|lora_up)\.weight$", k) for k in sample):
        return "qwen_transformer_blocks_comfyui"
    return "unknown"


def extract_rank_and_alpha(tensors: dict[str, torch.Tensor]) -> tuple[int, float]:
    alpha = None
    for k, v in tensors.items():
        if k.endswith(".alpha"):
            try:
                alpha = float(v.item())
                break
            except Exception:
                pass

    r = None
    for k, v in tensors.items():
        if k.endswith(".lora_down.weight") and hasattr(v, "shape"):
            r = int(v.shape[0])
            break

    if r is None:
        raise ValueError("Could not infer LoRA rank r: no '*.lora_down.weight' tensor found.")
    if alpha is None:
        alpha = float(r)
    return r, alpha


# -------------------------
# Converter: Qwen-Image transformer_blocks.* (ComfyUI lora_down/lora_up)
# -------------------------

def convert_qwen_transformer_blocks_comfyui(
    tensors: dict[str, torch.Tensor],
    include_mlp: bool,
    dtype: torch.dtype,
) -> tuple[dict[str, torch.Tensor], list[str]]:
    out: dict[str, torch.Tensor] = {}
    unmapped: list[str] = []

    allowed_prefixes = ALLOWED_QWEN_PREFIXES_ATTN + (ALLOWED_QWEN_PREFIXES_MLP if include_mlp else ())

    pat = re.compile(r"^transformer_blocks\.(\d+)\.(.+?)\.(lora_down|lora_up)\.weight$")

    for k, v in tensors.items():
        if k.endswith(".alpha"):
            continue

        m = pat.match(k)
        if not m:
            unmapped.append(k)
            continue

        block_idx = int(m.group(1))
        frag = _normalize_modulelist_indices(m.group(2))
        direction = m.group(3)

        if not frag.startswith(allowed_prefixes):
            unmapped.append(k)
            continue

        ab = _remap_direction(direction)
        new_key = f"{PREFIX_QWEN}transformer_blocks.{block_idx}.{frag}.{ab}.weight"

        if v.dtype != dtype:
            v = v.to(dtype)
        out[new_key] = v

    # Final safety: remove any leftover ".to_out.0." in full key
    fixed: dict[str, torch.Tensor] = {}
    for k, v in out.items():
        nk = k.replace(".to_out.0.", ".to_out.").replace(".to_add_out.0.", ".to_add_out.")
        fixed[nk] = v

    return fixed, unmapped


# -------------------------
# Main
# -------------------------

def main():
    ap = argparse.ArgumentParser(description="Convert ComfyUI Qwen-Image LoRA -> vLLM-Omni PEFT adapter dir")
    ap.add_argument("--input", required=True, help="Input LoRA .safetensors")
    ap.add_argument("--output", required=True, help="Output adapter directory")
    ap.add_argument("--base-model", default="Qwen/Qwen-Image", help="base_model_name_or_path in adapter_config.json")
    ap.add_argument("--dtype", choices=["bf16", "fp16", "fp32"], default="bf16")
    ap.add_argument("--include-mlp", action="store_true", help="Also convert img_mlp/txt_mlp LoRA keys (may fail if vLLM expects different suffixes)")
    args = ap.parse_args()

    dtype_map = {"bf16": torch.bfloat16, "fp16": torch.float16, "fp32": torch.float32}
    out_dtype = dtype_map[args.dtype]

    in_path = Path(args.input)
    if not in_path.exists():
        sys.exit(f"[ERROR] Input not found: {in_path}")

    print(f"[INFO] Loading: {in_path}")
    tensors = load_file(str(in_path))
    keys = list(tensors.keys())

    fmt = detect_format(keys)
    print(f"[INFO] Detected format: {fmt}")
    if fmt != "qwen_transformer_blocks_comfyui":
        sys.exit(
            "[ERROR] This rewrite currently targets Qwen-Image ComfyUI keys like:\n"
            "  transformer_blocks.N.attn.to_q.lora_down.weight\n"
            "If your keys differ, paste 30 keys and adjust detect_format/regex."
        )

    r, alpha = extract_rank_and_alpha(tensors)
    print(f"[INFO] Inferred r={r}, lora_alpha={alpha}")

    converted, unmapped = convert_qwen_transformer_blocks_comfyui(
        tensors=tensors,
        include_mlp=args.include_mlp,
        dtype=out_dtype,
    )

    print(f"[INFO] Converted tensors: {len(converted)}")
    if unmapped:
        print(f"[WARN] Unmapped keys: {len(unmapped)} (showing first 20)")
        for k in unmapped[:20]:
            print("   ", k)

    out_dir = Path(args.output)
    out_dir.mkdir(parents=True, exist_ok=True)

    cfg = dict(ADAPTER_CONFIG_TEMPLATE)
    cfg["r"] = int(r)
    cfg["lora_alpha"] = float(alpha)
    cfg["base_model_name_or_path"] = args.base_model
    cfg["target_modules"] = (
        QWEN_TARGET_MODULES_ATTN + (QWEN_TARGET_MODULES_MLP if args.include_mlp else [])
    )

    (out_dir / "adapter_config.json").write_text(json.dumps(cfg, indent=2), encoding="utf-8")
    save_file(converted, str(out_dir / "adapter_model.safetensors"))

    print(f"[DONE] Wrote PEFT adapter dir: {out_dir}")
    print("       - adapter_config.json")
    print("       - adapter_model.safetensors")


if __name__ == "__main__":
    main()

Usage (for Qwen-Image-Lightning)

python comfyui-to-vllm-omni-qwenimage.py \
  --input Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors \
  --output ./out_adapter \
  --dtype bf16 \
  --base-model Qwen/Qwen-Image

Why this matches Qwen-Image in vLLM-Omni

  • It writes LoRA keys under ...transformer... which aligns with Qwen-Image pipeline weight source prefix prefix="transformer." and self.transformer = QwenImageTransformer2DModel(...). (GitHub)
  • It keeps to_q/to_k/to_v and add_q_proj/add_k_proj/add_v_proj, which align with Qwen-Image transformer packed shard mapping (to_qkv shards and add_kv_proj shards). (GitHub)
  • It normalizes to_out.0 to to_out to avoid the known vLLM numeric-index LoRA failure. (GitHub)
  • It outputs the PEFT adapter folder vLLM-Omni requires. (vLLM)
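After converting, it can help to sanity-check the output keys before starting the server. The sketch below encodes the points from the list above as simple string checks; `check_converted_keys` is a hypothetical helper, and the accepted key pattern is an assumption based on the key layout described in this post.

```python
import re

# Expected shape of a converted key (assumed from the layout described above)
PEFT_KEY = re.compile(
    r"^base_model\.model\.transformer\.transformer_blocks\.\d+\..+\.lora_[AB]\.weight$"
)


def check_converted_keys(keys):
    """Return (key, problem) pairs; an empty list means no obvious naming issue."""
    problems = []
    for key in keys:
        if not PEFT_KEY.match(key):
            problems.append((key, "not in the expected PEFT key form"))
        elif re.search(r"\.\d+\.lora_[AB]\.", key):
            problems.append((key, "numeric module index right before lora_A/lora_B"))

    # Every lora_A tensor should have a matching lora_B tensor, and vice versa
    a_side = {k.replace(".lora_A.", ".") for k in keys if ".lora_A." in k}
    b_side = {k.replace(".lora_B.", ".") for k in keys if ".lora_B." in k}
    for stem in sorted(a_side ^ b_side):
        problems.append((stem, "lora_A/lora_B pair is incomplete"))
    return problems
```

The keys can be read with `load_file("out_adapter/adapter_model.safetensors").keys()`. Note that MLP keys such as `img_mlp.net.2` would be flagged by the numeric-index check, which is in line with the caution about `--include-mlp`.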

Thank you for this script! I can confirm it does allow the LoRA to be run in vllm-omni.

I did notice however that the size drops significantly (from 850 MB to 378 MB), is there some information loss when applied to the new Qwen-Image-2512-Lightning?

I’ve uploaded the output: OpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT · Hugging Face

Oh. The size drop may be because the conversion above does not include the MLP LoRA tensors:


LLM-generated notes / rough analysis:

I think the 850 MB -> 378 MB drop is probably explainable from the converter itself, and the most likely cause is not TextEncoder being skipped, but rather MLP LoRA tensors being skipped by default.

The relevant converter is this one:

OpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT

The script says:

# Attention-only by default (recommended). You can optionally include MLP keys with --include-mlp.
ALLOWED_QWEN_PREFIXES_ATTN = (
    "attn.to_q",
    "attn.to_k",
    "attn.to_v",
    "attn.to_out",
    "attn.add_q_proj",
    "attn.add_k_proj",
    "attn.add_v_proj",
    "attn.to_add_out",
)

# Optional MLP keys observed in Qwen-Image-Lightning (ComfyUI-style)
ALLOWED_QWEN_PREFIXES_MLP = (
    "img_mlp.net.0.proj",
    "img_mlp.net.2",
    "txt_mlp.net.0.proj",
    "txt_mlp.net.2",
)

And the actual filter is:

allowed_prefixes = ALLOWED_QWEN_PREFIXES_ATTN + (
    ALLOWED_QWEN_PREFIXES_MLP if include_mlp else ()
)

So, unless --include-mlp is passed, the converter keeps only the attention/projection LoRA tensors and drops:

img_mlp.net.0.proj
img_mlp.net.2
txt_mlp.net.0.proj
txt_mlp.net.2

This also matches what the uploaded PEFT adapter’s adapter_config.json suggests: the default target modules are attention/projection modules, not MLP modules.

Why the file size matches attention-only almost exactly

From the vLLM-Omni Qwen-Image transformer implementation, the default model shape is roughly:

num_layers = 60
num_attention_heads = 24
attention_head_dim = 128
inner_dim = 24 * 128 = 3072

The uploaded LoRA seems to be rank 64 / bf16. bf16 is 2 bytes per element.

For one LoRA linear projection with shape 3072 -> 3072 and rank 64:

lora_A: 64 x 3072
lora_B: 3072 x 64

elements = 64*3072 + 3072*64
         = 393,216

bytes = 393,216 * 2
      = 786,432 bytes
      = 0.75 MiB

The default converter keeps 8 attention projections per block:

attn.to_q
attn.to_k
attn.to_v
attn.to_out
attn.add_q_proj
attn.add_k_proj
attn.add_v_proj
attn.to_add_out

So the size estimate is:

0.75 MiB * 8 projections * 60 blocks = 360 MiB

In decimal MB:

360 MiB = 377.5 MB

That is almost exactly the reported converted size, 378 MB.

So I think the converted adapter size is not mysterious: it is basically the theoretical size of:

60 blocks * 8 attention LoRA projections * rank 64 * bf16

Why the original 850 MB also matches attention + MLP

The original file is listed as 850 MB here:

Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors

The missing difference is:

850 MB - 378 MB ~= 472 MB

That also matches the expected MLP LoRA size.

Qwen-Image blocks contain both image-stream and text-stream MLPs:

img_mlp
txt_mlp

The converter explicitly recognizes these MLP keys:

img_mlp.net.0.proj
img_mlp.net.2
txt_mlp.net.0.proj
txt_mlp.net.2

Assuming a usual MLP expansion of 4x, the MLP hidden size is approximately:

inner_dim * 4 = 3072 * 4 = 12288

For one MLP LoRA linear 3072 -> 12288 or 12288 -> 3072, rank 64:

elements = 64*3072 + 12288*64
         = 983,040

bytes = 983,040 * 2
      = 1,966,080 bytes
      = 1.875 MiB

There are 4 such MLP linears per block:

img_mlp.net.0.proj
img_mlp.net.2
txt_mlp.net.0.proj
txt_mlp.net.2

So:

1.875 MiB * 4 * 60 = 450 MiB

In decimal MB:

450 MiB = 471.9 MB

That is basically the whole missing part.

So the size arithmetic is:

attention LoRA only ~= 377.5 MB
MLP LoRA          ~= 471.9 MB
--------------------------------
total             ~= 849.4 MB

This is almost exactly the original 850 MB.

Therefore my rough conclusion is:

original 850 MB ~= attention LoRA + MLP LoRA
converted 378 MB ~= attention LoRA only
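Rather than trusting the arithmetic, the split can be measured on the real files. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table with data_offsets per tensor) and sums payload bytes per group. Grouping by the substrings "_mlp." and ".attn." is my assumption about the key naming, and tensor_bytes_by_group is a hypothetical helper.

```python
import json
import struct
from collections import Counter


def tensor_bytes_by_group(path: str) -> Counter:
    """Sum tensor payload bytes per group using only the safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian
        header = json.loads(f.read(header_len))

    totals = Counter()
    for name, info in header.items():
        if name == "__metadata__":        # optional free-form metadata entry
            continue
        start, end = info["data_offsets"]
        if "_mlp." in name:               # img_mlp.* and txt_mlp.* (assumed naming)
            group = "mlp"
        elif ".attn." in name:
            group = "attn"
        else:
            group = "other"
        totals[group] += end - start
    return totals


# Usage (the path is whatever file you downloaded or converted):
# for group, n in tensor_bytes_by_group("adapter_model.safetensors").items():
#     print(f"{group}: {n / 1e6:.1f} MB")
```

If the explanation above is right, the original file should show roughly 378 MB under attn and 472 MB under mlp, while the converted file should show almost nothing under mlp.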

So is there information loss?

Probably yes, if the goal is to preserve the original LoRA exactly.

But it is a specific kind of information loss:

  • attention/projection LoRA is preserved
  • MLP LoRA is probably dropped
  • .alpha keys are skipped, but those are tiny and not the source of the size drop
  • TextEncoder is not needed to explain the size drop

I would not assume that this makes the converted LoRA useless. An attention-only LoRA can still have a strong effect, especially on coarse prompt binding, layout, and style direction. But for a Lightning/distillation LoRA, dropping the MLP part may reduce low-step quality, detail, texture, text rendering, and stability.

My guess:

simple prompts:      maybe fairly close
normal prompts:      likely usable, but weaker than full LoRA
complex text/layout: likely more visible degradation
4-step / 8-step edge cases: degradation likely more visible

Why TextEncoder is probably not the main explanation

TextEncoder skipping is possible in other LoRA conversion contexts, but here it is not necessary to explain the numbers.

The converter targets keys like:

transformer_blocks.N.<module>.lora_down.weight
transformer_blocks.N.<module>.lora_up.weight

It is not really written as a generic text_encoder / lora_te converter.

Also, the sizes line up too cleanly with:

attention-only = 378 MB
attention + MLP = 850 MB

So I would explain the size drop as MLP exclusion first, not TextEncoder exclusion.

Can we keep MLP?

Maybe. The script already has an option:

python comfyui-to-vllm-omni-qwenimage.py \
  --input Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors \
  --output ./out_adapter_with_mlp \
  --dtype bf16 \
  --base-model Qwen/Qwen-Image-2512 \
  --include-mlp

If this works as intended, I would expect adapter_model.safetensors to become close to 850 MB.

However, the converter itself warns that MLP can be tricky:

ap.add_argument(
    "--include-mlp",
    action="store_true",
    help="Also convert img_mlp/txt_mlp LoRA keys (may fail if vLLM expects different suffixes)",
)

Writing the tensors is the easy part. The likely issue is whether vLLM-Omni accepts and correctly applies the MLP module suffixes.

For example, the MLP targets include:

img_mlp.net.0.proj
img_mlp.net.2
txt_mlp.net.0.proj
txt_mlp.net.2

Their suffixes are roughly:

proj
2

proj is probably okay. The numeric suffix 2 may be the fragile part, because vLLM/vLLM-Omni LoRA validation can be strict about module suffixes. There is already a related vLLM issue for numeric-index module names such as to_out.0:

vLLM issue #35734: LoRA loading fails for modules with numeric indices

The current converter already works around the attention-side version of this by normalizing:

attn.to_out.0     -> attn.to_out
attn.to_add_out.0 -> attn.to_add_out

But net.2 is a different case. It may require the vLLM-Omni build to include "2" in expected LoRA modules, or it may need a more model-specific mapping.
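A small sketch of why net.2 is not covered by that workaround. The functions normalize_module and leaf_suffix are hypothetical stand-ins for whatever the converter does internally; only the two mappings quoted above are taken from the converter's described behavior.

```python
def normalize_module(module: str) -> str:
    """Strip the trailing '.0' only for the two attention outputs named above."""
    for base in ("attn.to_out", "attn.to_add_out"):
        if module == base + ".0":
            return base
    return module


def leaf_suffix(module: str) -> str:
    """Last dotted component, which is what suffix-based validation would see."""
    return module.rsplit(".", 1)[-1]


for m in ["attn.to_out.0", "attn.to_add_out.0", "img_mlp.net.0.proj", "img_mlp.net.2"]:
    norm = normalize_module(m)
    suffix = leaf_suffix(norm)
    print(f"{m} -> {norm} | suffix={suffix!r} | numeric={suffix.isdigit()}")
```

After normalization the attention modules end in a named suffix, but img_mlp.net.2 still ends in a bare "2", and there is no obvious named parent to collapse it into without a model-specific mapping.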

Suggested sanity check

If anyone tries --include-mlp, I would check three things:

1. Size

ls -lh ./out_adapter_with_mlp/adapter_model.safetensors

Expected:

~850 MB

If it is still around 378 MB, MLP tensors were not included.

2. Key counts

from safetensors.torch import load_file

sd = load_file("./out_adapter_with_mlp/adapter_model.safetensors")

for needle in [
    "img_mlp.net.0.proj",
    "img_mlp.net.2",
    "txt_mlp.net.0.proj",
    "txt_mlp.net.2",
]:
    print(needle, sum(1 for k in sd if needle in k))

Expected rough count:

each MLP target: 60 blocks * 2 tensors = 120 keys
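A total of 120 can hide a block that is missing one tensor while another has an extra, so it may be worth checking per block. This is a sketch only: the regex assumes keys contain transformer_blocks.N., and missing_blocks is a hypothetical helper. The demo keys are synthetic; in practice you would pass the keys of the loaded state dict.

```python
import re
from collections import defaultdict

BLOCK_RE = re.compile(r"transformer_blocks\.(\d+)\.")


def missing_blocks(keys, needle, num_blocks=60, per_block=2):
    """Return block indices that do not have exactly `per_block` keys for `needle`."""
    counts = defaultdict(int)
    for k in keys:
        m = BLOCK_RE.search(k)
        if m and needle in k:
            counts[int(m.group(1))] += 1
    return [i for i in range(num_blocks) if counts.get(i, 0) != per_block]


# Synthetic demo: 60 blocks, with block 7 missing its lora_B tensor.
demo_keys = [
    f"transformer_blocks.{i}.img_mlp.net.2.{ab}.weight"
    for i in range(60)
    for ab in ("lora_A", "lora_B")
    if not (i == 7 and ab == "lora_B")
]
print(missing_blocks(demo_keys, "img_mlp.net.2"))  # prints [7]
```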

3. vLLM-Omni load log

The important question is whether vLLM-Omni reports that MLP modules were loaded and not silently ignored.

The vLLM-Omni LoRA docs require a PEFT-style adapter folder:

lora_adapter/
├── adapter_config.json
└── adapter_model.safetensors

Docs: vLLM-Omni LoRA guide

If loading fails on net.2 / "2" / target module validation, then I think the clean solution would be one of:

  1. patch the converter / adapter_config.json target modules, or
  2. patch vLLM-Omni’s diffusion LoRA mapper / supported modules for Qwen-Image MLP, or
  3. avoid runtime adapter loading and fuse the LoRA into the base model.
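Before patching anything, it helps to see what the converter actually wrote into adapter_config.json. A minimal sketch, assuming the config has the usual PEFT target_modules field (a list, or a single string); numeric_targets is a hypothetical helper.

```python
import json
from pathlib import Path


def numeric_targets(adapter_dir):
    """List target_modules entries whose last dotted component is purely numeric."""
    cfg = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    targets = cfg.get("target_modules") or []
    if isinstance(targets, str):   # PEFT also allows a single string here
        targets = [targets]
    return sorted(t for t in targets if t.rsplit(".", 1)[-1].isdigit())


# Usage:
# print(numeric_targets("./out_adapter_with_mlp"))
```

If this returns something like "2" or "img_mlp.net.2", those are the entries most likely to trip suffix validation.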

Practical recommendation

For runtime PEFT LoRA:

  1. Try the existing converter with --include-mlp.
  2. Confirm the output is around 850 MB.
  3. Confirm img_mlp / txt_mlp keys exist.
  4. Try loading in vLLM-Omni.
  5. If it fails, the likely blocker is target module suffix validation around net.2.

For maximum quality / minimum loader trouble:

  • fuse/merge the original LoRA into the Qwen-Image-2512 base weights using Diffusers or the reference loader
  • serve the fused model as a normal model in vLLM-Omni

That avoids the whole PEFT key validation problem, although it is no longer a runtime LoRA adapter.
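A rough sketch of the fuse route with Diffusers, untested on this exact model. The calls used (load_lora_weights, fuse_lora, unload_lora_weights, save_pretrained) are standard Diffusers pipeline methods, but the function name, the output directory, and the LoRA repo argument are placeholders I am assuming. It also does not touch the scheduler settings a Lightning LoRA normally needs at inference time.

```python
def fuse_lora_into_base(base_model, lora_source, weight_name, out_dir):
    """Load a base pipeline, bake a LoRA into its weights, and save the result."""
    # Imported lazily so this sketch can be read or imported without the GPU stack.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.bfloat16)
    pipe.load_lora_weights(lora_source, weight_name=weight_name)
    pipe.fuse_lora()            # merge the LoRA deltas into the base weights
    pipe.unload_lora_weights()  # drop the adapter modules, keeping the fused weights
    pipe.save_pretrained(out_dir)
    return out_dir


# Usage (lora_source is a Hub repo id or a local directory holding the file):
# fuse_lora_into_base(
#     "Qwen/Qwen-Image-2512",
#     "<lora repo or local dir>",
#     "Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors",
#     "./qwen-image-2512-lightning-fused",
# )
```

The saved directory is a full copy of the model, so expect it to be as large as the base checkpoint rather than adapter-sized.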

TL;DR

The 378 MB file is probably an attention-only converted adapter.

The original 850 MB size is almost exactly:

attention LoRA ~= 378 MB
MLP LoRA       ~= 472 MB
total          ~= 850 MB

So the size drop is probably explained by the converter’s default behavior:

attention-only by default
MLP only if --include-mlp is passed

--include-mlp may preserve the missing tensors, but whether vLLM-Omni can load/apply img_mlp.net.2 and txt_mlp.net.2 correctly is the part that needs testing.