internvl3-2b-walk-lora-v1

Model Description

This is a LoRA adapter for InternVL3-2B, fine-tuned on the WalkVLM dataset to help visually impaired individuals detect navigation hazards.

How to Use

Method 1: Using PEFT (Recommended)

import torch
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

# Load Base Model
base_model = AutoModel.from_pretrained(
    "OpenGVLab/InternVL3-2B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("OpenGVLab/InternVL3-2B", trust_remote_code=True)

# Load LoRA Adapter
model = PeftModel.from_pretrained(base_model, "blind-assist/internvl3-2b-walk-lora-v1")

# Merge for faster inference (optional)
model = model.merge_and_unload()

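# Use for inference.
# pixel_values is not defined in this snippet: prepare it beforehand as a
# preprocessed image tensor (bfloat16, on the same device as the model).
response = model.chat(
    tokenizer=tokenizer,
    pixel_values=pixel_values,
    question="Describe any obstacles in this scene.",
    generation_config=dict(max_new_tokens=256)
)
print(response)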

Method 2: Manual LoRA Merge

If PEFT cannot load the adapter because of the custom model architecture, merge the LoRA weights manually:

# See our inference script at:
# https://github.com/Blind-Assist/InternVL/blob/walkvlm/internvl_chat/test_finetuned_model.py
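
For orientation, the arithmetic behind a manual merge can be sketched as follows. This is not the linked script, only an illustration of the standard LoRA update W' = W + (alpha / r) * B A for a single linear layer. The function name merge_lora_weight is hypothetical, and the sketch assumes the usual shapes: weight is (out, in), lora_A is (r, in), lora_B is (out, r).

```python
import torch


def merge_lora_weight(weight, lora_A, lora_B, lora_alpha, rank):
    """Fold one LoRA update into a base linear weight.

    weight: (out_features, in_features) base weight
    lora_A: (rank, in_features) down-projection
    lora_B: (out_features, rank) up-projection
    """
    scaling = lora_alpha / rank
    # Compute the low-rank update in float32 to limit rounding error,
    # then cast back to the dtype of the base weight.
    delta = (lora_B.float() @ lora_A.float()) * scaling
    return (weight.float() + delta).to(weight.dtype)
```

A full merge would apply this to every adapted layer, pairing each base weight with its lora_A and lora_B tensors from adapter_model.safetensors and reading lora_alpha and the rank from adapter_config.json.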

Training Details

Base Model: OpenGVLab/InternVL3-2B
Method: LoRA (Low-Rank Adaptation)
LoRA Rank: 128
Dataset: blind-assist/walk-train
Task: Navigation hazard detection for visually impaired users

Files

adapter_config.json - PEFT LoRA configuration
adapter_model.safetensors - LoRA weights only (~50MB)
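
To check the adapter settings before loading anything, adapter_config.json can be read directly. The sketch below assumes the file follows the usual PEFT layout with keys such as r, lora_alpha, and target_modules; the helper name summarize_adapter_config is hypothetical.

```python
import json
from pathlib import Path


def summarize_adapter_config(path):
    """Return the main LoRA hyperparameters from a PEFT adapter_config.json."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # Missing keys come back as None rather than raising an error.
    return {
        "peft_type": config.get("peft_type"),
        "base_model": config.get("base_model_name_or_path"),
        "rank": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": config.get("target_modules"),
    }


# Hypothetical usage, assuming the file was downloaded to the current directory:
# print(summarize_adapter_config("adapter_config.json"))
```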

License

Same as base model (OpenGVLab/InternVL3-2B)
