The following call raises AttributeError: 'NoneType' object has no attribute 'shape':
with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        images=image_tensor,
        max_new_tokens=256,
        do_sample=True
    )
print("Prompt:", prompt)
print("Type of input_ids:", input_ids.dtype)
print("Shape of input_ids before generate:", input_ids.shape)
print("Shape of image_tensor:", image_tensor.shape)
print("Type of image_tensor:", image_tensor.dtype)
All of these print statements give the expected output. Could anyone suggest what causes the error? I am running an inference check after fine-tuning a multimodal model.
You're probably passing images=image_tensor into a model that doesn't expect or properly handle that argument. Inside generate(), the Hugging Face or custom model logic may expect the images argument to be processed by prepare_inputs_for_generation() or forward(). If that path is not implemented correctly, an intermediate value ends up as None, and the error is raised when .shape is accessed on it.
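To make that failure mode concrete, here is a toy sketch of how a keyword the model does not recognise can end up as None deep inside generate(). ToyMultimodalModel, FakeTensor and try_generate are hypothetical names made up for illustration; this is not the actual Transformers or LLaVA code, only the general pattern.

```python
class ToyMultimodalModel:
    """Toy stand-in for a model; NOT the real Transformers implementation."""

    def prepare_inputs(self, input_ids, **kwargs):
        # Hypothetical behaviour: only "pixel_values" is recognised, so an
        # "images" keyword is silently dropped and the image slot stays None.
        return {"input_ids": input_ids, "pixel_values": kwargs.get("pixel_values")}

    def generate(self, input_ids, **kwargs):
        inputs = self.prepare_inputs(input_ids, **kwargs)
        # Fails here when the image slot is None, far from the caller's code.
        batch_size = inputs["pixel_values"].shape[0]
        return [[0] * 4 for _ in range(batch_size)]


class FakeTensor:
    """Minimal object with a .shape attribute, used instead of a real tensor."""

    def __init__(self, shape):
        self.shape = shape


def try_generate(**kwargs):
    """Return 'ok', or a description of the AttributeError that was raised."""
    try:
        ToyMultimodalModel().generate([1, 2, 3], **kwargs)
        return "ok"
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_generate(pixel_values=FakeTensor((1, 3, 336, 336))))
print(try_generate(images=FakeTensor((1, 3, 336, 336))))
```

In this toy, the tensor passed by the caller is perfectly valid in both calls, which would match print statements that look correct right before generate() is called.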
Another possibility: does the model support multimodal input at all?
For example, a typical AutoModelForCausalLM does not support an images parameter.
Try this
output_ids = model.generate(input_ids=input_ids, max_new_tokens=256, do_sample=True)
If this works without error, your model or its generation path is not correctly set up to handle multimodal input.
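One quick way to see which image keyword a model expects is to inspect the signature of its forward method. The sketch below uses only the standard library; describe_argument is a hypothetical helper name, and the functions at the bottom are dummies standing in for a real model's forward.

```python
import inspect


def describe_argument(func, name):
    """Say whether func declares `name`, only absorbs it via **kwargs, or rejects it."""
    params = inspect.signature(func).parameters
    if name in params:
        return "declared"
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # Accepted syntactically, but possibly ignored by the function body.
        return "via **kwargs"
    return "not accepted"


# Dummy functions standing in for different models' forward methods.
def forward_with_images(input_ids, images=None):
    return input_ids


def forward_with_kwargs(input_ids, **kwargs):
    return input_ids


def forward_text_only(input_ids):
    return input_ids


for fn in (forward_with_images, forward_with_kwargs, forward_text_only):
    print(fn.__name__, "->", describe_argument(fn, "images"))
```

On a real model you would call something like describe_argument(model.forward, "images") and describe_argument(model.forward, "pixel_values"). A result of "via **kwargs" means the call will not fail immediately, but the argument may still be dropped silently.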
Fix:
You are passing images=image_tensor to model.generate, but your model or generation config likely does not support the images argument, or your fine-tuned model doesn't handle multimodal input as expected.
Direct script correction:
Try this: remove images if the model doesn't support multimodal inference.
output_ids = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,
    do_sample=True
)
If your model does support images, make sure image_tensor is not None and is properly preprocessed. If image_tensor were None when it is accessed inside generate(), you would get exactly this error.
Check:
assert image_tensor is not None, "image_tensor is None"
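The same check can be extended to every input at once. This is a minimal sketch that relies only on duck typing (a .shape attribute), so it works with PyTorch tensors without importing torch; check_generate_inputs is a hypothetical helper name, not part of any library.

```python
def check_generate_inputs(**named_inputs):
    """Raise ValueError listing every input that is None, shapeless, or empty."""
    problems = []
    for name, value in named_inputs.items():
        if value is None:
            problems.append(f"{name} is None")
        elif not hasattr(value, "shape"):
            problems.append(f"{name} has no .shape (got {type(value).__name__})")
        elif len(value.shape) == 0 or 0 in tuple(value.shape):
            problems.append(f"{name} has an empty shape {tuple(value.shape)}")
    if problems:
        raise ValueError("; ".join(problems))
```

It could be called as check_generate_inputs(input_ids=input_ids, image_tensor=image_tensor) right before model.generate. If it passes and the error persists, the None is being produced inside the model rather than by your preprocessing.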
Solution provided by Triskel Data Deterministic AI.
My base model is liuhaotian/llava-v1.5-7b. I fine-tuned it and pushed the result to Hugging Face. I am giving both an image and text as input for the inference check. Please guide me.
If you want to use images, it seems that you need to pass the pixel_values argument instead of the images argument for the LLaVA model.
https://stackoverflow.com/questions/1109422/getting-list-of-pixel-values-from-pil
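For the pixel_values suggestion above, here is a rough sketch of what inference could look like with the LLaVA classes in the Transformers library. It assumes the fine-tuned checkpoint is in the Transformers LLaVA format (for example, one derived from llava-hf/llava-1.5-7b-hf); a checkpoint trained with the original LLaVA codebase may instead need that codebase's own loader. The function names, the model_id and image_path parameters, and the prompt template are assumptions for illustration.

```python
def build_llava_prompt(question):
    """LLaVA-1.5 style prompt; <image> marks where the image features go."""
    return f"USER: <image>\n{question} ASSISTANT:"


def run_llava_inference(model_id, image_path, question, max_new_tokens=256):
    """Generate an answer for one image and one question (assumed HF-format checkpoint)."""
    # Imports are local so build_llava_prompt works without these packages.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)
    model.eval()
    # Device and dtype placement (GPU, float16) is left out for brevity.

    image = Image.open(image_path).convert("RGB")
    # The processor returns input_ids, attention_mask and pixel_values together,
    # so there is no separate images= keyword to pass to generate().
    inputs = processor(
        images=image, text=build_llava_prompt(question), return_tensors="pt"
    )

    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=True
        )
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Letting the processor build all inputs together avoids having to guess the keyword name, since generate() receives whatever keys the processor produced.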