GUing: A Mobile GUI Search Engine using a Vision-Language Model
Paper: arXiv:2405.00145
How to use Jl-wei/uiclip-vit-base-patch32 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="Jl-wei/uiclip-vit-base-patch32")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
```

```python
# Load the model and processor directly
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

processor = AutoProcessor.from_pretrained("Jl-wei/uiclip-vit-base-patch32")
model = AutoModelForZeroShotImageClassification.from_pretrained("Jl-wei/uiclip-vit-base-patch32")
```

The UIClip model has been renamed to GUIClip. You can now access the updated model at https://huggingface.co/Jl-wei/guiclip-vit-base-patch32.
GUIClip is a vision-language model for the GUI domain.
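As a CLIP-style vision-language model, it scores the similarity between a GUI screenshot and free-text queries. A minimal sketch of manual inference, assuming a local screenshot file `screenshot.png` and illustrative query texts (both are placeholders, not part of the model card):

```python
from PIL import Image
import torch
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

processor = AutoProcessor.from_pretrained("Jl-wei/uiclip-vit-base-patch32")
model = AutoModelForZeroShotImageClassification.from_pretrained("Jl-wei/uiclip-vit-base-patch32")

# Hypothetical text queries describing GUI screens
texts = ["login screen", "settings page", "map view"]
image = Image.open("screenshot.png")  # any PIL-loadable GUI screenshot

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-to-text similarity scores;
# softmax turns them into a probability over the candidate queries
probs = outputs.logits_per_image.softmax(dim=-1)
for text, prob in zip(texts, probs[0].tolist()):
    print(f"{text}: {prob:.3f}")
```

The pipeline shown above wraps exactly this flow; loading the model directly is useful when you want the raw similarity scores, e.g. to rank a gallery of screenshots against one query.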
Code and dataset can be found at https://github.com/Jl-wei/guing
If you find our work useful, please cite our paper:
```bibtex
@misc{wei2024guing,
      title={GUing: A Mobile GUI Search Engine using a Vision-Language Model},
      author={Jialiang Wei and Anne-Lise Courbis and Thomas Lambolais and Binbin Xu and Pierre Louis Bernard and Gérard Dray and Walid Maalej},
      year={2024},
      eprint={2405.00145},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```
Please note that the model may only be used for academic purposes.
Base model: openai/clip-vit-base-patch32