I’ve used OpenAI GPT-4 for data extraction, but since it’s a general-purpose commercial model, it’s not specifically fine-tuned for data extraction tasks. I believe GPT-4 may not perform as well as models fine-tuned exclusively for this purpose. Therefore, I’m looking for open-source LLMs that are specifically trained for data extraction and offer high accuracy and efficiency. Could you recommend any models that fit these criteria?
I don’t know much about LLMs for specific applications, but why not try a few candidate models on a small sample of your own data to get a feel for which one to use as a base?
In general on HF, the quickest route is to find a Space that uses an LLM for a similar use case, then read its source code or ask the author.
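If it helps, here is a rough sketch of what "trying a few models" could look like in practice: score each candidate on a handful of documents you have labeled by hand. Everything here is an assumption for illustration. `field_accuracy` and `compare` are hypothetical helper names, and each candidate is just a Python function that takes text and returns a dict of extracted fields, so you would wrap GPT-4 or any open-source model in a function of that shape yourself.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shape: an extractor takes raw text and returns field -> value.
Extractor = Callable[[str], Dict[str, str]]
Sample = Tuple[str, Dict[str, str]]  # (input text, hand-labeled expected fields)


def field_accuracy(extract: Extractor, samples: List[Sample]) -> float:
    """Fraction of expected fields the extractor got exactly right
    (case-insensitive, surrounding whitespace ignored)."""
    correct = 0
    total = 0
    for text, expected in samples:
        predicted = extract(text)
        for field, value in expected.items():
            total += 1
            guess = str(predicted.get(field, "")).strip().lower()
            if guess == str(value).strip().lower():
                correct += 1
    return correct / total if total else 0.0


def compare(candidates: Dict[str, Extractor],
            samples: List[Sample]) -> List[Tuple[str, float]]:
    """Score every candidate on the same samples, best first."""
    scores = {name: field_accuracy(fn, samples) for name, fn in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Exact-match scoring is deliberately strict and simple. For free-text fields you would probably want a looser comparison, but even 20 to 50 labeled examples scored this way should show whether a specialized model actually beats GPT-4 on your data.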