How can I get the logits from an endpoint call?

I’m trying to run a query like the one below using the Hugging Face inference endpoints.

api_url = 'https://huggingface.co/proxy/api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-70B-Instruct'
headers = {'Authorization': f'Bearer {token}'}
response = requests.post(api_url, headers = headers, json = {'inputs': 'What is the capital of France? The capital of France is : ')

I’m looking not just for the answer but also for the logits of the generated tokens, so that I can calculate the probability of getting a certain answer.
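To make the goal concrete, here is a minimal sketch of the calculation, assuming the per-step logits were available as plain lists of floats. The helper names (`log_softmax`, `sequence_log_prob`) and the toy three-token vocabulary are made up for illustration and are not part of any Hugging Face API.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw logits into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_prob(step_logits: Sequence[Sequence[float]],
                      token_ids: Sequence[int]) -> float:
    """Sum the log-probability of each answer token at its generation step."""
    if len(step_logits) != len(token_ids):
        raise ValueError("need exactly one logits vector per answer token")
    return sum(log_softmax(logits)[tok]
               for logits, tok in zip(step_logits, token_ids))


# Toy example: two generation steps over a hypothetical 3-token vocabulary.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_answer = [0, 2]  # token ids of the answer whose probability we want
answer_prob = math.exp(sequence_log_prob(toy_logits, toy_answer))
```

Summing log-probabilities and exponentiating once at the end avoids underflow for longer answers.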

I can do this locally with AutoModelForCausalLM, but most big models don’t fit on my GPUs (and a HF Pro subscription is cheaper than another A100).

Is there any way to use the API this way?

Hi,

According to the “Detailed parameters” documentation page, the serverless Inference API does not support returning logits.

If you want that, you could define a custom handler on Inference Endpoints which also returns logits besides text.
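As a rough sketch of what such a handler could look like: a `handler.py` file defining an `EndpointHandler` class with `__init__(path)` and `__call__(data)` is the convention custom handlers follow. Everything else here is an assumption for illustration: the response fields `tokens` and `token_logprobs`, reading `max_new_tokens` from a `parameters` dict, greedy decoding, and `device_map="auto"` (which needs `accelerate` installed). It returns the log-probability of each generated token rather than the full logits, since a full vocabulary-sized vector per token would make the response very large.

```python
from typing import Any, Dict, List


class EndpointHandler:
    """Sketch: generate text and return per-token log-probabilities."""

    def __init__(self, path: str = ""):
        # Heavy imports are deferred so the file can be imported cheaply.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path, torch_dtype="auto", device_map="auto"  # assumes accelerate
        )
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        import torch

        prompt = data["inputs"]
        params = data.get("parameters", {})  # assumed request layout
        max_new_tokens = int(params.get("max_new_tokens", 20))

        enc = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            out = self.model.generate(
                **enc,
                max_new_tokens=max_new_tokens,
                do_sample=False,               # greedy decoding
                return_dict_in_generate=True,
                output_scores=True,            # one score tensor per new token
            )

        # Drop the prompt tokens; keep only what was generated.
        prompt_len = enc["input_ids"].shape[1]
        new_ids = out.sequences[0, prompt_len:]

        token_logprobs = []
        for step_scores, tok_id in zip(out.scores, new_ids):
            logprobs = torch.log_softmax(step_scores[0].float(), dim=-1)
            token_logprobs.append(logprobs[tok_id].item())

        return [{
            "generated_text": self.tokenizer.decode(new_ids, skip_special_tokens=True),
            "tokens": self.tokenizer.convert_ids_to_tokens(new_ids.tolist()),
            "token_logprobs": token_logprobs,  # hypothetical field name
        }]
```

Note that `out.scores` holds the scores after any logits processors have run, so with non-default generation settings they may differ from the raw model logits.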

Just checking: can I call an Inference Endpoint that returns logits from an experiment running on my local computer, or do I have to go through Gradio, Spaces, or something similar?

+1 I’d be interested in an easy/standard way to do this as well.