I can't use the service anymore

You may simply have used up your monthly included credits ($2.00 on PRO). They are replenished after one month.


Guide: Fixing Hugging Face’s “You have depleted your monthly included credits” error when you already have PRO

The short version

A Hugging Face PRO subscription does not include unlimited Inference Providers usage. The current Hugging Face docs say PRO includes $2.00 per month in Inference Providers credits, and compute usage is billed separately from the PRO subscription itself. That means you can be a valid PRO subscriber and still get blocked once the included routed-inference credits are gone. (Hugging Face)

What this error usually means

When you see:

Failed to perform inference: You have depleted your monthly included credits. Purchase pre-paid credits to continue using Inference Providers.

the platform is usually telling you something about the Inference Providers billing layer, not about whether your PRO plan exists. In Hugging Face’s current billing model, routed requests through Inference Providers use a monthly credit pool first, then require additional paid usage after that. (Hugging Face)

Why this is confusing

Hugging Face now uses Inference Providers as the unified routed-inference system, and the docs note that hf-inference used to be called “Inference API (serverless)”. That transition makes older expectations misleading. Many users still think “I have PRO” should mean “my API calls should keep working,” but the current model is really subscription + included credits + separate compute billing. (Hugging Face)

The most common root causes

1. You really did use up the included PRO credits

This is the most common explanation. Hugging Face’s pricing docs currently say Free users get $0.10/month, PRO users get $2.00/month, and Team or Enterprise organizations get $2.00 per seat per month for Inference Providers. After that, continued usage requires additional purchased credits. (Hugging Face)

A key detail: your credits may have been consumed by more than just your own API code. Hugging Face’s docs say model-page widgets, the Inference Playground, and Data Studio AI also use Inference Providers and count against the same monthly credits. (Hugging Face)

2. PRO is active, but your account is not set up to continue with paid usage

Hugging Face’s billing docs say compute services are billed separately from PRO, and the only supported payment method for compute services is credit cards. Public Hugging Face support replies also say this same 402-style failure often happens when there is no payment method on the account. (Hugging Face)

So the hidden problem may be:

  • PRO is active
  • the included credits are exhausted
  • but Hugging Face cannot continue charging usage because the compute-billing path is not set up correctly. (Hugging Face)

3. Your token is wrong, stale, or missing the required permission

Hugging Face’s Inference Providers docs say you should use a fine-grained token with “Make calls to Inference Providers” permission. Their InferenceClient docs also say that if you do not explicitly pass a token, the client will default to the locally saved token. (Hugging Face)

That creates a common failure mode:

  • you generated a new token
  • but your code is still using an older token saved on your machine
  • or the active token does not have the Inference Providers permission enabled. (Hugging Face)
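A quick way to rule out a stale or wrong token is to ask the Hub which account it actually authenticates as. This is a sketch using huggingface_hub's HfApi.whoami(); the token value is a placeholder, and the call needs network access.

```python
from huggingface_hub import HfApi

def token_owner(token: str) -> str:
    """Return the account name a token authenticates as.

    Raises an HTTP error if the token is invalid or revoked, which is
    itself a useful signal: the client was not using the token you
    thought it was.
    """
    info = HfApi(token=token).whoami()
    return info["name"]

# Example (requires a real token and network access):
# print(token_owner("hf_your_new_token_here"))
```

If this returns an account other than the one whose credits you expect to spend, you have found the mismatch before touching billing at all.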

Public Hugging Face forum replies support this too. In similar 402 reports, staff and experienced users explicitly pointed to missing payment methods and wrong token permissions as common causes. (Hugging Face Forums)
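The causes above can be sketched as a small triage table keyed on HTTP status codes. The mapping below is my own heuristic summary of the failure modes discussed in this guide, not an official Hugging Face error taxonomy.

```python
def likely_cause(status_code: int) -> str:
    """Map a router HTTP status code to the most likely root cause.

    Heuristic summary of the failure modes above, not an official
    Hugging Face error taxonomy.
    """
    causes = {
        401: "token is invalid, stale, or missing",
        402: "included credits exhausted, or no payment method for compute billing",
        403: "token lacks the 'Make calls to Inference Providers' permission",
    }
    return causes.get(status_code, "unrelated failure; inspect the response body")

print(likely_cause(402))
```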

4. Your requests are being billed to the wrong account

If you belong to a Team or Enterprise organization, this matters a lot. Hugging Face’s pricing docs say requests are billed to the user account by default, and org billing only applies if you explicitly set the billing target, such as with bill_to="my-org-name" or the X-HF-Bill-To header. (Hugging Face)

So a person can be part of a paid organization, use a valid token, and still hit the personal monthly limit because the request is not actually being billed to the organization. Public reports show this is a real pattern. (Hugging Face)


Step-by-step fix

Step 1: Check whether the credits are actually gone

Open your Inference Providers usage page and your Billing page. Hugging Face says the usage view shows the past month’s usage broken down by model and provider. That is the fastest way to confirm whether this is true credit exhaustion or a lookalike configuration problem. (Hugging Face)

What to look for

  • If usage is clearly nontrivial and the credits are spent, the message is probably accurate. (Hugging Face)
  • If usage looks very low or inconsistent, move to billing and token checks. The same error can appear in those cases too. (Hugging Face Forums)

Step 2: Verify billing is set up for compute usage

Go to Settings → Billing and confirm your account has a valid credit card and is ready for compute billing. Hugging Face’s billing docs state that compute services are usage-based, separate from PRO, and credit cards are the supported payment method for compute services. (Hugging Face)

Why this matters

Your PRO renewal and your inference spend are not the same billing stream. A paid PRO badge does not automatically prove that your account is ready to continue beyond the included monthly credits. (Hugging Face)

Step 3: Create a fresh token with the correct permission

Create a new fine-grained token and enable Make calls to Inference Providers. Hugging Face explicitly documents that this permission is required for Inference Providers requests. (Hugging Face)

Then replace the token everywhere:

  • shell environment variables
  • notebook secrets
  • .env files
  • CI secrets
  • local cached login state. (Hugging Face)
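After rotating the token, it is worth confirming what the library would actually pick up by default. huggingface_hub exposes get_token(), which resolves the token from the environment and the cached login; a sketch:

```python
from huggingface_hub import get_token

def describe_default_token() -> str:
    """Show which token huggingface_hub would use when none is passed."""
    token = get_token()  # resolves HF_TOKEN env var, then the cached login
    if token is None:
        return "no default token found"
    # Only reveal a prefix; never log full tokens.
    return f"default token prefix: {token[:7]}..."

print(describe_default_token())
```

If the prefix shown here does not match the token you just created, some cached state is still supplying the old credential.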

Step 4: Retry with the token passed explicitly

Do not rely on the client’s default token behavior during debugging. Hugging Face’s InferenceClient docs state that if you do not pass a token, it will use the locally saved token by default. (Hugging Face)

A clean Python test looks like this:

from huggingface_hub import InferenceClient

client = InferenceClient(
    token="hf_your_new_token_here"
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Reply with the word OK only."}],
)

print(resp.choices[0].message)

That example follows the documented InferenceClient flow and the recommended explicit-token pattern. (Hugging Face)

Step 5: If you use an organization, explicitly bill the org

If you are supposed to use Team or Enterprise credits, set the billing target explicitly. Hugging Face’s docs show bill_to="my-org-name" for the Python client and X-HF-Bill-To: my-org-name for HTTP requests. (Hugging Face)

Python example:

from huggingface_hub import InferenceClient

client = InferenceClient(
    token="hf_your_token_here",
    bill_to="my-org-name"
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Reply with the word OK only."}],
)

print(resp.choices[0].message)

Raw HTTP example:

curl https://huggingface.co/proxy/router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer hf_your_token_here" \
  -H "X-HF-Bill-To: my-org-name" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3-0324",
    "messages": [{"role": "user", "content": "Reply with the word OK only."}]
  }'

Those patterns match Hugging Face’s current organization-billing documentation. (Hugging Face)

Step 6: If you need continued usage, choose the right billing path

Hugging Face documents two different ways to run inference:

  1. Routed by Hugging Face
    Your request is routed through Hugging Face. Monthly credits apply. Extra usage is billed on your HF account. (Hugging Face)

  2. Custom provider key
    You supply your own provider key. Hugging Face routes the request, but the provider bills you directly. HF monthly credits do not apply. (Hugging Face)

When to stay with Hugging Face billing

Stay with HF billing if you want:

  • one bill
  • easy provider switching
  • to use the included monthly credits first. (Hugging Face)

When to switch to your own provider key

Switch if you:

  • already have an account with a provider
  • want more direct billing control
  • do not want the HF included-credit pool to be the limiting factor. (Hugging Face)

Hugging Face says you can set a custom provider key in Hub settings or in InferenceClient, while keeping the same integration surface. (Hugging Face)

Step 7: Consider avoiding HF-routed inference entirely

If your real goal is “I want inference to work without this credits system,” Hugging Face’s own inference guide says the client can also connect to local endpoints, including llama.cpp, Ollama, vLLM, LiteLLM, and TGI. That shifts you away from HF-routed Inference Providers billing. (Hugging Face)

This is often the cleanest long-term fix for users who want predictable local control rather than monthly hosted credits. (Hugging Face)
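A sketch of that local route, assuming a local OpenAI-compatible server (for example llama.cpp's server or vLLM) is already running; the URL and model name are placeholders:

```python
from huggingface_hub import InferenceClient

def ask_local(prompt: str) -> str:
    """Send a chat request to a local OpenAI-compatible endpoint.

    No Hugging Face token, routing, or credits are involved.
    """
    client = InferenceClient(base_url="http://localhost:8080/v1")
    resp = client.chat.completions.create(
        model="local-model",  # many local servers accept any name here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Example (requires a running local server):
# print(ask_local("Reply with the word OK only."))
```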


A practical decision tree

Case A: You only use the Hugging Face website

Follow this order:

  1. Check Inference Providers usage. (Hugging Face)
  2. Check Billing and confirm a valid compute payment method exists. (Hugging Face)
  3. Remember that widgets, Playground, and Data Studio AI also spend the same credits. (Hugging Face)
  4. If credits are gone, either purchase more capacity or switch to a custom provider key. (Hugging Face)

Case B: You use Python, JavaScript, LangChain, notebooks, or an OpenAI-compatible client

Follow this order:

  1. Generate a new fine-grained token with Make calls to Inference Providers. (Hugging Face)
  2. Pass the token explicitly. Do not trust the cached local token. (Hugging Face)
  3. If you use an org, add bill_to or X-HF-Bill-To. (Hugging Face)
  4. If you still need more usage, choose between HF billing and a custom provider key. (Hugging Face)

What I think is most likely for your case

The most probable explanation is:

  • your PRO subscription is active
  • but your included Inference Providers credits are exhausted
  • and either pay-as-you-go compute billing is not fully usable yet, or your token / billing target is wrong. (Hugging Face)

That diagnosis fits the official Hugging Face documentation and also matches the most common public support patterns around this exact error family. (Hugging Face)


When to contact support

Contact Hugging Face billing support if all of the following are true:

  • your PRO subscription is active
  • the billing page shows a valid payment method
  • you created a fresh token with the correct permission
  • you passed the token explicitly
  • you set org billing correctly if relevant
  • and the error still persists. (Hugging Face)

Hugging Face’s billing docs explicitly direct billing-related support requests to billing@huggingface.co. (Hugging Face)


Final takeaway

This error usually does not mean “your PRO subscription stopped working.” It usually means Inference Providers credits or billing are the problem. The cleanest recovery path is:

  1. check usage
  2. check compute billing
  3. replace the token
  4. pass the token explicitly
  5. set org billing if needed
  6. switch to a custom provider key or local endpoint if hosted HF credits are not the right fit. (Hugging Face)