It seems to use a 60-second GPU quota instead of real time usage?

I don’t understand: the GPU bar is at 0 one moment, then 1.2, then 0.4, and I can’t create anything. I tried asking the AI, but it was futile and just repeated the things I had described.

Is there a bug or an unlisted change? I found nothing.

I only created 3 images, is that the limit?


I only created 3 images, is that the limit?

Yeah. If you are using a Free account and the Space’s duration is set to the default of 60 seconds, that is the correct limit. While the duration can be changed by the Space’s creator, many creators leave it at the default because setting it too short causes the entire generation process to fail, etc.

The reason why the count is based on reserved time rather than actual usage time is largely due to the mechanism of Zero GPU. (See below)

If you have accidentally incurred a financial loss and wish to request a refund, you must contact Hugging Face Support: billing@huggingface.co


Three images can be enough on HF ZeroGPU. The limit is not “number of images.” It is a time quota on shared GPU use, and the platform uses the function’s declared maximum runtime for scheduling, not only the exact wall-clock time you personally saw on screen. Hugging Face’s current docs say the default @spaces.GPU runtime is 60 seconds, a custom duration sets the maximum function runtime, and shorter durations improve queue priority. The same docs say free accounts get 3.5 minutes per day, unauthenticated users get 2 minutes, and PRO gets 25 minutes, with reset 24 hours after your first GPU usage. (Hugging Face)

The simplest explanation

ZeroGPU is a shared GPU pool. Hugging Face has to decide before your job starts whether your request can enter the queue fairly. That is why the docs talk about a maximum runtime and queue priority based on shorter durations. So the system is not thinking only in terms of “how long did this one image actually take after it finished?” It is also thinking “this request may occupy scarce GPU capacity for up to 60 seconds, or up to whatever duration the Space author set.” That is the background reason you keep seeing 60 seconds appear. (Hugging Face)

Why it feels wrong

It feels wrong because as a user you expect this:

  • image took maybe 8 seconds
  • so only 8 seconds should matter

But ZeroGPU behaves closer to this:

  • the Space asks for a GPU job with a budget
  • the default budget is 60 seconds unless the Space author lowered it
  • the scheduler checks whether that budget can be admitted
  • if your remaining quota is below that budget, the request can fail even if the real image might have finished faster
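
The budget-based admission above can be sketched as a toy function. This is only an illustration of the mental model described here, not Hugging Face's actual scheduler code:

```python
# Toy model of duration-based admission: a request is admitted only if its
# full declared budget fits the remaining quota, regardless of how fast the
# job would actually finish. Illustrative only, not HF's real scheduler.

def can_admit(remaining_quota_s: float, declared_duration_s: float = 60.0) -> bool:
    """Admit a GPU request only if its full declared budget fits."""
    return remaining_quota_s >= declared_duration_s

# 1.2 minutes (72 s) left: a default 60 s request still fits
print(can_admit(72))   # True
# 0.4 minutes (24 s) left: rejected, even if the image would finish in ~8 s
print(can_admit(24))   # False
```

This is why a bar that still shows a little time left can refuse to generate: the check runs against the declared budget, not the expected real runtime.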

That reading is strongly supported by the current docs because Hugging Face explicitly says duration sets the maximum function runtime and that shorter durations improve queue priority. (Hugging Face)

What your bar values probably mean

Your numbers like 1.2 and 0.4 most likely represent minutes, not image count. This part is an inference, but it fits the published quotas very well:

  • 1.2 minutes = 72 seconds
  • 0.4 minutes = 24 seconds

If a Space is still using the default 60-second budget, then:

  • at 1.2 minutes left, a 60-second request could still fit
  • at 0.4 minutes left, a 60-second request would not fit

That exactly matches the kind of “I still have some bar left but it refuses to generate” behavior you described. The docs do not clearly document that bar UI in one place, but the quota numbers and 60-second default make this the best-fitting explanation. (Hugging Face)

Why only 3 images may already exhaust it

For a free account, the included ZeroGPU quota is 3.5 minutes total, which is 210 seconds. If the Space uses the default 60-second budget, then just 3 requests can already consume or reserve most of that budget:

  • 3 × 60s = 180 seconds
  • that leaves only 30 seconds
  • 30 seconds is only 0.5 minutes

So after 3 images, a bar like 0.4 or 0.5 left is completely plausible. This is especially true if the Space author did not lower duration for short jobs. (Hugging Face)

And it can get worse. If the Space requests size="xlarge", Hugging Face says it consumes 2× more daily quota than the default large. Their own example says a 45-second effective task duration on xlarge consumes 90 seconds of quota. In that kind of Space, only a few generations can burn through a free-tier day. (Hugging Face)
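
The arithmetic above fits in a tiny calculator. The 3.5-minute free quota and the 2× xlarge multiplier come from the docs cited above; the helper function itself is just an illustration:

```python
# Free-tier quota arithmetic. The quota figure and the xlarge multiplier are
# from the HF docs cited above; this helper is only an illustration.

FREE_DAILY_QUOTA_S = 3.5 * 60  # 210 seconds per day on a free account

def remaining_after(n_requests: int,
                    duration_s: float = 60.0,
                    size_multiplier: float = 1.0) -> float:
    """Seconds of daily quota left after n requests at a given budget."""
    return FREE_DAILY_QUOTA_S - n_requests * duration_s * size_multiplier

print(remaining_after(3))           # 30.0 -> only 0.5 min left after 3 images
print(remaining_after(2, 45, 2.0))  # 30.0 -> xlarge burns quota twice as fast
```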

So is it “real time usage” or not?

The clean answer is: both matter, but in different ways. Hugging Face’s docs say the GPU is requested when the function is called and released when the function completes, which means the GPU is not supposed to stay occupied for the full 60 seconds if the work ends early. But the docs also say duration sets the maximum runtime and affects queue priority, which means the platform clearly uses that budget before execution for scheduling. (Hugging Face)

So the best mental model is:

  • actual runtime matters for the real work and GPU release
  • declared duration matters for queue admission and quota handling

That is why ZeroGPU can look like it is using “60 seconds instead of real time,” even though the real issue is that the system is built as a shared scheduler, not just a stopwatch. (Hugging Face)

Is there a hidden change on the site?

I did not find a current official Hugging Face doc announcing a recent change from “real runtime” to “duration-based quota.” The current docs still describe the same model: default 60 seconds, custom maximum runtime, dynamic duration support, quota tiers, and 24-hour reset. (Hugging Face)

So for your specific question, the most likely answer is not “a secret new rule.” The most likely answer is that this has been the design, but the UI makes it hard to understand. A strong sign of that confusion is that developers asked Hugging Face for a way to retrieve exact remaining ZeroGPU quota seconds, and that request was closed as not planned. That means observability is still weak. (GitHub)

Could it still be a bug?

Yes. But only in some cases.

There are real recent reports of wrong quota identity being applied. One Gradio issue from April 7, 2026 says custom frontends on HF Spaces with ZeroGPU could treat logged-in PRO users as unauthenticated, giving them only the smaller free-tier quota because the x-ip-token handshake was missing. Gradio’s changelog also includes fixes for ZeroGPU headers forwarding and updates to the ZeroGPU guide about manually passing an IP token. HF’s API docs likewise say authenticated requests consume your account quota, while unauthenticated requests use a stricter shared pool. (GitHub)

So:

  • if this happens in one or two Spaces only, it may be those Spaces’ setup
  • if it happens across many ZeroGPU Spaces, especially if you are logged in or PRO and still see nonsense like 0s left, then a bug or auth/identity mismatch is plausible (GitHub)

About the bar jumping from 0 to 1.2 to 0.4

That kind of jump is believable without any hidden site change. A few reasons:

  • quota is small on free tier: 3.5 minutes
  • different Spaces can request different durations
  • remaining quota affects queue behavior
  • the UI does not expose quota state very transparently
  • some auth/header issues can make a request count under the wrong quota pool (Hugging Face)

So the jumpy display does not by itself prove a new bug. It can happen from the combination of tiny quota, request budgets, and poor visibility.

Your “I only created 3 images” question

Yes, 3 images can be the limit. On free tier, very easily. If the Space uses the default 60-second budget, three generations already reach 180 seconds out of 210 seconds total free daily quota. If the Space is heavy, uses xlarge, or has longer custom duration, three images can be enough even faster. (Hugging Face)

The plain-language conclusion

HF ZeroGPU uses 60 seconds by default because it is running a shared GPU queue and needs a maximum runtime budget in advance to decide whether your job can be scheduled fairly. That is why it does not behave like a simple “count exact seconds after the image finishes” system. The confusing bar you see is most likely showing remaining time quota, probably in minutes, not remaining image count. So three images can absolutely be enough on the free tier, especially when the Space still uses the default 60-second duration. The current docs support this design, and I did not find an official announcement of a recent hidden rule change. Real bugs do exist, but they are more likely when quota looks wrong across many Spaces or when logged-in users are being treated as unauthenticated. (Hugging Face)

ty for the exhaustive explanation.

It’s simply because 2 days ago I remember generating at least 15(?) images (sorry, I don’t remember exactly), which was enough for some testing. With 3, I’d need a week every time to get a decent result.

maybe they changed the one i was trying using? (SDXL Text To Image - a Hugging Face Space by wifi-lover)


maybe they changed the one i was trying using? (SDXL Text To Image - a Hugging Face Space by wifi-lover)

It looks like the author last updated that Space in February, but I think there’s a possibility that HF made some changes or something like that. :thinking:

Well, anyway, in terms of how Zero GPU works, the current behavior is actually quite normal.
So, for Spaces where the actual duration is short and can be predicted fairly accurately, setting the duration to a shorter value allows you to use it more times. Also, some creators leave it up to user settings and make it adjustable. (This officially became possible a few months ago. It was possible through workarounds even before that…)

In any case, to intentionally change the duration, either the Space creator needs to do so, or you’ll need to subscribe to Pro yourself, duplicate the Space for your own use, and modify it. I think quite a few people are doing the latter.

I think it was the morning of April 8th when I was generating without problems, and then in the afternoon it started to behave in that manner.

What do duplicate and modify mean?


It did the same for me, at the same moment. Only 3-4 generations and then it doesn’t work anymore. I’ve been using the same generator for a few months already and was able to generate 15-20 images before. It has to be some kind of bug or a secret update of Hugging Face.


What do duplicate and modify mean?

Like this way.

duplicate: Quota exceed error - #10 by John6666
modify: Need help duplicating Space from Space Zero GPU to paid hardware

I created a guide (click here for the detailed version):


The reliable way is to treat this as two separate jobs:

  1. Duplicate the Space correctly
  2. Recreate the runtime conditions that made the original work

That matters because a duplicated Hugging Face Space is private by default, falls back to free CPU by default unless you choose other hardware, copies public Variables but not Secrets, and ZeroGPU itself is a separate runtime with its own rules. (Hugging Face)

What ZeroGPU is, before you duplicate

ZeroGPU is not “normal GPU Spaces, but free.” It is a shared ZeroGPU runtime for Gradio Spaces only, backed by NVIDIA H200 capacity, with a default 60-second GPU duration per @spaces.GPU call unless the app sets another duration. Hugging Face also notes that ZeroGPU can have limited compatibility compared with standard GPU Spaces, even though it supports Gradio 4+ and a wide range of PyTorch versions. (Hugging Face)
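
The documented pattern looks roughly like this. Note the hedges: the `spaces` package only exists inside a ZeroGPU Space (it is stubbed here so the sketch runs anywhere), and `duration=30` is just an example value, not a recommendation:

```python
# Sketch of the ZeroGPU pattern. Inside a real Space, `import spaces` must
# come before anything that initializes CUDA; here it is stubbed so that
# this snippet runs outside a Space too.
try:
    import spaces  # real package, available inside a ZeroGPU Space
except ImportError:
    class _SpacesStub:
        @staticmethod
        def GPU(fn=None, duration=60):
            if callable(fn):        # used as bare @spaces.GPU
                return fn
            return lambda f: f      # used as @spaces.GPU(duration=...)
    spaces = _SpacesStub()

@spaces.GPU(duration=30)  # shorter budget = better queue priority, less quota burn
def generate(prompt: str) -> str:
    # In a real Space this would run the diffusion pipeline on "cuda";
    # the GPU is attached only for the lifetime of this call.
    return f"generated: {prompt}"

print(generate("a red bicycle"))
```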

That is why a duplicated repo can be correct as code, but still fail as an app. The source Space may depend on the same SDK, the same hardware, the same secrets, the same startup behavior, and the same request path assumptions. (Hugging Face)

The safest way to duplicate a ZeroGPU Space to a private ZeroGPU Space

1. Inspect the source Space first

Before clicking Duplicate this Space, check:

  • whether it is really a Gradio Space
  • whether the README/YAML pins sdk_version or python_version
  • whether it pulls gated or private models
  • whether it downloads large files at startup
  • whether it clearly expects ZeroGPU behavior rather than ordinary GPU behavior

Those checks matter because the README YAML controls important runtime settings like python_version, sdk_version, startup_duration_timeout, and preload_from_hub. (Hugging Face)
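
For reference, those settings live in the README front matter. A hypothetical example follows; every value here is a placeholder, so pin them to match the source Space and check the Spaces Configuration Reference for the exact syntax:

```yaml
---
title: My Duplicated Space
sdk: gradio
sdk_version: 5.25.0        # placeholder; pin to the source Space's version
python_version: "3.10"     # only if the source pins it too
startup_duration_timeout: 1h
preload_from_hub:
  - stabilityai/stable-diffusion-xl-base-1.0   # repo, optionally followed by specific files
---
```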

2. Duplicate it as faithfully as possible

For the first duplicate, keep it conservative:

  • set Visibility = Private
  • keep the same SDK as the source
  • if the source is ZeroGPU, choose ZeroGPU
  • do not switch to CPU or paid GPU yet unless you are intentionally migrating

This matters because Hugging Face says duplicated Spaces default to free CPU hardware unless you choose otherwise. That is one of the easiest ways to create a “works there, broken here” duplicate. (Hugging Face)

3. Recreate secrets immediately

This is the most common duplication miss.

Hugging Face documents that Variables can be auto-copied into duplicates, but Secrets are not copied. So if the original uses HF_TOKEN, third-party API keys, OAuth credentials, or anything private, you need to add them again in the duplicate’s Settings. (Hugging Face)

If the app needs access to a private or gated model, dataset, or other repo, use a Hugging Face access token with the needed permissions. Hugging Face documents User Access Tokens as the normal authentication method, and a read token is enough for read-only access to private repos you can read. (Hugging Face)
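
A minimal startup check for this is easy to add. `HF_TOKEN` below is an assumed secret name; use whatever names the original app actually reads:

```python
import os

# Secrets re-added in the duplicate's Settings surface as environment
# variables. HF_TOKEN is an assumed name; adapt the list to the source app.
def missing_secrets(required: list[str], env=os.environ) -> list[str]:
    """Return the names of required secrets that are not set."""
    return [name for name in required if not env.get(name)]

missing = missing_secrets(["HF_TOKEN"])
if missing:
    # Fail loudly at startup instead of mysteriously mid-generation.
    print(f"Missing secrets: {missing}; re-add them in the duplicate's Settings.")
```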

4. First test from the standard HF Space page

For the first validation, open the duplicate from the normal Hugging Face Space page while logged in. Do not start with:

  • direct *.hf.space URLs
  • embeds
  • custom frontends
  • API clients

Gradio documents that ZeroGPU request accounting uses the X-IP-Token header. If that identity path is missing, the request may be treated as unauthenticated, which can make a perfectly fine duplicate look broken or quota-limited. (Gradio)

5. Run the smallest realistic test first

Do not start with the heaviest prompt, the biggest image, or the longest job. Use the smallest input that should still succeed.

That matters because ZeroGPU defaults to a 60-second duration budget, and Hugging Face explicitly says shorter durations improve queue priority. It also explains why quota can feel “chunky” instead of matching only the wall-clock time you noticed. (Hugging Face)

The fast troubleshooting rule

After duplication, do not debug randomly. First classify the failure:

  • stuck on Building
  • Running, but behaves differently
  • quota/auth looks wrong
  • browser works, API fails
  • ZeroGPU/CUDA error

That classification-first approach is the fastest path because each bucket has a different likely cause and different next move. (Hugging Face Forums)

How to fix errors after duplication

A. Stuck on Building

Building is not one single failure. The current Hugging Face forum guidance breaks it into multiple layers: repo/YAML read, build, scheduling, provisioning, then app health. (Hugging Face Forums)

Use this order:

  1. Check the Hugging Face status page and recent reports.
  2. If the platform looks healthy, try Restart once.
  3. Then try Factory rebuild once.
  4. Only then inspect dependencies and startup config.

That order matches the forum guidance for recent Building failures. (Hugging Face Forums)

If logs are empty or show only queue-like behavior, suspect platform or scheduler state first, not your app code. If build logs show dependency failures, suspect dependency drift first. A fresh duplicate rebuilds now, under current conditions, which may differ from the environment that the source Space originally built under. (Hugging Face Forums)

If build finishes but the Space never becomes healthy, check README/YAML settings such as:

  • startup_duration_timeout
  • preload_from_hub

Hugging Face says startup_duration_timeout defaults to 30 minutes, and preload_from_hub shifts large Hub downloads into build time so startup is faster and less fragile. (Hugging Face)

B. Running, but not like the original

When the duplicate reaches Running but behaves differently, the usual cause is environment drift, not a broken duplicate button.

A recent Hugging Face forum case showed a duplicate on the same ZeroGPU class behaving differently from the original until dependencies were pinned more tightly. (Hugging Face Forums)

Check in this order:

  • smallest possible input
  • one Factory rebuild
  • requirements.txt
  • sdk_version
  • python_version
  • whether hardware really matches
  • whether a secret or access token is missing

This is the right place to be suspicious of version drift. “Same repo” does not guarantee “same resolved environment.” (Hugging Face Forums)

C. Quota exceeded, PRO ignored, or quota looks wrong

Do not assume this is real quota exhaustion.

Gradio documents that ZeroGPU uses X-IP-Token for request identity, and there is also a recent GitHub issue showing that custom gr.Server frontends can miss the handshake and cause logged-in PRO users to be treated like unauthenticated users. (Gradio)

Use this order:

  1. test from the normal HF Space page while logged in
  2. avoid direct *.hf.space links at first
  3. avoid custom frontends at first
  4. check whether the Space is on an old Gradio version

That last point matters because a Hugging Face forum reply specifically says a broader quota-related bug was resolved in Gradio 5.12.0 or newer. (Hugging Face Forums)

For background, current Hugging Face docs say ZeroGPU daily quota is 2 minutes for unauthenticated users, 3.5 minutes for free accounts, 25 minutes for PRO, and resets 24 hours after first GPU usage. (Hugging Face)

D. Browser works, API fails

If the private duplicate works in the browser but API calls fail, suspect auth first.

Hugging Face documents that every Gradio Space can be used as an API endpoint, and the standard programmatic path is the Gradio client with a token. (Hugging Face)

For a private duplicate:

  • confirm the Space works in the browser first
  • then test with an authenticated token
  • use a Hugging Face access token with the needed read access for private resources

That separates “the app is broken” from “the app is fine, but your API request is not authorized.” (Hugging Face)

For ZeroGPU, there is a second layer: API access and ZeroGPU request identity are related but not identical. You can be authorized to access the private Space and still have a bad X-IP-Token path for ZeroGPU accounting. (Hugging Face)

E. CUDA has been initialized before importing the spaces package

This is a classic ZeroGPU-specific error.

The usual meaning is: something touched CUDA too early, before ZeroGPU could manage GPU allocation the way it expects. Hugging Face’s ZeroGPU docs say the intended pattern is:

  • select ZeroGPU hardware
  • import spaces
  • put GPU work behind @spaces.GPU

A forum thread with that exact error confirms this pattern in practice. (Hugging Face)

What to check:

  • torch.cuda.is_available() at import time
  • model.to("cuda") too early
  • any CUDA-touching library side effects before import spaces

The fix is to move GPU work into the ZeroGPU-managed path instead of letting CUDA initialize too early. (Hugging Face)

F. No CUDA GPUs are available

This error can be either:

  • a transient ZeroGPU/platform problem
  • an app/runtime mismatch

A Hugging Face forum thread shows this exact error on ZeroGPU, and a follow-up reply reported that a retry/replication later worked again, which suggests at least some cases are transient. (Hugging Face Forums)

Use this order:

  1. retry once
  2. restart once
  3. if it clears, do not rewrite code yet
  4. if it persists only in your duplicate, inspect CUDA timing and dependency pins

That keeps you from wasting time on a transient platform issue. (Hugging Face Forums)

G. ZeroGPU worker error RuntimeError

Treat this as a symptom bucket, not a diagnosis.

Forum reports show that this class of error can be caused by broader platform issues, by temporary ZeroGPU instability, or by app-specific dependency problems. (Hugging Face Forums)

Use this order:

  1. retry once
  2. restart once
  3. see whether many unrelated ZeroGPU Spaces are failing too
  4. if only your duplicate fails, inspect versions and rebuild state

If many Spaces fail at the same time, suspect platform conditions. If only your duplicate fails, suspect runtime drift first. (Hugging Face Forums)

H. ZeroGPU illegal duration or “requested GPU duration is larger than the maximum allowed”

This usually means the app requested an unrealistic GPU duration, not that duplication failed.

Hugging Face documents the default duration as 60 seconds and shows custom duration examples like @spaces.GPU(duration=120). A forum thread shows “300s” triggering the illegal-duration error. (Hugging Face)

What to do:

  • find @spaces.GPU(duration=...)
  • lower it
  • retest with a smaller workload
  • keep GPU sections narrow and only as long as needed

Also note that xlarge consumes the daily quota of large, so using a bigger ZeroGPU size can make quota pressure worse, not better. (Hugging Face)

When to move to paid GPU

Move to paid GPU after you get one clean minimal success on a faithful private ZeroGPU duplicate.

That is the safest point to migrate because then you know the code, secrets, and startup path are basically correct. The migration is no longer mixed up with duplication mistakes. The recent forum thread about “ZeroGPU to paid hardware” is really a migration problem, not a plain duplication problem. (Hugging Face Forums)

The short version

Use this order:

  1. confirm the source is a Gradio ZeroGPU Space
  2. duplicate it as Private + same SDK + same ZeroGPU class
  3. recreate Secrets and any needed HF_TOKEN
  4. test from the standard HF Space page while logged in
  5. run the smallest input first
  6. classify the first failure instead of changing many things at once
  7. only after one success, optimize startup or migrate to paid GPU

That is the cleanest beginner-safe workflow because it separates repo duplication from runtime reconstruction. (Hugging Face)

The most useful references to keep open while doing this are the official Spaces Overview, ZeroGPU, Spaces Configuration Reference, Spaces as API endpoints, the Gradio Using ZeroGPU Spaces with the Clients guide, and the recent Hugging Face forum threads on Building, quota, and duplicate behaves differently. (Hugging Face)

So, let’s see if I understood correctly: at the beginning of April, a new limit was imposed on free-plan users, which led to a significant decrease in activity on Spaces. Would this explain why I could previously create up to 40 images, and now, at best, only 10? Furthermore, the timer that alerts me to the reset is not working: it tells me that the time remaining to generate new images is 00:00:00. Is this an issue already known to anyone? If so, could you explain? Thanks.


Hmm… In that case, it’s likely due to the changes described in the post below.