I had this public Space /spaces/fuloos/FatayerHut.
The Space auto-restarted without any new commits or config changes, even though it was in the middle of light work, so I'm sure it's not a RAM issue.
Can you tell me why it restarted? Are there other reasons this can happen?
There are signs that a forced reboot occurred about one to two weeks ago. (I’ve seen reports other than the thread below, too.) This happened shortly after an incident where many Spaces went into PAUSED mode.
I can’t think of anything else off the top of my head…
If that happens consistently, I think it’s due to a different mechanism or cause.
It's happened many times; the last time was yesterday at 11pm (GMT+4).
@John6666 now it restarted again (around 2pm UTC), for no reason at all.
It's not sleeping, it just restarts. Can anyone from support help with this?
Aside from the general points mentioned below, I recall seeing a report on a forum stating that free CPU spaces may reboot within 24 to 48 hours even if they haven’t reached their processing load or RAM limits. (I don’t remember exactly.)
Well, HF Free CPU Spaces isn’t really suited for practical backends that need to run continuously. It’s basically just for demos, after all…
Your Spaces are most likely restarting because they are running a stateful, multi-process Docker stack on a platform that is much more comfortable with simpler, faster-starting, mostly stateless containers. This is not the normal free-tier sleep behavior. A Space can restart without any new commit when the container exits, the app becomes unhealthy during startup, the runtime is recycled, or a platform-side control/runtime issue occurs. Hugging Face’s docs clearly distinguish lifecycle behavior, startup-health behavior, ephemeral local storage, and restart-triggering configuration changes. (huggingface.co, huggingface.co, huggingface.co, huggingface.co)
So the short answer is:
A Hugging Face Space is not a normal VPS. It is a managed container runtime with a web-facing app endpoint.
That matters because the platform comes with its own rules:
Hugging Face’s own deployment guides for heavier Docker apps such as Label Studio, Langfuse, ZenML, and Giskard all emphasize persistence and runtime structure. That is a signal in itself: once you move beyond a simple app server, the platform gets less forgiving. (huggingface.co, huggingface.co, huggingface.co, huggingface.co)
From the code structure, your Spaces are not just serving one app. They run Apache, Node, and MariaDB in one container, run npm install at boot, and (in box2) continuously create DB snapshots in the background. That is much closer to a small self-hosted stack than to a normal Space app.
That design is workable on a full server you control. On a free managed Docker Space, it creates many more ways for the runtime to recycle or fail even when there was no commit and no obvious user-side traffic spike.
This is the most important conceptual point.
A new commit is only one reason a Space restarts. Other causes include:
If the process that Hugging Face considers the main app process exits, the container exits.
Hugging Face exposes startup_duration_timeout for a reason. The default is 30 minutes, and the app can still be treated as unhealthy if startup behavior is bad or incomplete. (huggingface.co)
If your app expects local DB/files to act like persistent state, a restart becomes much harder to recover from because the next boot has to do more work on ephemeral storage. Hugging Face’s storage docs explicitly say the disk is not persistent by default. (huggingface.co)
A Space can look “fine” from the outside while one internal service has already gone bad. That often leads to confusing behavior, partial failures, and later restarts.
There are public cases where restart/factory reboot itself broke, returned 503, or the runtime seemed wedged even across recreated Spaces. That means some failures really are platform-side. (discuss.huggingface.co, discuss.huggingface.co)
So “no commit happened” does not mean “HF should never have restarted this.”
Your runtime is doing heavy tasks at startup: dependency installation (npm install), archive extraction, and database initialization.
That is exactly the kind of startup path that becomes fragile on Spaces. Hugging Face’s config reference supports startup_duration_timeout and preload_from_hub because startup behavior is operationally important. (huggingface.co)
For a stable Space, startup should be as close as possible to: start the main process, bind the app port, and begin serving requests.
Your stack is much heavier than that.
Your two Spaces have no persistent storage.
That means the database layer is sitting on storage Hugging Face describes as non-persistent. This is not just a convenience issue. It directly changes runtime behavior: every restart wipes the database files, each boot has to rebuild state from scratch, and anything written since the last snapshot is simply gone.
This is the single biggest architectural mismatch in your setup.
box2 adds a costly background snapshot loop

Among your two Spaces, box2 is riskier because it keeps generating DB snapshots in the background. That means even when user activity is "light," the container is still doing nontrivial work.
That is exactly the sort of hidden workload that can make a free container less stable without obvious traffic pressure.
If important services mostly write to internal log files instead of stdout/stderr, the Space log view becomes much less useful. Hugging Face recently improved Spaces log access with programmatic log tools, but those tools are only as useful as the logs you emit. (github.com)
So part of the mystery may be that the most important evidence is not reaching the main log stream.
A multi-process container where Apache, Node, and MariaDB are all started by a shell script is inherently more fragile than a single-process app. If one child process dies, the Space may become partially broken before the container eventually restarts.
That makes the runtime harder to reason about.
This part matters because you emphasized it.
It is completely possible for a Space to restart while average RAM looks normal, because the trigger can be a short RAM spike between monitoring samples, a crashing child process, a failed startup health check, or a platform-side recycle.
So I would not use “RAM looked fine” as a strong argument that the platform restarted a perfectly healthy app. It may have. But your runtime gives several stronger explanations first.
One detail that many people miss:
Hugging Face documents that outbound requests from Spaces are restricted to ports 80, 443, and 8080. (huggingface.co)
That matters because it makes “just use a normal external MySQL/Postgres server” less simple than it sounds. A direct connection on a typical DB port like 3306 may not work from a Space.
So your choices are not as open as they would be on a VPS: an external database has to be reachable over ports 80, 443, or 8080 (for example through an HTTP API or an HTTPS-capable serverless driver), or the data has to stay inside the Space or on the Hub.
That is an important part of the background here.
If I put everything together, this is my best reading:
Your Spaces are restarting because the runtime design is already too close to the edge: heavy boot work, stateful data on ephemeral storage, and several processes that can each take the container down.
There may have been one or more Hugging Face runtime/control-plane problems on top of that, because public reports show that those do happen. But those reports do not erase the fact that your current setup is inherently restart-sensitive. (discuss.huggingface.co, discuss.huggingface.co)
So the best honest answer is:
The platform may have nudged it, but your code/runtime made it much easier for that nudge to become a visible restart.
Do not do npm install at boot if you can avoid it.
Install dependencies in the image build. That reduces startup variability immediately.
This is one of the highest-value changes you can make.
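As a rough sketch (file and entrypoint names here are assumptions, not taken from your repo), moving the install into the Dockerfile means each boot reuses a cached layer instead of re-running npm install:

```dockerfile
# Sketch: install dependencies at image build time, not at container start.
FROM node:20-slim
WORKDIR /app

# Copy only the manifests first so Docker caches the install layer
# and only re-runs it when dependencies actually change.
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Then copy the rest of the app.
COPY . .
CMD ["node", "server.js"]
```

With this layout, a restart only has to start the process; the slow, failure-prone npm step happens once per image build.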
Try to move from:
doing heavy setup work (dependency install, archive extraction, DB initialization) every time the container starts
to:
doing that work once at image build time, so boot only has to start processes.
If archive extraction must remain at boot, then at least remove all the other avoidable boot work.
With no persistent storage, local DB state is not trustworthy across restarts. Officially, local disk is ephemeral. (huggingface.co)
That means you should choose one of these paths:
Make the app boot from a small seed and accept that user changes are not durable.
Add real persistence and redesign around it.
Right now you are between those two paths, which is the least stable place to be.
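If you do attach persistent storage, it is mounted at /data. A small hedged sketch (directory names are mine) of picking a durable data directory when that mount exists, and falling back to ephemeral disk otherwise:

```shell
# Choose a data directory that survives restarts when persistent storage is
# attached; otherwise fall back to ephemeral disk (wiped on every restart).
if [ -d /data ] && [ -w /data ]; then
  DB_DIR=/data/mysql        # /data is the persistent-storage mount point
else
  DB_DIR=/tmp/mysql         # ephemeral fallback: gone after every restart
fi
mkdir -p "$DB_DIR"
echo "using DB_DIR=$DB_DIR"
```

The point is to make the persistence decision explicit in one place, instead of letting the database silently live on disk that the platform documents as non-persistent.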
For box2, the snapshot loop should be dramatically less frequent, or removed until true persistence exists.
A background safety loop that constantly dumps/compresses the DB can become a destabilizer on free managed infrastructure.
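One hedged way to tame such a loop (function names and paths below are mine, not from your repo): only dump when the data actually changed since the last snapshot, and check rarely instead of continuously:

```shell
# Sketch: skip the dump entirely when nothing changed since the last snapshot.
snapshot_needed() {
  local data_dir="$1" marker="$2"
  # needed if we have never snapshotted, or any file is newer than the marker
  [ ! -e "$marker" ] || [ -n "$(find "$data_dir" -newer "$marker" -print -quit)" ]
}

take_snapshot() {
  local data_dir="$1" marker="$2" out="$3"
  if snapshot_needed "$data_dir" "$marker"; then
    tar czf "$out" -C "$data_dir" .   # stand-in for mysqldump | gzip
    touch "$marker"                   # record when we last snapshotted
  fi
}

# In the background loop, check every few hours, not every few seconds, e.g.:
#   while true; do take_snapshot /var/lib/mysql /tmp/.last-snap /tmp/db.tgz; sleep 21600; done
```

On a free container, an idle Space then really is idle, instead of burning CPU on redundant dumps.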
The shell script should either: exit immediately (with a nonzero status) when any critical child process dies, or actively supervise and restart its children.
The key goal is simple:
if a critical child dies, the container should fail cleanly and visibly, not drift into a half-working state.
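A minimal fail-fast sketch of that idea, assuming bash >= 4.3 for wait -n (the commented service commands are placeholders, not your actual start commands):

```shell
# Start every service, then bring the whole container down as soon as the
# FIRST one dies, instead of drifting along half-broken.
run_all_or_die() {
  local cmd pids=()
  for cmd in "$@"; do
    bash -c "$cmd" &          # launch each service in the background
    pids+=("$!")
  done
  wait -n                     # returns when the first background job exits
  local status=$?
  echo "a child exited (status $status); stopping the rest" >&2
  kill "${pids[@]}" 2>/dev/null || true
  return "$status"
}

# In the real start script, something like:
#   run_all_or_die "httpd-foreground" "node /app/server.js" "mariadbd --user=mysql"
```

Exiting with the dead child's status makes the failure show up in the Space logs and turns a silent partial outage into a clean, visible restart.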
This is a big practical fix.
Make Node, MariaDB, and Apache logs visible in the Space logs. Then the restart story becomes much easier to prove.
That also pairs well with the new Hugging Face log tooling. (github.com)
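One common pattern (the log paths in the comment are assumptions; adjust them to your image) is to tail the internal log files into the container's stdout from the start script:

```shell
# Forward internal service log files to the container's stdout so they show
# up in the Space log view alongside the main app output.
forward_logs() {
  # -n 0: emit only new lines; -F: follow across rotation and keep waiting
  # for files that do not exist yet.
  tail -n 0 -F "$@" &
  FORWARD_PID=$!
}

# In the start script, before launching the services:
#   forward_logs /var/log/apache2/error.log /var/log/mysql/error.log
```

Another option is to symlink the log files to /dev/stdout at image build time, but the tail approach works even for services that insist on reopening their own log files.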
Add these in README.md:
app_port: 7860
startup_duration_timeout: 1h
This does not solve everything, but it removes ambiguity and gives a startup-heavy container more room. (huggingface.co)
Hugging Face provides Dev Mode and improved log access for Spaces. For a stack like yours, debugging only via the normal web UI is not enough. (huggingface.co, github.com)
Your Spaces are not restarting for “no reason.”
The reasons are most likely a combination of: heavy work at startup, a stateful multi-process stack sitting on ephemeral storage, a background snapshot loop, and occasionally a platform-side runtime issue on top of all that.
So my clearest advice is: move installs into the image build, trim the boot path, decide explicitly how the database is persisted, make child-process failures fail fast and visibly, and route every service's logs to stdout/stderr.
That is the most evidence-based explanation and the most realistic fix path for your case.
Thank you so much for this amazing detailed answer
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.