Space Auto restart

I have this public space: /spaces/fuloos/FatayerHut
The space auto-restarted without any new commits or config changes, even though it was in the middle of light work, so I'm sure it's not a RAM issue.

Can you tell me why it restarted? Are there other possible reasons?


There are signs that a forced reboot occurred about one to two weeks ago. (I’ve seen reports other than the thread below, too.) This happened shortly after an incident where many Spaces went into PAUSED mode.
I can’t think of anything else off the top of my head…

If that happens consistently, I think it’s due to a different mechanism or cause.

It's happened many times; the last time was yesterday at 11pm GMT+4.


@John6666 it just got restarted again (around 2pm UTC), for no reason at all.

It's not sleeping, it just restarts. Can anyone from support help with this?


Aside from the general points mentioned below, I recall seeing a report on a forum stating that free CPU spaces may reboot within 24 to 48 hours even if they haven’t reached their processing load or RAM limits. (I don’t remember exactly.)

Well, HF free CPU Spaces aren't really suited for practical backends that need to run continuously. They're basically just for demos, after all…


Bottom line

Your Spaces are most likely restarting because they are running a stateful, multi-process Docker stack on a platform that is much more comfortable with simpler, faster-starting, mostly stateless containers. This is not the normal free-tier sleep behavior. A Space can restart without any new commit when the container exits, the app becomes unhealthy during startup, the runtime is recycled, or a platform-side control/runtime issue occurs. Hugging Face’s docs clearly distinguish lifecycle behavior, startup-health behavior, ephemeral local storage, and restart-triggering configuration changes. (huggingface.co, huggingface.co, huggingface.co, huggingface.co)

So the short answer is:

  • No, this does not look like ordinary sleep.
  • Yes, it can happen without new commits.
  • Yes, there are several reasons besides RAM.
  • Yes, support can sometimes help, but your app/runtime design itself is also a strong cause. (huggingface.co, discuss.huggingface.co)

What Hugging Face Spaces are designed for

A Hugging Face Space is not a normal VPS. It is a managed container runtime with a web-facing app endpoint.

That matters because the platform comes with its own rules:

  • free hardware has lifecycle rules,
  • local disk is ephemeral unless you use the proper persistent storage path,
  • startup health matters,
  • outbound networking is restricted,
  • and complex stateful Docker apps need more care than simple demos. (huggingface.co, huggingface.co, huggingface.co)

Hugging Face’s own deployment guides for heavier Docker apps such as Label Studio, Langfuse, ZenML, and Giskard all emphasize persistence and runtime structure. That is a signal in itself: once you move beyond a simple app server, the platform gets less forgiving. (huggingface.co, huggingface.co, huggingface.co, huggingface.co)


What your Spaces are doing instead

From the code structure, your Spaces are not just serving one app. They are doing all of this:

  • boot from a shell script,
  • extract an encrypted web app archive,
  • maybe run npm install,
  • initialize or restore MariaDB,
  • start Node,
  • start Apache,
  • and in box2, continuously create DB snapshots in the background.

That is much closer to a small self-hosted stack than to a normal Space app.

That design is workable on a full server you control. On a free managed Docker Space, it creates many more ways for the runtime to recycle or fail even when there was no commit and no obvious user-side traffic spike.


Why a restart can happen with no new commit

This is the most important conceptual point.

A new commit is only one reason a Space restarts. Other causes include:

1) The main process exits

If the process that Hugging Face considers the main app process exits, the container exits.
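A hedged bash sketch of the classic way this happens (the service names are illustrative, not taken from your scripts):

  #!/bin/bash
  # Everything is backgrounded, so the script (PID 1) falls off the end
  # and exits, and the container exits with it.
  mariadbd --user=mysql &
  node /app/server.js &
  apachectl start
  # Script ends here: PID 1 exits, the Space restarts, and from the
  # outside it looks like a reboot "for no reason".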

2) Startup never becomes healthy

Hugging Face exposes startup_duration_timeout for a reason. The default is 30 minutes, and the app can still be treated as unhealthy if startup behavior is bad or incomplete. (huggingface.co)

3) Local state disappears or becomes inconsistent

If your app expects local DB/files to act like persistent state, a restart becomes much harder to recover from because the next boot has to do more work on ephemeral storage. Hugging Face’s storage docs explicitly say the disk is not persistent by default. (huggingface.co)
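A quick hedged illustration (the paths are made up; /data is where Hugging Face mounts persistent storage when it is enabled):

  # Anything on the container's own disk vanishes at the next restart:
  echo "state" > /app/marker.txt     # gone after a restart (ephemeral disk)
  # Only the persistent volume survives, and only if you have enabled it:
  echo "state" > /data/marker.txt    # kept across restarts with persistent storage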

4) Child services fail inside a multi-service container

A Space can look “fine” from the outside while one internal service has already gone bad. That often leads to confusing behavior, partial failures, and later restarts.

5) Platform-side runtime/control issues

There are public cases where restart/factory reboot itself broke, returned 503, or the runtime seemed wedged even across recreated Spaces. That means some failures really are platform-side. (discuss.huggingface.co, discuss.huggingface.co)

So “no commit happened” does not mean “HF should never have restarted this.”


Why your specific code is vulnerable

A) Too much startup work

Your runtime is doing heavy tasks at startup:

  • archive extraction,
  • dependency install,
  • database init/import/restore,
  • process orchestration.

That is exactly the kind of startup path that becomes fragile on Spaces. Hugging Face’s config reference supports startup_duration_timeout and preload_from_hub because startup behavior is operationally important. (huggingface.co)

For a stable Space, startup should be as close as possible to:

  • read config/secrets,
  • start already-installed services,
  • become healthy quickly.

Your stack is much heavier than that.

B) You are using a local SQL database without persistent storage

Your two Spaces have no persistent storage.

That means the database layer is sitting on storage Hugging Face describes as non-persistent. This is not just a convenience issue. It directly changes runtime behavior:

  • every restart becomes more expensive,
  • state recovery becomes more complex,
  • boot becomes slower and less predictable,
  • and the platform sees a heavier app each time it has to restart. (huggingface.co)

This is the single biggest architectural mismatch in your setup.

C) box2 adds a costly background snapshot loop

Among your two Spaces, box2 is riskier because it keeps generating DB snapshots in the background. That means even when user activity is “light,” the container is still doing nontrivial work.

That is exactly the sort of hidden workload that can make a free container less stable without obvious traffic pressure.

D) Your current logging model hides the true failure point

If important services mostly write to internal log files instead of stdout/stderr, the Space log view becomes much less useful. Hugging Face recently improved Spaces log access with programmatic log tools, but those tools are only as useful as the logs you emit. (github.com)

So part of the mystery may be that the most important evidence is not reaching the main log stream.

E) Your process model is harder to supervise cleanly

A multi-process container where Apache, Node, and MariaDB are all started by a shell script is inherently more fragile than a single-process app. If one child process dies, the Space may become partially broken before the container eventually restarts.

That makes the runtime harder to reason about.


Why “not RAM” is not enough

This part matters because you emphasized it.

It is completely possible for a Space to restart while average RAM looks normal, because the trigger can be:

  • a child service exit,
  • an unhealthy startup transition,
  • a bad local state recovery,
  • a temporary platform/runtime issue,
  • or a short-lived resource spike that you never happened to see. (huggingface.co, huggingface.co)

So I would not use “RAM looked fine” as a strong argument that the platform restarted a perfectly healthy app. It may have. But your runtime gives several stronger explanations first.
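If you want evidence rather than intuition, a cheap memory sampler that writes to stdout will leave a trace in the Space logs right before any restart. A minimal sketch (the 5-second interval is arbitrary):

  # Log available memory every 5 seconds; if a short spike precedes a
  # restart, the last lines in the Space logs will show it.
  while true; do
    echo "$(date +%T) $(grep MemAvailable /proc/meminfo)"
    sleep 5
  done &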


The hidden networking detail that affects architecture

One detail that many people miss:

Hugging Face documents that outbound requests from Spaces are restricted to ports 80, 443, and 8080. (huggingface.co)

That matters because it makes “just use a normal external MySQL/Postgres server” less simple than it sounds. A direct connection on a typical DB port like 3306 may not work from a Space.

So your choices are not as open as they would be on a VPS:

  • local DB inside the container is fragile,
  • but external DB is also constrained by outbound networking rules.

That is an important part of the background here.
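If you want to verify this from inside a running Space, bash's /dev/tcp pseudo-device gives a quick check; the hostname below is a placeholder:

  # Exit status 0 only if a TCP connection to that host:port can be opened.
  timeout 5 bash -c 'cat < /dev/null > /dev/tcp/db.example.com/3306' \
    && echo "3306 reachable" || echo "3306 blocked or unreachable"
  timeout 5 bash -c 'cat < /dev/null > /dev/tcp/db.example.com/443' \
    && echo "443 reachable" || echo "443 blocked or unreachable"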


My actual diagnosis for your case

If I put everything together, this is my best reading:

Most likely

Your Spaces are restarting because the runtime design is already too close to the edge:

  • stateful local DB on ephemeral storage,
  • startup-heavy Docker entrypoint,
  • multi-process orchestration,
  • expensive background work,
  • weak log visibility.

Also possible

There may have been one or more Hugging Face runtime/control-plane problems on top of that, because public reports show that those do happen. But those reports do not erase the fact that your current setup is inherently restart-sensitive. (discuss.huggingface.co, discuss.huggingface.co)

So the best honest answer is:

The platform may have nudged it, but your code/runtime made it much easier for that nudge to become a visible restart.


Solutions, ordered by impact

1) Remove runtime installs

Do not run npm install at boot if you can avoid it.

Install dependencies in the image build. That reduces startup variability immediately.

This is one of the highest-value changes you can make.
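As a sketch of the change (the npm flags and paths are illustrative): the install moves from the boot script into the image build, so every boot starts from an already-complete filesystem.

  # Before (in the boot script) -- remove this line entirely:
  #   npm install
  # After (in the Dockerfile, once, at build time):
  #   COPY package.json package-lock.json /app/
  #   RUN cd /app && npm ci --omit=dev
  # The boot script then only starts what the image already contains.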

2) Make startup much smaller

Try to move from:

  • extract,
  • install,
  • initialize,
  • restore,
  • then launch

to:

  • verify,
  • launch,
  • become healthy quickly.

If archive extraction must remain, then at least remove all the other avoidable boot work.
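A hedged skeleton of what that slimmer entrypoint could look like (/app/dist and start-services.sh are hypothetical names):

  #!/bin/bash
  set -euo pipefail
  # verify: fail fast and visibly if the image is missing its prebuilt pieces
  [ -d /app/dist ] || { echo "prebuilt app bundle missing" >&2; exit 1; }
  # launch: nothing to extract, install, or restore -- that happened at build time
  exec /app/start-services.sh   # supervision of the services: see solution 5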

3) Stop treating local DB state as durable

With no persistent storage, local DB state is not trustworthy across restarts. Officially, local disk is ephemeral. (huggingface.co)

That means you should choose one of these paths:

Demo path

Make the app boot from a small seed and accept that user changes are not durable.

Persistent path

Add real persistence and redesign around it.

Right now you are between those two paths, which is the least stable place to be.
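For the persistent path, the key move is pointing the database at the persistent mount. A hedged sketch, assuming persistent storage is enabled (Spaces mounts it at /data) and modern MariaDB command names:

  # One-time: initialize the database on the persistent volume if it is empty.
  if [ ! -d /data/mysql ]; then
    mkdir -p /data/mysql
    mariadb-install-db --datadir=/data/mysql --user=mysql
  fi
  # Always start MariaDB against the persistent datadir, never the container disk.
  mariadbd --datadir=/data/mysql --user=mysql &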

4) Slow down or remove the snapshot loop

For box2, the snapshot loop should be dramatically less frequent, or removed until true persistence exists.

A background safety loop that constantly dumps/compresses the DB can become a destabilizer on free managed infrastructure.
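If a loop stays at all, something this gentle is closer to what free hardware tolerates (the interval and paths are arbitrary, and writing to /data assumes persistent storage is enabled):

  # Snapshot a few times per day instead of continuously, and keep only
  # the latest dump so disk usage stays bounded.
  while true; do
    sleep 21600   # 6 hours
    mariadb-dump --all-databases | gzip > /data/snapshot.sql.gz.tmp \
      && mv /data/snapshot.sql.gz.tmp /data/snapshot.sql.gz
  done &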

5) Supervise all services properly

The shell script should either:

  • remain PID 1 and supervise Apache/Node/MariaDB itself, or
  • use a proper process supervisor.

The key goal is simple:

if a critical child dies, the container should fail cleanly and visibly, not drift into a half-working state.
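A minimal sketch of the first option, using bash's wait -n (bash 4.3+); the service commands are illustrative:

  #!/bin/bash
  # Start every critical service as a child of this script, which stays PID 1.
  mariadbd --user=mysql &
  node /app/server.js &
  apachectl -D FOREGROUND &
  # Block until the FIRST child exits, then fail the whole container loudly
  # instead of drifting into a half-working state.
  wait -n
  echo "a critical service exited; shutting the container down" >&2
  exit 1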

6) Push service logs to stdout/stderr

This is a big practical fix.

Make Node, MariaDB, and Apache logs visible in the Space logs. Then the restart story becomes much easier to prove.

That also pairs well with the new Hugging Face log tooling. (github.com)
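A common hedged pattern (the log file paths are illustrative): let services that can write to stdout/stderr do so, and stream the files of those that cannot:

  # Services that can log to the console: just don't redirect them away.
  node /app/server.js 2>&1 &
  # Services that insist on log files: stream the files into stdout so
  # they appear in the Space log view.
  tail -F /var/log/mysql/error.log /var/log/apache2/error.log &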

7) Set Docker config explicitly

Add these in README.md:

app_port: 7860
startup_duration_timeout: 1h

This does not solve everything, but it removes ambiguity and gives a startup-heavy container more room. (huggingface.co)

8) Use Dev Mode or programmatic log collection when reproducing

Hugging Face provides Dev Mode and improved log access for Spaces. For a stack like yours, debugging only via the normal web UI is not enough. (huggingface.co, github.com)


Final conclusion

Your Spaces are not restarting for “no reason.”

The reasons are most likely a combination of:

  • managed container lifecycle,
  • heavy startup/runtime design,
  • multi-service fragility,
  • local DB on ephemeral storage,
  • costly background snapshotting,
  • and possibly occasional Hugging Face runtime/control glitches. (huggingface.co, huggingface.co, discuss.huggingface.co)

So my clearest advice is:

  1. remove runtime installs,
  2. shrink startup,
  3. reduce or remove the snapshot loop,
  4. make logs visible,
  5. supervise services cleanly,
  6. choose either stateless demo mode or real persistence.

That is the most evidence-based explanation and the most realistic fix path for your case.


Thank you so much for this amazing, detailed answer!


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.