Hi, how can I duplicate a Space that is working on ZeroGPU and make it work on CPU or GPU?
This is essentially a code-porting task. If the target is a GPU Space, migration difficulty depends mainly on the GPU's capacity; if performance is sufficient, the app should run with minimal changes. If the target is a CPU Space, the copy may be useful for purposes like verifying the UI, but don't expect it to be practical for inference.
Incidentally, even when cloning from an older ZeroGPU Space to another ZeroGPU Space, minor adjustments are occasionally necessary, because dependencies and the behavior of the HF-side runtime change over time.
The reliable way to think about this is:
- Duplicate the Space
- Choose the new hardware
- Re-add secrets
- Run it once with as few code changes as possible
- Then decide whether to keep one portable codebase or split into CPU-only / dedicated-GPU versions
That order matters because, on Hugging Face, duplicating a Space is easy, but matching the runtime behavior across ZeroGPU, CPU, and dedicated GPU is the real migration work. Hugging Face’s docs say duplicated Spaces are created from the three-dot menu, are private by default, can be assigned new hardware during duplication, auto-populate public Variables, and do not copy Secrets automatically. The duplicate also starts on free CPU by default unless you choose something else. (Hugging Face)
1. Why moving off ZeroGPU is not just “change hardware”
ZeroGPU is a shared, dynamic GPU system for Spaces. Hugging Face documents it as NVIDIA H200-backed, with large = 70 GB VRAM and xlarge = 141 GB VRAM, and GPU work is requested with @spaces.GPU. The same docs also say that @spaces.GPU is effect-free outside ZeroGPU, which is the key reason many ZeroGPU apps can still run on CPU or a paid GPU without removing the decorator immediately. (Hugging Face)
That leads to a very important distinction:
- ZeroGPU → dedicated GPU is often straightforward.
- ZeroGPU → CPU is often not.
- One repo that works on all three is possible, but you need to write to the strictest contract, which is ZeroGPU’s stateless GPU model. (Hugging Face)
2. What duplication actually does
In the UI
The official flow is:
- open the Space
- click the three dots
- choose Duplicate this Space
- choose owner, Space name, visibility, hardware, storage, and variables/secrets
Hugging Face explicitly says duplicated Spaces are private by default, that the duplicate workflow can warn you about missing secrets, and that public Variables can be carried over while Secrets are not. The duplicate uses free CPU hardware by default unless you change it. (Hugging Face)
Programmatically
If you want automation, Hugging Face’s Hub API docs show duplicate_repo(..., repo_type="space", ...) and duplicate_space(...). The examples include duplicating a Space with custom hardware, and the docs say the new Space can be created in the same running/paused state as the source. (Hugging Face)
Example:
```python
from huggingface_hub import duplicate_repo

duplicate_repo(
    "owner/source-space",
    repo_type="space",
    space_hardware="t4-medium",
)
```
That is useful if you maintain many Spaces or want repeatable migrations. (Hugging Face)
3. Choose hardware by memory envelope, not by label
This is where many migrations go wrong.
Hugging Face currently lists these relevant hardware tiers:
- CPU Basic: 2 vCPU, 16 GB RAM
- CPU Upgrade: 8 vCPU, 32 GB RAM
- T4 small: 16 GB VRAM
- L4 x1: 24 GB VRAM
- A10G small: 24 GB VRAM
- A100 large: 80 GB VRAM
- ZeroGPU large: 70 GB VRAM
- ZeroGPU xlarge: 141 GB VRAM (Hugging Face)
That means a Space that runs comfortably on ZeroGPU large may still fail on a T4 or A10G, even though those are also GPU Spaces, because you are moving from 70 GB VRAM to 16–24 GB VRAM. Likewise, moving to CPU is not “same thing but slower”; it is often a change in the entire model/runtime budget. (Hugging Face)
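As a rough sanity check before picking a tier, you can estimate the memory envelope from parameter count and dtype. A minimal sketch; the 1.3 overhead factor for activations/KV cache is a rule of thumb of mine, not an official Hugging Face figure:

```python
def estimate_vram_gb(n_params_billion: float,
                     bytes_per_param: int = 2,
                     overhead: float = 1.3) -> float:
    """Rough VRAM estimate: weights * dtype size * overhead.
    bytes_per_param: 2 for fp16/bf16, 4 for fp32, 1 for int8.
    overhead: headroom for activations / KV cache (rule of thumb)."""
    return n_params_billion * 1e9 * bytes_per_param * overhead / 1e9

# A 7B model in fp16 needs roughly 18 GB, so it will not fit
# on T4 small (16 GB) but fits easily on ZeroGPU large (70 GB).
print(round(estimate_vram_gb(7), 1))  # -> 18.2
```

Crude as it is, this kind of arithmetic catches the most common mistake: assuming any "GPU" tier is interchangeable with ZeroGPU's 70 GB.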
So before touching code, decide which target you really want:
A. Dedicated GPU copy
Good when you want stable latency and a persistent GPU. This is usually the easiest destination for a ZeroGPU app. (Hugging Face)
B. CPU copy
Good for a lightweight fallback, but usually needs more code and model changes. CPU Basic and CPU Upgrade have much smaller effective inference budgets than ZeroGPU. (Hugging Face)
C. Portable one-repo design
Good when you want the same repo to work on ZeroGPU, CPU, and dedicated GPU. This is usually the best long-term design, but the code must follow ZeroGPU-safe patterns. (Hugging Face)
4. The safest migration path
This is the sequence I would recommend in practice:
Step 1: duplicate first, without rewriting code yet
Create the duplicate and restore the missing secrets. Since public Variables can be copied and Secrets cannot, missing credentials are one of the most common reasons a duplicated Space “suddenly” fails. (Hugging Face)
Step 2: let it boot once on CPU
This is useful even if your real goal is a paid GPU. It tells you whether the repo builds and starts cleanly in a fresh Space environment. Hugging Face’s duplication flow defaults to CPU anyway, so this is the natural first checkpoint. (Hugging Face)
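You can rehearse that checkpoint locally before pushing: try importing the entry module with GPUs hidden via `CUDA_VISIBLE_DEVICES`. A hedged sketch; `app` is assumed to be your Space's entry module, so adjust the name to your repo:

```python
import os
import subprocess
import sys

def cpu_smoke_test(module_name: str) -> bool:
    """Import the given module in a subprocess with GPUs hidden.
    If the import succeeds, there is no import-time CUDA work
    blocking a CPU boot (it does not prove inference will work)."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="")
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        env=env, capture_output=True, text=True,
    )
    return result.returncode == 0

# e.g. cpu_smoke_test("app") from the repo root
```

A passing smoke test here roughly predicts whether the duplicate will at least build and start on the default free CPU.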
Step 3: only then switch hardware
If the duplicate builds and the UI appears, move it to the GPU tier you want. Paid hardware runs indefinitely by default unless you set a sleep time, while free CPU sleeps automatically after inactivity. (Hugging Face)
Step 4: change code only if the duplicate fails or performs badly
Many ZeroGPU apps already run on dedicated GPU with few or no code changes because the decorator remains harmless outside ZeroGPU. CPU is where code and model changes become much more likely. (Hugging Face)
5. Should you remove @spaces.GPU?
Usually not, at least not at first.
Hugging Face’s ZeroGPU docs state that @spaces.GPU is effect-free in non-ZeroGPU environments. So if your goal is “duplicate this ZeroGPU Space and make it run on CPU or paid GPU,” the safest first move is often to keep the decorator and only change the hardware setting. (Hugging Face)
That gives you three possible codebase strategies:
Keep it
Best if you want one repo that can still run on ZeroGPU later. (Hugging Face)
Remove it later
Best if the duplicate will live permanently on a dedicated GPU and you want simpler startup logic and lower latency. (Hugging Face)
Keep it in the portable branch, remove it in a GPU-only fork
Best if you want both portability and a highly optimized paid-GPU deployment. (Hugging Face)
6. The most important code rule
If you want the same code to work on ZeroGPU, CPU, and dedicated GPU:
Do not initialize CUDA at import time.
This is the biggest source of ZeroGPU-specific breakage. ZeroGPU is stateless, and a representative public issue shows vLLM failing because it calls torch.cuda.set_device(...) during model setup, which conflicts with the stateless GPU contract. (GitHub)
Typical anti-patterns:
```python
import torch
from diffusers import DiffusionPipeline

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("model-id").to("cuda")
```
or
```python
model = AutoModelForCausalLM.from_pretrained(...).cuda()
```
or
```python
torch.load("weights.pt", map_location="cuda")
```
Those patterns are fine in many traditional GPU environments, but they are exactly the patterns that create trouble in ZeroGPU-style runtimes. (Hugging Face)
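If you want to audit a repo for this before migrating, one crude heuristic is to flag module-level statements that mention CUDA, since code inside functions runs later (per request, under `@spaces.GPU`). A sketch only, not a real linter; substring matching on the AST dump will produce false positives on unrelated names:

```python
import ast

CUDA_MARKERS = ("cuda", "set_device")

def toplevel_cuda_calls(source: str) -> list[int]:
    """Return line numbers of top-level statements that mention CUDA.
    Function and class bodies are skipped: their code is deferred,
    which is exactly what ZeroGPU's stateless model wants."""
    hits = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            continue  # deferred execution is fine
        if any(marker in ast.dump(node) for marker in CUDA_MARKERS):
            hits.append(node.lineno)
    return hits

src = 'import torch\npipe = load().to("cuda")\ndef f():\n    return x.cuda()\n'
print(toplevel_cuda_calls(src))  # flags line 2 only
```

Anything it flags at module level is worth moving into a function before the migration, regardless of the target hardware.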
7. The portable pattern that works best
This is the design I recommend first.
Core idea
- load safely
- detect the device only when needed
- move to GPU only inside the inference path
- keep CPU as a real fallback
- keep `@spaces.GPU` in place for portability
Example:
```python
import torch
import gradio as gr
import spaces
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "distilgpt2"

tokenizer = None
model = None
loaded_device = None

def resolve_device() -> str:
    return "cuda" if torch.cuda.is_available() else "cpu"

def load_model_for(device: str):
    global tokenizer, model, loaded_device
    if model is not None and loaded_device == device:
        return
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    dtype = torch.float16 if device == "cuda" else torch.float32
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=dtype,
    ).to(device)
    loaded_device = device

@spaces.GPU(duration=60)  # harmless outside ZeroGPU
def generate(prompt: str, max_new_tokens: int = 64):
    device = resolve_device()
    load_model_for(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Slider(1, 256, value=64, step=1, label="Max new tokens"),
    ],
    outputs=gr.Textbox(label="Output"),
)

if __name__ == "__main__":
    demo.launch()
```
Why this is good:
- still valid for ZeroGPU
- still valid for dedicated GPU
- still valid for CPU
- avoids top-level CUDA initialization
- keeps hardware switching mostly a configuration problem instead of a rewrite problem (Hugging Face)
8. How to rewrite for a dedicated-GPU-only Space
Once a dedicated GPU duplicate is stable, you may want to optimize for that environment only.
Typical changes:
- preload the model onto CUDA at startup
- remove ZeroGPU-specific duration logic
- simplify device branching
- optionally drop the `spaces` dependency if you will never go back to ZeroGPU
Example:
```python
import torch
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "distilgpt2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=DTYPE,
).to(DEVICE)

def generate(prompt: str, max_new_tokens: int = 64):
    inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)
    with torch.inference_mode():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

demo = gr.Interface(fn=generate, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch()
```
This is cleaner and usually faster on a persistent GPU, but it is no longer the safest baseline if you still want to preserve ZeroGPU compatibility. Hugging Face’s GPU docs also note that some frameworks need CUDA-capable installs to take advantage of upgraded hardware. (Hugging Face)
9. How to rewrite for CPU
CPU migration usually needs the most adaptation.
Typical CPU changes:
- force `device = "cpu"`
- use `torch.float32` unless you know a smaller dtype is safe on CPU
- reduce batch size, resolution, token limit, or inference steps
- choose a smaller model if needed
- remove libraries that depend on GPU-specific behavior
Example:
```python
import torch
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "distilgpt2"
DEVICE = "cpu"
DTYPE = torch.float32

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=DTYPE,
).to(DEVICE)

def generate(prompt: str, max_new_tokens: int = 32):
    inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)
    with torch.inference_mode():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
For larger image or LLM apps, a CPU fork often becomes a different product tier, not just “same app, slower.” The hardware gap between CPU Basic and ZeroGPU/dedicated GPU is large enough that defaults often need to change. (Hugging Face)
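One lightweight way to express that tier difference in code is a per-device defaults table, so the CPU fork ships smaller budgets without forking the inference logic. The numbers below are illustrative, not recommendations:

```python
# Hypothetical per-device defaults: on CPU, shrink the generation
# budget instead of shipping the GPU defaults unchanged.
DEFAULTS = {
    "cuda": {"max_new_tokens": 256, "num_inference_steps": 30},
    "cpu":  {"max_new_tokens": 32,  "num_inference_steps": 8},
}

def limits_for(device: str) -> dict:
    # Unknown devices fall back to the most conservative budget.
    return DEFAULTS.get(device, DEFAULTS["cpu"])

print(limits_for("cpu")["max_new_tokens"])  # -> 32
```

Wiring UI sliders and generation calls through `limits_for(...)` keeps the same repo honest on both tiers: the app degrades its defaults rather than timing out.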
10. What usually needs changing in the repo
README.md YAML
Spaces are configured by the YAML block at the top of README.md. Hugging Face documents fields such as sdk, python_version, sdk_version, app_file, and suggested_hardware. Crucially, suggested_hardware is only a hint for duplicators; it does not assign hardware automatically. (Hugging Face)
Example:
```yaml
---
title: My Space
sdk: gradio
python_version: "3.10"
sdk_version: 5.49.1
app_file: app.py
suggested_hardware: a10g-small
---
```
(Quoting `python_version` avoids YAML parsing `3.10` as the float `3.1`.)
That is especially helpful if other people will duplicate your Space. (Hugging Face)
requirements.txt
Hugging Face says Python dependencies go in requirements.txt, earlier bootstrap packages can go in pre-requirements.txt, and Debian packages can go in packages.txt. (Hugging Face)
A typical migration-oriented requirements file might look like:
```
gradio==5.49.1
spaces>=0.30.0
torch>=2.4,<2.8
transformers>=4.45,<5
```
The exact versions depend on the app, but pinning is useful because rebuilds on Spaces are fresh deployments, and version drift is a common cause of “worked before, broken after duplicate/restart.” That is also why the official docs expose sdk_version in the README metadata. (Hugging Face)
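To catch that drift early after a duplicate's fresh rebuild, you can compare installed versions against your pins at startup and log any mismatch. A standard-library-only sketch; the pin values are examples, not recommendations:

```python
from importlib import metadata

def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Compare installed package versions against expected pins.
    Returns a map of package -> 'missing' or the installed version
    when it differs from the pin. Exact-match only; range specifiers
    like '>=2.4,<2.8' would need the `packaging` library instead."""
    drift = {}
    for pkg, expected in pins.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            drift[pkg] = "missing"
            continue
        if installed != expected:
            drift[pkg] = installed
    return drift

# e.g. at app startup:
# print(check_pins({"gradio": "5.49.1", "transformers": "4.45.0"}))
```

An empty dict means the rebuilt environment matches what the source Space was tested against; anything else is the first place to look when "worked before, broken after duplicate" strikes.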
11. Common failure modes after duplication
Missing secrets
Very common. The app works in the source Space because credentials are already configured; the duplicate fails because Secrets were not copied. (Hugging Face)
Wrong target hardware
A ZeroGPU app that relied on 70 GB VRAM may not fit on 16–24 GB VRAM GPUs. Pick hardware based on actual memory needs, not on the generic word “GPU.” (Hugging Face)
CPU fallback is unrealistic
Many ZeroGPU demos were never designed for CPU-level throughput or memory. In those cases, the right solution is usually a smaller model or a separate CPU configuration. (Hugging Face)
Framework incompatibility
Some frameworks are less compatible with ZeroGPU’s stateless model than high-level Hugging Face stacks such as transformers or diffusers. The vLLM issue is the clearest public example. (Hugging Face)
Fresh rebuild reveals dependency issues
A duplicated Space gets a fresh environment. If you were relying on loosely pinned dependencies, the duplicate may surface a problem that the original had not shown yet. Hugging Face’s Spaces docs make clear that dependency installation happens from your repo files such as requirements.txt, pre-requirements.txt, and README metadata. (Hugging Face)
12. Gradio 6: only migrate it separately
If the Space also needs a Gradio 6 upgrade, treat that as a separate migration.
Gradio’s migration guide says:
- `show_api` in `launch()` becomes `footer_links`
- event-listener `show_api` / `api_name=False` become `api_visibility`
- the API surface changes in ways that can affect how clients call your Space (Gradio)
That means a safe order is:
- duplicate and stabilize
- switch hardware
- only then upgrade Gradio major versions
If you do both at once, it becomes much harder to tell whether the failure came from hardware semantics or API/framework changes. (Gradio)
13. The practical recommendations I would follow
If your goal is “I just want my own working copy”
Duplicate it, restore secrets, leave the code mostly unchanged, test the duplicate on CPU once, then switch to dedicated GPU if needed. This keeps the migration simple and uses the official duplication workflow as intended. (Hugging Face)
If your goal is “I want a strong dedicated-GPU deployment”
Duplicate it to a GPU tier, keep @spaces.GPU initially, confirm it works, then optionally refactor into a dedicated-GPU-only version with eager CUDA preload and simpler code. (Hugging Face)
If your goal is “I want one repo that runs everywhere”
Keep @spaces.GPU, avoid import-time CUDA, detect device lazily, and branch explicitly for CPU vs GPU dtype and limits. That gives you the most portable baseline. (Hugging Face)
If your goal is “I want a CPU version too”
Expect a separate configuration, and often a separate model or default UX. CPU is usually the first place where “same code, same defaults” stops being realistic. (Hugging Face)
14. Bottom line
The cleanest answer is:
- Yes, you can duplicate a ZeroGPU Space and run the duplicate on CPU or dedicated GPU.
- The simplest first step is to duplicate it and change only the hardware.
- For dedicated GPU, that often works surprisingly well with minimal code changes.
- For CPU, you usually need explicit changes to model size, dtype, limits, and sometimes the entire UX.
- If you want one codebase to work everywhere, keep `@spaces.GPU` and structure the code so CUDA is never initialized at import time. (Hugging Face)
Thank you very much! I appreciate the effort in answering.