MPS is running slower than CPU on Mac M1 Pro

Hello everyone.

I have recently been testing the new diffusers version 0.3.0 on my M1 Pro. Following the steps from How to use Stable Diffusion in Apple Silicon (M1/M2), I found that the average execution times for similar prompts on MPS (GPU) and CPU are:

  • GPU: 331 s
  • CPU: 222 s

Has anyone else tested it?
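For anyone who wants to reproduce the comparison above, here is a minimal timing sketch. The helper `time_call` and its parameters are hypothetical names, not part of diffusers; it simply averages wall-clock time over a few runs after an untimed warm-up, on the assumption that the first run may include one-time setup cost.

```python
import statistics
import time


def time_call(fn, repeats=3, warmup=1):
    """Return the mean wall-clock seconds of fn() over `repeats` timed runs."""
    for _ in range(warmup):
        fn()  # untimed warm-up run
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


# Hypothetical usage, assuming `pipe` and `prompt` are set up as in the guide:
#   for device in ("cpu", "mps"):
#       pipe = pipe.to(device)
#       seconds = time_call(lambda: pipe(prompt))
#       print(f"{device}: {seconds:.0f} s")
```

Using the same prompt, step count, and image size on both devices keeps the two numbers comparable.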

Hi @polodealvarado! Your CPU numbers are very similar to the ones I get on my M1 Max but, as reported in the page you mentioned, I see much faster speeds when using the GPU. Would you mind sharing a couple of details so I can take a look? These would be useful:

  • The amount of RAM your computer has.
  • The version of PyTorch you installed.
  • Your macOS version.
  • A small code snippet, only if you made any changes to the example we provided.

Thanks a lot!
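The details requested above can be gathered with a short script. This is only a sketch: the function names are hypothetical, the RAM figure relies on `os.sysconf` (assumed to be available on macOS and Linux, with a fallback to `None` otherwise), and the MPS check assumes a PyTorch version that provides `torch.backends.mps`.

```python
import os
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def total_ram_gb():
    """Return total physical RAM in GiB, or None if it cannot be determined."""
    try:
        total = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        return round(total / 1024**3, 1)
    except (AttributeError, ValueError, OSError):
        return None


def environment_report():
    """Collect the details that are useful when reporting MPS performance."""
    report = {
        "ram_gb": total_ram_gb(),
        "macos": platform.mac_ver()[0] or "not macOS",
        "python": platform.python_version(),
        "torch": package_version("torch"),
        "diffusers": package_version("diffusers"),
    }
    try:
        import torch  # optional: only needed for the MPS availability check
        report["mps_available"] = torch.backends.mps.is_available()
    except (ImportError, AttributeError):
        report["mps_available"] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```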

Hi @pcuenq, thank you for answering.

Here are all the details and more:

  • RAM: 16 GB
  • GPU cores: 16
  • macOS version: 12.5.1
  • Python version: 3.9.13
  • diffusers version: 0.3.0
  • Torch version: 1.13.0.dev20220908

I have been using the same code without any changes. I also tried another Jupyter notebook from this repository and the results are quite similar (CPU is faster than MPS).

@pcuenq I am also following this thread about running the mps backend.

That’s a very interesting thread! They specifically say that random operations are not yet optimized; however, diffusers’ code already generates the random latents on the CPU when using the mps device.

I’ll do some testing, thanks!
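To illustrate the point about random latents, here is a rough sketch of sampling the initial noise on the CPU and then moving it to the target device. It is not the actual diffusers code: `latent_shape` and `make_latents` are hypothetical helpers, and the defaults (4 latent channels, 8x downscaling, 512x512 images) are assumptions based on Stable Diffusion v1.

```python
def latent_shape(batch_size=1, height=512, width=512, channels=4, scale=8):
    """Shape of the initial latents (assumed Stable Diffusion v1 layout)."""
    return (batch_size, channels, height // scale, width // scale)


def make_latents(device, seed=0, **shape_kwargs):
    """Sample latents on the CPU, then move them to `device` (e.g. "mps")."""
    import torch  # imported lazily so latent_shape works without PyTorch

    generator = torch.Generator(device="cpu").manual_seed(seed)
    latents = torch.randn(
        latent_shape(**shape_kwargs), generator=generator, device="cpu"
    )
    return latents.to(device)


# Hypothetical usage on Apple Silicon:
#   latents = make_latents("mps", seed=42)
```

Sampling on the CPU keeps the random number generation off the mps backend, so only the transfer of a small tensor touches the GPU.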

This also happens to me: my CPU takes around 4m 30s, while my GPU (mps) takes more than 20 minutes.
Same code; I simply changed:

pipe = pipe.to("mps")

To

pipe = pipe.to("cpu")

  • RAM: 16 GB
  • GPU cores: 16
  • macOS version: 12.6
  • Python version: 3.10.4
  • diffusers version: 0.6.0
  • Torch version: 1.14.0.dev20221031

We are going to release a new version of diffusers this week optimized for PyTorch 1.13, which was released last Saturday.

In the meantime, TL;DR:

  • Install the stable release of PyTorch, not the nightly build. You should get version 1.13.0.
  • Use the main branch of diffusers instead of the one from PyPI (pip install git+https://github.com/huggingface/diffusers).
  • Use attention slicing to optimize memory usage and prevent swapping (pipe.enable_attention_slicing() after you create your Stable Diffusion pipeline).
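Put together, the recommendations above might look like the following sketch. The helper names `is_nightly` and `build_pipeline` are hypothetical, and the model id is an assumption (any Stable Diffusion checkpoint you have access to should work; some require accepting a license or logging in first). Treating a ".dev" suffix as the marker of a nightly build is also an assumption, based on the version strings posted earlier in this thread.

```python
def is_nightly(torch_version):
    """Heuristic: nightly builds reported in this thread carry a '.dev' suffix."""
    return ".dev" in torch_version


def build_pipeline(model_id="CompVis/stable-diffusion-v1-4", device="mps"):
    """Load a Stable Diffusion pipeline and enable attention slicing."""
    # Imported lazily so is_nightly can be used without these packages.
    import torch
    from diffusers import StableDiffusionPipeline

    if is_nightly(torch.__version__):
        print(f"Warning: nightly PyTorch build detected ({torch.__version__})")

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    pipe = pipe.to(device)
    # Reduces peak memory use, which helps avoid swapping on 16 GB machines.
    pipe.enable_attention_slicing()
    return pipe


# Hypothetical usage:
#   pipe = build_pipeline()
#   image = pipe("a photo of an astronaut riding a horse").images[0]
```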