Best practices for using models that require flash_attn on Apple silicon Macs (or non-CUDA devices)?

kerrmetric · July 16, 2024, 4:28am

There are any number of models on Hugging Face that seem to require flash_attn, even though my understanding is that most models can actually work fine without it. A few examples:

What is the best practice for getting them working on Apple M2/M3 laptops (ideally with Metal support)? Obviously flash_attn won't be available, but there is still plenty of value in working with models locally on a laptop before you need the higher efficiency of flash_attn and CUDA.

I’ve found a few directional hints, but none of them have worked:

In theory, you should be able to monkey patch out the exception raised in transformers.dynamic_module_utils, but I can't get that to work.

In theory, you should be able to run FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE pip install flash-attn==2.5.8, but that fails to build (due to a strange issue with os.rename not working on macOS).

Has anybody gotten these models working? Is there a general solution that Hugging Face could implement to allow these models to run/train (even if not very efficiently) on non-CUDA devices?
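One partial answer to the non-CUDA question is to choose the device and attention implementation at load time instead of assuming flash-attention. This is a minimal sketch, assuming PyTorch with its MPS (Metal) backend, and assuming the "flash_attention_2" and "eager" values that recent transformers versions accept for attn_implementation in from_pretrained. Models loaded with trust_remote_code may ignore that argument, so whether it helps depends on the specific model. The helper names (has_package, pick_device, pick_attn_implementation) are hypothetical.

```python
import importlib.util


def has_package(name):
    """True if `name` is importable in this environment (without importing it)."""
    return importlib.util.find_spec(name) is not None


def pick_device():
    """Prefer CUDA, then Apple's Metal (MPS) backend, then CPU."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: CPU is the only sensible answer.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def pick_attn_implementation(device):
    """Request flash-attention only where it can actually run."""
    if device == "cuda" and has_package("flash_attn"):
        return "flash_attention_2"
    # Plain PyTorch attention: slower, but it runs on MPS and CPU.
    return "eager"
```

On an M2/M3 Mac with a Metal-enabled PyTorch build, this would typically pick "mps" and "eager"; you would then pass attn_implementation=pick_attn_implementation(device) to from_pretrained and move the model with .to(device). "sdpa" is another option in recent transformers releases if the model supports it.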

kerrmetric · July 16, 2024, 4:37am

Just as I posted it, I found at least one solution (the monkey patch approach) that works!
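The exact code from the linked solution isn't reproduced here, so what follows is a minimal sketch of the monkey patch approach, not that code. It assumes that transformers' remote-code loader calls check_imports(), which looks up get_imports() in transformers.dynamic_module_utils at call time. That is an internal helper, not public API, and may change between releases. The names SKIPPED_IMPORTS, filter_imports and load_without_flash_attn are hypothetical.

```python
from unittest.mock import patch

# Packages we want transformers' remote-code import check to skip.
# flash_attn has no macOS/Metal build, so it is the usual culprit.
SKIPPED_IMPORTS = {"flash_attn"}


def filter_imports(imports):
    """Return the import list without the packages in SKIPPED_IMPORTS."""
    return [name for name in imports if name not in SKIPPED_IMPORTS]


def load_without_flash_attn(model_id, **kwargs):
    """Hypothetical helper: load a remote-code model while hiding flash_attn
    from transformers' import check (assumes the internal get_imports hook)."""
    # Imported lazily so filter_imports() works without transformers installed.
    import transformers.dynamic_module_utils as dmu
    from transformers import AutoModelForCausalLM

    # Capture the real function *before* patching; looking it up inside the
    # patched function would find the patch itself and recurse forever.
    original_get_imports = dmu.get_imports

    def patched_get_imports(filename):
        return filter_imports(original_get_imports(filename))

    # Remote-code models need this; note that it executes code from the Hub repo.
    kwargs.setdefault("trust_remote_code", True)

    # The patch is active only while the model is being loaded.
    with patch.object(dmu, "get_imports", patched_get_imports):
        return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
```

This only silences the import check. If a model's modeling file imports flash_attn unconditionally at the top level (rather than behind an availability check), loading will still fail. Whether extra arguments such as attn_implementation="eager" are honoured depends on that model's own code.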

Can something like this be built into transformers so we don't have to do it every time?

annahaz · August 23, 2024, 8:05pm

Thanks for sharing, it works! I was facing the same problem.
