Buying advice: local LLM laptop

I can't decide which laptop to buy. My budget is on par with an M5 Max (18-40 core) 128 GB MacBook Pro, so RTX 5090 laptops with 64-128 GB RAM and Core Ultra 275HX/285HX CPUs are also on the table, along with the Z13 and ProArt.

I don't have the hardware knowledge to make this call on my own.

Any help is appreciated!

Thanks in advance.


The actual questions would probably look something like this:

  • The best-supported backend (the software that runs the LLM) varies by OS and GPU vendor, so which OS should you choose?
  • Dedicated VRAM is typically faster than unified memory, but unified-memory systems can hold far larger models. Do you prioritize model throughput or model size?
  • NVIDIA CUDA is the de facto standard, so if you plan to use open-source AI models other than LLMs, it offers a significant advantage. However, if you’re only using LLMs, the difference isn’t that significant anymore. Which GPU will you choose: NVIDIA, AMD, or Apple?
  • AMD ROCm support on Windows is decent as of today, but it’s not yet complete. Are you willing to take the risk? Or should you install Linux and use ROCm?

For your case, I would choose one of two directions:

  • MacBook Pro 16 with M5 Max and 128GB unified memory if your main goal is running bigger local LLMs on a laptop with the least friction around memory limits.
  • A 16-inch RTX 5090 laptop like the ROG Strix SCAR 16 or Lenovo Legion Pro 7i if your main goal is CUDA, Windows/Linux compatibility, and the widest local-AI tool support.

The machine I would not make the default pick for you is the ProArt P16. It is a good laptop, but for a local-LLM-first budget, it usually lands in the awkward middle: not the best big-model machine, and not the best CUDA machine. The ROG Flow Z13 is the interesting wildcard. It can make sense, but only if you specifically want the AMD large-memory route and accept a less mature Windows software stack. (Apple)

The one idea that makes this much easier

Do not start with brand or CPU.

Start with this question:

Which future regret would bother you most?

  1. “I bought a 5090 laptop, but some larger models or long contexts do not fit cleanly.”
  2. “I bought a Mac, but some CUDA-first tools are annoying or unavailable.”
  3. “I bought an unusual AMD machine, and now I spend time debugging the stack.”

That is the real decision. Local LLM buying is mostly about memory architecture and software backend maturity, not about who has the flashiest spec sheet. LM Studio’s docs explicitly show that if model weights do not fit in dedicated GPU memory, offload gets reduced and the rest goes into system RAM. That works, but it is slower than keeping more of the hot path in fast accelerator-accessible memory. (LM Studio)
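The offload behavior described above can be sketched as simple arithmetic. This is an illustration of the general idea, not LM Studio's exact algorithm; the reserve figure and the 40 GB model size are assumptions for the example.

```python
# Rough sketch (assumptions, not LM Studio's exact logic): estimate how much
# of a quantized model's weights fit in dedicated GPU memory and how much
# spills over into slower system RAM.

def offload_split(model_gb: float, vram_gb: float, reserve_gb: float = 2.0):
    """Return (gb_on_gpu, gb_in_system_ram) for a given model size.

    reserve_gb approximates VRAM kept free for KV cache and buffers;
    the real number varies by runtime and context length.
    """
    usable = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(model_gb, usable)
    return on_gpu, model_gb - on_gpu

# Example: a ~40 GB quantized 70B-class model on a 24 GB laptop GPU.
gpu, ram = offload_split(40.0, 24.0)
print(f"{gpu:.0f} GB on GPU, {ram:.0f} GB spilled to system RAM")
```

Once a meaningful fraction of the weights lives in system RAM, tokens-per-second drops, which is exactly the "slower than keeping the hot path in accelerator memory" point above.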

The background, in plain English

A laptop for local LLMs is constrained by three things:

1. How much fast memory the model can really use

An RTX 5090 Laptop GPU has 24GB GDDR7. That is the dedicated GPU memory ceiling on those Windows gaming laptops. By contrast, Apple’s M5 Max MacBook Pro goes up to 128GB unified memory with up to 614GB/s bandwidth, and Apple explicitly ties the bandwidth increase to AI and LLM workloads. AMD’s Ryzen AI Max+ 395 systems can be configured with up to 128GB memory, and AMD says up to 96GB can be exposed as Variable Graphics Memory in supported systems. (NVIDIA)
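To make those ceilings concrete, here is a back-of-envelope comparison. The ~4.5 bits/param figure is an assumption standing in for a typical 4-bit quant with overhead; real file sizes vary by quant format.

```python
# Approximate quantized weight sizes (assumed ~4.5 bits/param, illustration
# only) checked against the two memory ceilings discussed above.

BITS_PER_PARAM = 4.5  # assumption: a typical 4-bit quant plus overhead

def quant_weight_gb(params_b: float) -> float:
    """Approximate in-memory weight size in GB for params_b billion params."""
    return params_b * 1e9 * BITS_PER_PARAM / 8 / 1e9

for params in (8, 32, 70, 120):
    gb = quant_weight_gb(params)
    fits_dgpu = gb <= 24          # RTX 5090 Laptop: 24 GB dedicated
    fits_mac = gb <= 128 * 0.75   # leave ~25% of unified memory for the OS
    print(f"{params:>4}B ~{gb:5.1f} GB | 24GB dGPU: {fits_dgpu} | 128GB unified: {fits_mac}")
```

The crossover sits somewhere in the 30B-40B range: below it, the 24 GB dGPU is comfortable; above it, only the large unified-memory machines hold the whole model in fast memory.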

2. Which backend stack is mature on that machine

For NVIDIA laptops, the answer is simple: CUDA. llama.cpp supports CUDA, Ollama supports NVIDIA GPUs, and LM Studio has explicit RTX 50-series support. On Apple Silicon, the native answers are MLX and Metal. Apple’s MLX is optimized for Apple silicon’s unified memory, and MLX-LM is specifically for generating text and fine-tuning LLMs on Apple silicon. For AMD, the story depends on OS: ROCm is strongest on Linux, while on Windows AMD’s HIP SDK is still only a subset of ROCm. (GitHub)
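The platform-to-backend mapping above can be summarized in a small lookup. The backend names are real; the function itself is only an illustration of the decision table, not any tool's actual detection logic.

```python
# Sketch: the backend story by platform, expressed as a lookup table.
# Illustrative only; real runtimes do their own device detection.

import platform

def preferred_backend(system: str, gpu_vendor: str) -> str:
    table = {
        ("Darwin", "apple"):   "Metal / MLX",
        ("Windows", "nvidia"): "CUDA",
        ("Linux", "nvidia"):   "CUDA",
        ("Linux", "amd"):      "ROCm",
        ("Windows", "amd"):    "HIP SDK (subset of ROCm) or Vulkan",
    }
    return table.get((system, gpu_vendor), "CPU / Vulkan fallback")

# e.g. on this machine with an NVIDIA GPU:
print(preferred_backend(platform.system(), "nvidia"))
```

Notice that NVIDIA is the only vendor with the same mature answer on both Windows and Linux, which is the compatibility argument made throughout this post.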

3. Whether the chassis can actually sustain the hardware

This is where laptop class matters. Notebookcheck’s reviews on the SCAR 16 and Legion Pro 7i both show the same pattern: excellent peak performance, but with the usual gaming-laptop tradeoffs like fan noise, power draw, and thicker designs in pursuit of performance. That matters because a “5090 laptop” is not just a GPU. It is also a cooling system, power budget, and noise profile. (Notebookcheck)

My recommendation, clearly

If local LLMs are the priority above everything else

Buy the 16-inch MacBook Pro M5 Max 128GB.
This is the best answer if you care most about bigger local models, fewer memory cliffs, and a machine that still feels like a laptop. Apple’s current 16-inch M5 Max supports 128GB unified memory and 614GB/s bandwidth, MLX is built for that architecture, and Apple’s own docs and talks now frame MLX as a direct path for running LLMs locally on Apple silicon. (Apple)

If Windows and CUDA are non-negotiable

Buy the ROG Strix SCAR 16 RTX 5090 or Lenovo Legion Pro 7i Gen 10 RTX 5090.
These are the most straightforward choices if you want CUDA-first local AI, broad app compatibility, and strong performance on small-to-medium local models. Both are current 16-inch 5090-class machines built around Intel’s Core Ultra 9 275HX and NVIDIA’s 24GB RTX 5090 Laptop GPU. (@ROG)

If you specifically want the unusual “large-memory AMD” path

Consider the ROG Flow Z13 2025.
This is not the safe default. It is the most interesting alternative. ASUS lists it with the Ryzen AI Max+ 395 and Radeon 8060S, while AMD’s own material explains why these systems are special for local AI: the 128GB configuration can expose up to 96GB as Variable Graphics Memory. The catch is software maturity, especially on Windows, where AMD’s HIP SDK remains only a subset of ROCm. (@ROG)

If you also care a lot about creator workflow, display, and style

Then the ProArt P16 starts making sense.
But only then. ASUS’s Japan store currently lists ProArt P16 H7606 variants around 64GB memory, with RTX 5070 or 5070 Ti configurations, and the RTX 5070 Ti variant is listed at up to 115W. That makes it a strong creator laptop, but for local-LLM-first buying it gives up too much versus Mac 128GB on memory capacity and versus 5090 laptops on GPU ceiling. (ASUS)

How to select, step by step

Step 1: Decide whether “bigger models” or “broader software” matters more

Choose MacBook Pro 128GB if your thought is:

  • “I want the laptop that handles larger local LLMs most gracefully.”
  • “I do not want to fight 24GB VRAM ceilings.”
  • “I am okay with MLX, Metal, llama.cpp, and LM Studio on macOS.”

Choose RTX 5090 laptop if your thought is:

  • “I want the most broadly compatible local AI laptop.”
  • “I want CUDA because many tools assume NVIDIA.”
  • “I mostly care about fast, straightforward Windows/Linux workflows.”

Choose Z13 only if your thought is:

  • “I understand this is a less standard path.”
  • “I specifically want AMD’s high-memory Strix Halo design.”
  • “I can tolerate more stack weirdness if the memory story is good.”

That is the main fork. Everything else is secondary. (Ollama Documentation)

Step 2: Ignore CPU hype unless all your choices are already close

Between the Core Ultra 275HX and 285HX, the CPU is rarely what decides whether a local LLM experience feels good. In practice, memory fit and backend choice dominate. That is why LM Studio emphasizes model fit, offload, and dedicated versus shared memory behavior. (LM Studio)
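One reason memory fit dominates: context length alone can consume several gigabytes, regardless of CPU. A sketch of FP16 KV-cache size for a hypothetical 70B-class model (the layer and head counts are assumptions for illustration; real models vary):

```python
# KV-cache memory grows linearly with context length, independent of CPU
# speed. Shape assumptions below are illustrative, not a specific model.

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache in GiB: 2 tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len / 2**30

# Assumed: 80 layers, 8 KV heads (GQA), head_dim 128, 32k context.
print(f"{kv_cache_gib(80, 8, 128, 32768):.0f} GiB just for the KV cache")
```

Ten-ish gigabytes of cache on top of the weights is why long contexts hit the 24 GB VRAM ceiling long before the CPU becomes the bottleneck.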

Step 3: Decide how much laptop behavior matters

A gaming 5090 laptop is usually:

  • thicker
  • louder
  • shorter-lived on battery
  • more desk-bound

Notebookcheck’s current reviews on the SCAR 16 and Legion Pro 7i make that tradeoff explicit. They are high-performance machines, but they are still gaming-laptop-class devices with the usual thermal and acoustic compromises. (Notebookcheck)

A MacBook Pro is usually chosen because it is easier to live with as an actual laptop while still offering unusually strong local-LLM memory behavior for its size. Apple’s current 16-inch M5 Max MacBook Pro supports the 128GB configuration, and Apple’s official specs list the machine at 2.15 kg. (Apple Support)

Step 4: Be honest about your software habits

If you know you will use a lot of:

  • CUDA-first tools
  • odd side projects
  • random GitHub repos that assume NVIDIA
  • Linux workflows

then NVIDIA is the safest choice. llama.cpp, Ollama, and LM Studio all document straightforward NVIDIA support. (GitHub)

If you mostly want:

  • local chat
  • RAG
  • coding assistants
  • GGUF-style experimentation
  • larger quantized models
  • a polished daily-driver laptop

then the Mac is the safer long-term bet. (Apple Machine Learning Research)

Pros and cons of each option

1) MacBook Pro 16 M5 Max 128GB

Why people buy it

Because it is the most convincing “big local model on a real laptop” machine in this group. The combination of 128GB unified memory, Apple’s MLX stack, and mature Metal support gives it a real structural advantage once you care about models and contexts that go beyond a normal laptop GPU’s dedicated memory. (Apple)

Pros

It has the best memory story of the group for local LLMs. The tooling is now real, not niche: Apple promotes MLX for Apple silicon, and MLX-LM exists specifically for LLM inference and fine-tuning. It is also the machine least likely to feel absurd when you are not plugged into a desk. (Apple Machine Learning Research)

Cons

You are giving up CUDA. That is the entire downside. Some tools have Mac equivalents. Some do not. Some work, but are not the default path. If you want maximum compatibility with the broader Windows/Linux local-AI ecosystem, a 5090 laptop is the more universally accepted answer. (GitHub)

My call

Best overall pick for you unless you already know you need CUDA. (Apple)

2) ASUS ROG Strix SCAR 16 RTX 5090

Why people buy it

Because it is the clearest expression of the Windows/CUDA route: Core Ultra 9 275HX, RTX 5090 Laptop GPU, up to 64GB RAM, and a 175W max TGP chassis designed to actually feed the GPU. (@ROG)

Pros

This is the safest local-AI laptop if you want NVIDIA. CUDA support is the broadest, setup is straightforward, and the chassis is tuned for performance rather than pretending to be an ultrabook. Notebookcheck specifically calls out the 175W RTX 5090, very high performance, and strong maintenance access. (GitHub)

Cons

The key limit does not go away: 24GB VRAM. That means once your workloads stop fitting nicely in dedicated GPU memory, you rely more on RAM and offload compromises. Notebookcheck also notes the usual gaming-laptop pain points such as loud fans and high power consumption. (NVIDIA)

My call

Best Windows pick if you want the least doubt and the widest ecosystem support. (@ROG)

3) Lenovo Legion Pro 7i Gen 10 RTX 5090

Why people buy it

Because it is the other serious 16-inch 5090 choice. Lenovo’s current page shows the 24GB RTX 5090 Laptop GPU and 64GB DDR5. Notebookcheck says Lenovo made the machine thicker this generation specifically to get more out of the Arrow Lake and Blackwell hardware. (Lenovo)

Pros

Same CUDA advantage as the SCAR 16. Often a slightly cleaner aesthetic. Still a real performance chassis rather than a thin compromise. (Lenovo)

Cons

Same 24GB VRAM ceiling. Same gaming-laptop class compromises. Same general logic as the SCAR 16, which means it is not the machine to buy if your main concern is pushing beyond normal dGPU memory limits. (NVIDIA)

My call

A very good alternative to the SCAR 16. Choose between them on design, price, keyboard, display preference, and local availability. The core buying logic is the same. (@ROG)

4) ASUS ROG Flow Z13 2025

Why people buy it

Because the memory concept is different. The Z13 uses AMD’s Ryzen AI Max+ 395 and Radeon 8060S, and AMD’s own explanation of Variable Graphics Memory is the reason this machine is relevant for local AI at all: up to 96GB can be carved out as graphics-addressable memory on 128GB systems. (@ROG)

Pros

This is the most interesting compromise in the list. It is far more portable than a 5090 gaming laptop, and its memory story is much better than a normal 24GB dGPU laptop if your workload can exploit the platform well. Notebookcheck also found the Strix Halo platform seriously capable for its size. (Notebookcheck)

Cons

The stack is less mature, especially on Windows. AMD’s Windows HIP SDK is still only a subset of ROCm. That means you are more likely to depend on Vulkan paths or run into app-by-app variation. It is the machine most likely to be brilliant in one setup and annoying in another. (ROCm Documentation)

My call

Buy this only if you are deliberately choosing the experiment. Do not buy it just because the concept sounds cool. (AMD)

5) ASUS ProArt P16

Why people buy it

Because it is a strong creator laptop with a nicer design language than most gaming laptops. ASUS’s Japan store shows multiple H7606 variants with 64GB memory and RTX 5070 or 5070 Ti options, plus a 3K OLED display and a roughly 1.95 kg chassis. (ASUS)

Pros

It is the easiest machine here to justify if you also care a lot about creative work, display quality, portability, and a more professional look. It still gives you NVIDIA support. (ASUS)

Cons

For a local-LLM-first buyer, it is usually the wrong optimization. Its GPU ceiling and memory ceiling sit below the most compelling alternatives for your budget. It is better understood as a creator laptop that also does local AI, not as the smartest pure local-LLM buy. (ASUS)

My call

Only choose this if your brief changes to “general premium creator laptop with some local LLM work.” (ASUS)

My final ranking for your exact situation

1. MacBook Pro 16 M5 Max 128GB

Best if your main objective is local LLMs first. Biggest reduction in future regret around memory capacity. (Apple)

2. ASUS ROG Strix SCAR 16 RTX 5090

Best if your main objective is Windows plus CUDA. (@ROG)

3. Lenovo Legion Pro 7i Gen 10 RTX 5090

Very close to the SCAR 16. Pick based on price and taste. (Lenovo)

4. ASUS ROG Flow Z13

Best wildcard. Highest uncertainty. (AMD)

5. ASUS ProArt P16

Good machine. Wrong default for this goal. (ASUS)

The shortest answer

If you want me to stop hedging and just tell you what to buy:

  • Buy the MacBook Pro 16 M5 Max 128GB if local LLM is the center of the purchase.
  • Buy the SCAR 16 RTX 5090 if Windows and CUDA are more important than larger-model memory headroom.
  • Do not make the Z13 your default pick unless you actively want the AMD experiment.
  • Do not make the ProArt P16 your default pick unless creator-laptop qualities matter almost as much as local LLMs. (Apple)