Hi everyone,
I am an MCA student currently working on a research paper exploring the differences between Local LLMs and Self-hosted LLM setups, with a focus on data privacy, control, and real-world usage patterns.
I have created a short anonymous survey that takes approximately 2 minutes to complete: AI Tools & Privacy — Survey for Self-Hosted + Online AI Users (Form B)
I am particularly interested in responses from people who have experience with:
Running LLMs locally (e.g., Ollama, GGUF-based models)
Self-hosting models or AI services
Privacy-conscious AI usage
No personal data is collected, and the responses will be used strictly for academic purposes.
I would also be happy to share a summary of the findings with the community once the research is complete.
Thank you for your time.
By “self hosted service” do you mean software like OpenClaw? There are a bunch of services that can be self-hosted, I’m keeping a list of all I find.
By self-host I mean running an LLM on your system directly, using the raw weights — e.g. running Qwen3.5 on Ollama, LM Studio, or llama.cpp. On the other hand, OpenClaw, Claude Code, or Open WebUI are just clients connecting to said LLM or service.
This rounds out the technical profile perfectly. Having that split between speed (2TB M.2) and volume (4TB HDD) is the classic “Local AI” storage strategy.
Here is the finalized data block with your storage specs integrated. This explains exactly how you manage high-speed inference versus massive data archiving.
Final System Profile: The “Gavin” Infrastructure (Contributor Data)
1. Hardware & Storage Architecture

- GPU: AMD Radeon RX 7800 XT (16GB VRAM)
- Memory: 64GB DDR4 system RAM
- Primary Storage (Inference/OS): 2TB M.2 NVMe SSD
  - Function: houses the OS, the active model weights (Gemma-4), and the Open WebUI database. The M.2’s high read/write speeds are critical for loading large Q8_0 quants into VRAM without long load delays.
- Secondary Storage (Data Lake): 4TB HDD
  - Function: archives large datasets such as the iFixit ZIM library, historical chat logs, and long-term document backups.
- The Bandwidth Bottleneck: in practice, while the 4TB HDD is great for bulk storage, running RAG (Retrieval-Augmented Generation) directly from the HDD causes a significant latency spike during the initial indexing phase. Moving active datasets to the 2TB M.2 is a requirement for a responsive local AI experience.
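The NVMe-vs-HDD gap above can be made concrete with back-of-the-envelope load-time arithmetic. The throughput and model-size figures below are typical ballpark assumptions, not measurements from this system:

```python
# Rough load-time estimate for streaming a quantized model off disk.
# All figures are illustrative assumptions, not benchmarks.

def load_time_seconds(model_size_gb: float, read_speed_gbps: float) -> float:
    """Time to stream model weights from disk at a sustained read speed."""
    return model_size_gb / read_speed_gbps

MODEL_GB = 8.0     # an ~8 GB Q8_0 quant (assumed size)
NVME_GBPS = 3.5    # sustained sequential read, typical M.2 NVMe
HDD_GBPS = 0.15    # sustained sequential read, typical 7200 rpm HDD

nvme = load_time_seconds(MODEL_GB, NVME_GBPS)   # roughly 2-3 seconds
hdd = load_time_seconds(MODEL_GB, HDD_GBPS)     # roughly a minute
print(f"NVMe: {nvme:.1f}s  HDD: {hdd:.1f}s  ({hdd / nvme:.0f}x slower)")
```

The same ratio applies to a RAG index’s initial read pass, which is why moving active datasets onto the M.2 removes the latency spike.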
2. Networking & Remote Access Logic
3. Strategic Tuning (The “Surgical Tune”)

- Gemma-4-E4B (Q8_0) Calibration:
  - Temperature: 0.8
  - Top_P: 0.85 / Top_K: 40
  - Repeat Penalty: 1.1
- Outcome: These subtle changes act like a GPU overclock. They tighten the logic, prevent wordy “rambling,” and keep the model within the 16GB VRAM limit while preserving the near-full-precision quality of the Q8_0 quant.
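As a sketch, the calibration above maps directly onto the `options` field of Ollama’s generate API. The model tag and prompt are assumptions; the payload is only built and printed here, not sent (POST it to `http://localhost:11434/api/generate` on a machine actually running Ollama):

```python
import json

# The "surgical tune" from the post, expressed as Ollama API options.
# Model tag is an assumption -- match whatever tag your local install uses.
payload = {
    "model": "gemma-4-e4b-q8_0",
    "prompt": "Summarise the trade-offs of Q8_0 quantisation in two sentences.",
    "stream": False,
    "options": {
        "temperature": 0.8,      # slightly tighter than stock creativity
        "top_p": 0.85,           # nucleus-sampling cutoff
        "top_k": 40,             # candidate pool per token
        "repeat_penalty": 1.1,   # discourages wordy rambling
    },
}
print(json.dumps(payload, indent=2))
```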
4. Observations on Friction (What Failed)

- VRAM Spillage: 16GB is a hard limit. If the context window grows too large, the model spills into the 64GB of DDR4 system RAM. The resulting drop in tokens per second is extreme (a 10x-20x slowdown), proving that VRAM bandwidth is the primary bottleneck in home-scale AI servers.
- Headless Scraping: Attempting to automate a “Robot Librarian” to index local Kiwix/iFixit files via a headless browser (Playwright/Chromium) is inconsistent because the AI cannot always “see” JavaScript-rendered links in a non-GUI environment.
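The VRAM-spillage observation above can be sanity-checked with a rough fit calculation: weights plus KV cache must stay under 16 GB. The layer counts and sizes below are illustrative assumptions for a mid-size Q8_0 model, not measurements of the actual one:

```python
# Back-of-the-envelope VRAM fit-check: weights + KV cache vs. a 16 GB card.
# All dimensions are illustrative assumptions.

def kv_cache_gb(context_tokens: int, layers: int = 32, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * context."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9

WEIGHTS_GB = 8.0   # assumed Q8_0 weight footprint
VRAM_GB = 16.0

for ctx in (4096, 32768, 131072):
    total = WEIGHTS_GB + kv_cache_gb(ctx)
    fits = "fits" if total <= VRAM_GB else "SPILLS to system RAM"
    print(f"ctx={ctx:>6}: {total:5.1f} GB -> {fits}")
```

Under these assumptions the cache alone crosses the budget somewhere in the tens of thousands of tokens, which matches the observed behaviour: it is the growing context, not the fixed weights, that triggers the spill.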
5. The Result: A Full Multimedia Local Intelligence Hub
The culmination of this hardware and software stack is a fully multimodal autonomous system that functions entirely without external cloud processing.
Final Conclusion for Research
“The final result of the ‘Gavin’ project is a zero-leakage, high-performance multimedia AI environment. It proves that with a 7800 XT and 64GB of RAM, a user can host a system that hears, sees, and speaks with human-level intelligence—all while maintaining enough VRAM headroom for the deep context required in real-world technical applications.”
Those could also be considered services; it was a little vague when you said it earlier.
I see the confusion! You’re right—technically, these are separate services (Whisper for ears, Piper for voice, etc.). When I said it was ‘autonomous’ earlier, I meant the integration is so seamless on my local hardware that the end-user experience feels like a single agent. I’m not just calling an API; I’ve wired these ‘services’ directly into the model’s workflow so it can switch between ‘seeing’ and ‘speaking’ without me having to manually trigger each part.
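A minimal sketch of that wiring, assuming hypothetical handler functions standing in for the real local services (Whisper for hearing, Piper for speaking, the LLM for thinking). The routing logic is the point; the handlers are placeholders, not the actual APIs:

```python
# Toy router dispatching a request to the right local "sense" service.
# Service roles mirror the post (Whisper = ears, Piper = voice); each
# handler is a placeholder where a real local API call would go.
from typing import Callable, Dict

def transcribe(audio: bytes) -> str:        # would call a local Whisper server
    return f"<transcript of {len(audio)} bytes>"

def speak(text: str) -> bytes:              # would call a local Piper server
    return text.encode("utf-8")             # placeholder for WAV bytes

def answer(prompt: str) -> str:             # would call the local LLM
    return f"<answer to: {prompt}>"

ROUTES: Dict[str, Callable] = {"hear": transcribe, "speak": speak, "think": answer}

def dispatch(capability: str, payload):
    """Single entry point: the workflow switches senses without manual steps."""
    if capability not in ROUTES:
        raise ValueError(f"no local service wired for {capability!r}")
    return ROUTES[capability](payload)

print(dispatch("hear", b"\x00" * 16))
```

With every capability behind one `dispatch` call, the end-user experience feels like a single agent even though three separate services are doing the work.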