Instructions to use Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16",
	filename="CPsyCounX.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16
# Run inference directly in the terminal:
llama-cli -hf Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16
# Run inference directly in the terminal:
llama-cli -hf Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16
# Run inference directly in the terminal:
./llama-cli -hf Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

Use Docker

docker model run hf.co/Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

LM Studio
Jan
Ollama
How to use Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 with Ollama:
```
ollama run hf.co/Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16
```

Unsloth Studio new

How to use Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 to start chatting

Docker Model Runner
How to use Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 with Docker Model Runner:
```
docker model run hf.co/Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16
```

Lemonade

How to use Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

Run and chat with the model

lemonade run user.CPsyCounX-InternLM2-Chat-7B-GGUF-fp16-{{QUANT_TAG}}

List all available models

lemonade list

Model Details

Model Description

Developed by: AITA
Model type: Full-Precision Text Generation LLM (FP16 GGUF format)
Original Model: https://huggingface.co/CAS-SIAT-XinHai/CPsyCounX
Precision: FP16 (non-quantized full-precision version)

Repository

GGUF Converter: llama.cpp
Huggingface Hub: https://huggingface.co/Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

Usage

Method 1: llama.cpp Backend Server + Chatbox

Step 1: Start .llama.cpp Server

./llama-server \
  -m /path/to/model.gguf \
  -c 2048 \          # Context length
  --host 0.0.0.0 \   # Allow remote connections
  --port 8080 \      # Server port
  --n-gpu-layers 35  # GPU acceleration (if available)

Step 2: Connect via Chatbox

Download Chatbox

Configure API endpoint:

API URL: http://localhost:8080
Model: (leave empty)
API Type: llama.cpp

Set generation parameters:

{
  "temperature": 0.7,
  "max_tokens": 512,
  "top_p": 0.9
}

Method 2: LM Studio

Download LM Studio
Load GGUF file:
- Launch LM Studio
- Search Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16

Configure settings:

Context Length: 2048
GPU Offload: Recommended (enable if available)
Batch Size: 512

Start chatting through the built-in UI

Precision Details

Filename	Precision	Size	Characteristics
CPsyCounX.gguf	FP16	[15.5GB]	Full original model precision

Hardware Requirements

Minimum:

24GB RAM (for 7B model)
CPU with AVX/AVX2 instruction set support

Recommended:

32GB RAM
CUDA-capable GPU (for acceleration)
Fast SSD storage (due to large model size)

Key Notes

Requires latest llama.cpp (v3+ recommended)
Use --n-gpu-layers 35 for GPU acceleration (requires CUDA-enabled build)
Initial loading takes longer (2-5 minutes)
Requires more memory/storage than quantized versions
Use --mlock to prevent swapping

Advantages

Preserves original model precision
Ideal for precision-sensitive applications
No quantization loss
Suitable for continued fine-tuning

Downloads last month: 13

GGUF

Model size

8B params

Architecture

internlm2

Hardware compatibility

We're not able to determine the quantization variants.

View all variants

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

Base model

internlm/internlm2_5-7b-chat

Quantized

(26)

this model

Slipstream-Max
/

CPsyCounX-InternLM2-Chat-7B-GGUF-fp16