[Tool] Open-source prompt compressor for LLMs – 22% avg savings with spaCy + rules

metawake · April 14, 2025, 11:47pm

Hey everyone!

I recently built a very experimental semantic prompt compressor aimed at reducing LLM token usage without losing important context.
I'm still not sure how worthwhile the idea is, but I did have fun with this experiment.

Built with spaCy and YAML rule configs
Domain-sensitive (best for human queries)
Preserves >95% of named entities and technical terms
Achieves ~22% compression across real-world prompts

It’s designed to work both for runtime compression and prompt normalization before storage / vector DB ingestion.
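
To make the idea concrete, here is a minimal sketch of rule-based compression with a savings measurement. It is not the code from the repo: it swaps spaCy and the YAML configs for a few hard-coded regular expressions so that it runs with the standard library only. The rule list, the `compress` and `savings` names, and counting tokens by splitting on whitespace are all assumptions made for illustration.

```python
import re

# Hypothetical filler-phrase rules (pattern, replacement). The real tool
# reads its rules from YAML and uses spaCy; neither is reproduced here.
RULES = [
    (re.compile(r"\bI would like you to\s+", re.IGNORECASE), ""),
    (re.compile(r"\bplease\s+", re.IGNORECASE), ""),
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
    (re.compile(r"\bdue to the fact that\b", re.IGNORECASE), "because"),
]


def compress(prompt: str) -> str:
    """Apply each rule in order, then collapse leftover whitespace."""
    out = prompt
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return re.sub(r"\s+", " ", out).strip()


def savings(original: str, compressed: str) -> float:
    """Fraction of whitespace-separated tokens removed (a rough proxy
    for real tokenizer counts)."""
    before = len(original.split())
    after = len(compressed.split())
    return 0.0 if before == 0 else 1.0 - after / before


prompt = (
    "I would like you to please summarize the Kubernetes logs "
    "in order to find the OOMKilled pods."
)
short = compress(prompt)
print(short)
print(f"savings: {savings(prompt, short):.0%}")
```

A toy prompt like this one is mostly filler, so it compresses far more than the ~22% average reported above for real-world prompts. The technical terms ("Kubernetes", "OOMKilled") pass through untouched because no rule matches them.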

Open source and ready to test:
GitHub: metawake/prompt_compressor
Full writeup

Would love feedback from the community on whether this looks useful and whether you have needed to implement something similar.
Is anyone else fighting the “token reduction” fight?

Cheers!

metawake · April 17, 2025, 10:35am

I’m planning a second iteration focused on adaptive output shaping — would love to hear what compression needs other devs are facing!

ogaste · May 19, 2026, 8:54am

Hi metawake,

Yes, I’m fighting the token-reduction fight, but coming at it from a
different angle. I just published a preprint measuring information loss
when LLMs summarize their own conversation history (curative compaction):

Your approach (preventive, prompt-level, rule-based) is orthogonal to mine
(curative, history-level, LLM-based). The two compose nicely: your tool
compresses individual messages on the way in, mine could compact the
accumulated history later. Worth chaining and measuring.

Two things in your design that I think are underappreciated and that I’d
love to discuss:

1. Compressing on the way in vs on the way out is a more important
   distinction than the literature gives it credit for. Tool results,
   chain-of-thought, and search outputs are typically 70-80% of the
   verbose noise in a coding agent’s context. Compressing them before
   they enter the context probably has more leverage than any sophisticated
   history compaction strategy. Your tool is well-positioned for this.

2. Rule-based vs LLM-based compaction is a methodological lever I hadn’t
   seriously considered until reading your post. A rule-based compactor is
   deterministic, which directly addresses a finding from my paper:
   LLM-based compaction is non-deterministic even at temperature zero,
   producing run-to-run recall variance of up to a factor of 14 on
   identical conversations. A rule-based variant would remove that source
   of variance entirely and make benchmarks much cleaner. If your tool
   gives modest compression but stable behavior, that may be precisely
   what you want for parts of an agent’s context (typed/structured
   content especially).
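
The determinism point can be checked with a small harness. This is only a sketch under stated assumptions: `distinct_outputs`, `rule_based`, and `noisy` are hypothetical names, `rule_based` is a toy compactor (not metawake's tool), and `noisy` is a random stand-in for a non-deterministic compactor, not an actual LLM call.

```python
import hashlib
import random
from typing import Callable


def distinct_outputs(compact: Callable[[str], str], text: str, runs: int = 20) -> int:
    """Run the compactor repeatedly on the same input and count how many
    different outputs it produces (1 means deterministic on this input)."""
    digests = {
        hashlib.sha256(compact(text).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests)


def rule_based(text: str) -> str:
    """Toy deterministic compactor: normalize whitespace, drop blank lines
    and consecutive duplicate lines."""
    kept: list[str] = []
    for line in text.splitlines():
        line = " ".join(line.split())
        if line and (not kept or kept[-1] != line):
            kept.append(line)
    return "\n".join(kept)


def noisy(text: str) -> str:
    """Stand-in for a non-deterministic compactor: drops one random line."""
    lines = rule_based(text).splitlines()
    lines.pop(random.randrange(len(lines)))
    return "\n".join(lines)


history = (
    "user: run tests\n"
    " tool: PASS test_a\n"
    "tool: PASS  test_a\n"
    "tool: FAIL test_b\n"
)
print("rule-based distinct outputs:", distinct_outputs(rule_based, history))
print("noisy distinct outputs:", distinct_outputs(noisy, history))
```

For a real comparison, `noisy` would be replaced by the LLM-based compactor under test, and the distinct-output count would be paired with a recall metric per run.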

Question for you: have you measured downstream task performance with vs
without your compressor? LLMLingua reports ~1.5% drop at 20x compression;
yours at 22% should land much lower, which would be a strong selling
point if measured.
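
Note that the two figures are in different units, so a quick conversion helps when comparing them. The sketch below assumes "20x" means the compressed prompt is one twentieth of the original length and "22%" means 22% fewer tokens; the function names are hypothetical.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of tokens removed at an N-times compression ratio."""
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings: float) -> float:
    """Compression ratio equivalent to removing a fraction of tokens."""
    return 1.0 / (1.0 - savings)


print(f"20x compression = {savings_from_ratio(20):.0%} fewer tokens")   # 95%
print(f"22% savings     = {ratio_from_savings(0.22):.2f}x compression")  # 1.28x
```

Under those assumptions the two tools sit at very different operating points (about 1.28x vs 20x), which is why a measured downstream drop at 22% would be informative on its own.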

In any case I’m including a follow-up section on input-side and rule-based
compression in my future work draft, partly inspired by your tool. Happy
to compare notes if you’re interested.

Cheers,
Olivier
