[Tool] Open-source prompt compressor for LLMs – 22% avg savings with spaCy + rules

Hey everyone!

I recently built a very experimental semantic prompt compressor aimed at reducing LLM token usage without losing important context.
I'm still not sure how worthwhile the idea is, but I did have fun with this experiment.

:gear: Built with spaCy and YAML rule configs
:test_tube: Domain-sensitive (best for human queries)
:locked: Preserves >95% named entities and technical terms
:chart_decreasing: Achieves ~22% compression across real-world prompts

It’s designed to work both for runtime compression and prompt normalization before storage / vector DB ingestion.
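For readers who want a feel for the general idea, here is a minimal sketch of rule-based prompt compression using only the Python standard library. It is not the project's code: the real tool uses spaCy and YAML rule configs, whereas the patterns, the `compress` function, and the `savings` helper below are hypothetical stand-ins, and whitespace-separated words are only a crude proxy for LLM tokens.

```python
import re

# Hypothetical rules for illustration only; NOT the project's actual YAML config.
REPLACEMENTS = {
    r"\bin order to\b": "to",
    r"\bdue to the fact that\b": "because",
}
FILLER_PATTERNS = [
    r"\bplease\b",
    r"\bkindly\b",
    r"\bi was wondering if\b",
    r"\bcould you\b",
    r"\bbasically\b",
]


def compress(prompt: str) -> str:
    """Apply phrase replacements, drop filler phrases, then tidy whitespace."""
    text = prompt
    for pattern, short in REPLACEMENTS.items():
        text = re.sub(pattern, short, text, flags=re.IGNORECASE)
    for pattern in FILLER_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace
    text = re.sub(r"\s+([,.?!])", r"\1", text)     # no space before punctuation
    return text


def savings(original: str, compressed: str) -> float:
    """Fraction of whitespace-separated words removed (crude token proxy)."""
    before = len(original.split())
    after = len(compressed.split())
    return 1 - after / before if before else 0.0


prompt = "Could you please explain how to use spaCy in order to parse YAML?"
short = compress(prompt)
print(short)
print(f"{savings(prompt, short):.0%} fewer words")
```

A real implementation would count tokens with the target model's tokenizer and protect named entities and technical terms before applying any rule.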

Open source and ready to test:
:backhand_index_pointing_right: GitHub: metawake/prompt_compressor
:backhand_index_pointing_right: Full writeup

I would love feedback from the community on whether this looks useful and whether you have faced the need to implement something similar.
Is anyone else fighting the “token reduction” fight?

Cheers!

I’m planning a second iteration focused on adaptive output shaping, and would love to hear what compression needs other devs are facing!

Hi metawake,

Yes, I’m fighting the token-reduction fight, but coming at it from a
different angle. I just published a preprint measuring information loss
when LLMs summarize their own conversation history (curative compaction):

Your approach (preventive, prompt-level, rule-based) is orthogonal to mine
(curative, history-level, LLM-based). The two compose nicely: your tool
compresses individual messages on the way in, mine could compact the
accumulated history later. Worth chaining and measuring.
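
To make the chaining idea concrete, here is a rough sketch of how the two stages could be wired together. Every name in it is hypothetical: `compress_in` stands in for a prompt-level compressor, `compact` stands in for a history-level (possibly LLM-based) compactor, and the budget is counted in whitespace-separated words rather than real tokens.

```python
from typing import Callable, List

Compressor = Callable[[str], str]             # per-message, applied on the way in
Compactor = Callable[[List[str]], List[str]]  # whole-history, applied when over budget


def word_count(messages: List[str]) -> int:
    """Crude size measure: whitespace-separated words across all messages."""
    return sum(len(m.split()) for m in messages)


def add_message(history: List[str], message: str,
                compress_in: Compressor, compact: Compactor,
                budget: int) -> List[str]:
    """Compress the new message first; compact the history only if over budget."""
    updated = history + [compress_in(message)]
    if word_count(updated) > budget:
        updated = compact(updated)
    return updated


# Trivial stand-ins so the sketch runs; real components would replace these.
def squeeze_whitespace(message: str) -> str:
    return " ".join(message.split())


def keep_last_two(history: List[str]) -> List[str]:
    return history[-2:]


history: List[str] = []
for msg in ["a  b  c", "d e f", "g h i"]:
    history = add_message(history, msg, squeeze_whitespace, keep_last_two, budget=6)
print(history)
```

Measuring recall with the input-side stage on and off, while keeping the compactor fixed, would show how much each stage contributes.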

Two things in your design that I think are underappreciated and that I’d
love to discuss:

  1. Compressing on the way in vs on the way out is a more important
    distinction than the literature gives it credit for. Tool results,
    chain-of-thought, and search outputs are typically 70-80% of the
    verbose noise in a coding agent’s context. Compressing them before
    they enter the context probably has more leverage than any sophisticated
    history compaction strategy. Your tool is well-positioned for this.

  2. Rule-based vs LLM-based compaction is a methodological lever I hadn’t
    seriously considered until reading your post. A rule-based compactor is
    deterministic, which directly addresses a finding I made in my paper:
    LLM-based compaction is non-deterministic even at temperature zero,
    with run-to-run recall varying by up to a factor of 14 on identical
    conversations. A rule-based variant would remove that source of
    variance entirely and make benchmarks much cleaner. If your tool
    gives modest compression but stable behavior, that may be precisely
    what you want for parts of an agent’s context (typed/structured
    content especially).
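
The determinism point in item 2 can be checked mechanically. The sketch below runs a compactor several times on the same input and counts distinct outputs by hash. It is illustrative only: `rule_based` is a toy stand-in for a rule-based compactor, and `simulated_unstable` merely imitates run-to-run variation with a call counter; neither is metawake's tool nor a real LLM call.

```python
import hashlib
import itertools
import re
from typing import Callable


def distinct_outputs(compactor: Callable[[str], str], text: str, runs: int = 5) -> int:
    """Run the compactor `runs` times on the same text; count distinct outputs."""
    digests = set()
    for _ in range(runs):
        out = compactor(text)
        digests.add(hashlib.sha256(out.encode("utf-8")).hexdigest())
    return len(digests)


def rule_based(text: str) -> str:
    """Toy deterministic rule: drop 'please', collapse whitespace."""
    without_filler = re.sub(r"\bplease\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", without_filler).strip()


_calls = itertools.count()


def simulated_unstable(text: str) -> str:
    """Imitates non-determinism: drops a different word on each call."""
    words = text.split()
    if not words:
        return text
    drop = next(_calls) % len(words)
    return " ".join(w for i, w in enumerate(words) if i != drop)


sample = "Please list the open ports"
print(distinct_outputs(rule_based, sample))          # one distinct output
print(distinct_outputs(simulated_unstable, sample))  # more than one
```

A count of 1 across many runs is what makes benchmark numbers for a rule-based stage directly comparable between runs.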

Question for you: have you measured downstream task performance with vs
without your compressor? LLMLingua reports ~1.5% drop at 20x compression;
yours at 22% should land much lower, which would be a strong selling
point if measured.

In any case I’m including a follow-up section on input-side and rule-based
compression in my future work draft, partly inspired by your tool. Happy
to compare notes if you’re interested.

Cheers,
Olivier