Preprint on LLM context compaction

ogaste · May 19, 2026, 8:18am

Hi HuggingFace community!

Sharing a preprint that some of you might find interesting,
on what LLMs forget when they compact their conversation history.

Paper: “Lost in Compaction: Measuring Information Loss in LLM Context Summaries”
DOI: lost in compaction
Code, data, human-judge calibration: profff/lost-in-compaction on GitHub

Three findings that surprised me:

1. In the compacted zone of a context, LLM recall drops to 0-7% even though
   keyword search still finds 82-93% of the facts. The information is present
   in the context but ignored by attention.
2. Compaction damages even untouched parts of the context: remaining-zone
   recall drops by ~20pp as compaction increases. Adding more “preserved”
   summaries dilutes attention rather than helping.
3. The compaction phase itself is non-deterministic at temperature zero:
   recall measurements on identical conversations vary by up to a factor of
   14× across replicates. Single-shot benchmarks of compaction strategies
   are unreliable; replicates are mandatory.
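
For anyone who wants to reproduce the two kinds of comparison above on their own runs, here is a minimal Python sketch. It is not the code from the repo: the function names, the "all keywords must appear" matching rule, and the toy values are my assumptions for illustration, and none of the numbers are results from the paper.

```python
def keyword_presence_rate(facts, context):
    """Fraction of facts whose keywords ALL appear in the context.

    `facts` is a list of keyword lists (hypothetical format).
    Matching is a plain case-insensitive substring test.
    """
    text = context.lower()
    hits = sum(all(kw.lower() in text for kw in keywords) for keywords in facts)
    return hits / len(facts)


def recall_rate(judgements):
    """Fraction of questions the model answered correctly (list of booleans)."""
    return sum(judgements) / len(judgements)


def spread_factor(correct_counts):
    """Max/min ratio of correct-answer counts across replicates of one cell."""
    lo, hi = min(correct_counts), max(correct_counts)
    if lo == 0:
        return float("inf")  # ratio is undefined when a replicate scores zero
    return hi / lo


# Toy values, made up for illustration only.
toy_facts = [["blue", "bicycle"], ["lisbon"], ["tuesday", "dentist"]]
toy_summary = "Summary: the user bought a blue bicycle and moved to Lisbon."
toy_judgements = [False, False, False]   # model answered none correctly
toy_replicates = [2, 3, 8]               # correct answers in three replicates

presence = keyword_presence_rate(toy_facts, toy_summary)
recall = recall_rate(toy_judgements)
spread = spread_factor(toy_replicates)
print(f"presence={presence:.2f} recall={recall:.2f} spread={spread:.1f}x")
```

The gap between `presence` and `recall` is the kind of "present but not used" signal described in the first finding, and `spread_factor` is one simple way to report replicate variability for the third.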

Methodology in short:

- 234 LongMemEval facts naturally embedded in 190K-token contexts
- Single-pass compaction sweep (5-98%) on Claude Haiku 4.5 and Sonnet 4.6
- Multi-pass strategy comparison (Brutal, Incremental, Frozen, FrozenRanked)
  on a 5M-token conversation with 4-6 replicates per cell
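
To give a feel for the scale of the single-pass sweep, here is a small Python sketch of the zone arithmetic. It assumes the sweep percentage means the share of context tokens that gets replaced by a summary; that reading is my assumption here, the paper's exact definition may differ, and `zone_sizes` is a hypothetical helper.

```python
def zone_sizes(context_tokens, compaction_percent):
    """Split a context into (compacted, remaining) token counts.

    Assumes `compaction_percent` is the integer share of the original
    context that falls in the compacted zone.
    """
    if not 0 <= compaction_percent <= 100:
        raise ValueError("compaction_percent must be between 0 and 100")
    compacted = context_tokens * compaction_percent // 100
    return compacted, context_tokens - compacted


# The two ends of the 5-98% sweep on a 190K-token context.
low_end = zone_sizes(190_000, 5)
high_end = zone_sizes(190_000, 98)
print(low_end, high_end)
```

Under that assumption, the low end of the sweep compacts 9,500 tokens and leaves 180,500 untouched, while the high end compacts 186,200 and leaves only 3,800.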

Independent, self-funded research (out of pocket, no institutional
affiliation, no doctorate). Happy to answer technical questions about the
methodology, the strategies, or the follow-up directions I’m considering
(verbatim store + on-demand expansion, structured frozen graphs).

P.S. — While I’m here: if anyone has 3+ recent cs.CL papers on arXiv and
would consider endorsing my submission, I’d be very grateful. HAL France
rejected the deposit on credential grounds, so arXiv via personal
endorsement is the route I’m exploring. I’d send the endorsement URL by DM
after we connect, per arXiv’s one-to-one sharing policy.

Thanks for reading,
Olivier
