Preprint on LLM context compaction

Hi HuggingFace community! :waving_hand:

Sharing a preprint that some of you might find interesting,
on what LLMs forget when they compact their conversation history.

:page_facing_up: Paper: “Lost in Compaction: Measuring Information Loss in LLM Context Summaries”
:link: DOI: lost in compaction
:laptop: Code, data, human-judge calibration: GitHub (profff/lost-in-compaction)

:key: Three findings that surprised me:

  1. In the compacted zone of a context, LLM recall drops to 0–7%, even though
    keyword search still finds 82–93% of the facts there. The information is
    still present in the context but is ignored by attention.

  2. Compaction damages even the untouched parts of the context: recall in the
    remaining zone drops by ~20 percentage points as compaction increases.
    Adding more “preserved” summaries dilutes attention rather than helping.

  3. The compaction phase itself is non-deterministic, even at temperature zero:
    recall measured on identical conversations varies by up to a factor of 14
    across replicates. Single-shot benchmarks of compaction strategies are
    therefore unreliable; replicates are mandatory.
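For anyone who wants to reproduce the flavour of findings 1 and 3 on their own data, here is a minimal sketch of the two diagnostics: "keyword search finds the fact but the model doesn't recall it", and the best/worst spread across replicates. All function names here (`keyword_present`, `present_but_missed`, `spread_factor`, `summarize_replicates`) are hypothetical and this is not the code from the repo; it assumes a fact counts as "present" when all of its keywords appear case-insensitively in the context.

```python
import statistics


def keyword_present(context: str, keywords: list[str]) -> bool:
    """True if every keyword of a fact appears (case-insensitive) in the context."""
    haystack = context.lower()
    return all(k.lower() in haystack for k in keywords)


def present_but_missed(
    facts: dict[str, list[str]], context: str, model_recalled: set[str]
) -> list[str]:
    """Ids of facts that keyword search finds but the model failed to recall.

    facts: hypothetical mapping fact_id -> keywords identifying that fact
    model_recalled: fact_ids the model answered correctly
    """
    return sorted(
        fid
        for fid, kws in facts.items()
        if keyword_present(context, kws) and fid not in model_recalled
    )


def spread_factor(values: list[float]) -> float:
    """Best-to-worst ratio across replicates (inf if the worst replicate is zero)."""
    lo, hi = min(values), max(values)
    return float("inf") if lo == 0 else hi / lo


def summarize_replicates(recalled_counts: list[int], n_facts: int) -> dict[str, float]:
    """Mean/stdev of recall plus the spread factor for one experimental cell."""
    recalls = [c / n_facts for c in recalled_counts]
    return {
        "mean": statistics.mean(recalls),
        "stdev": statistics.stdev(recalls) if len(recalls) > 1 else 0.0,
        "spread": spread_factor(recalled_counts),
    }
```

With toy (made-up) replicate counts such as `[2, 10, 28]` facts recalled out of 234, `summarize_replicates` reports a spread of 14.0: the same cell could look like a disaster or a success depending on which single run you happened to keep.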

:bar_chart: Methodology in short:

  • 234 LongMemEval facts naturally embedded in 190K-token contexts
  • Single-pass compaction sweep (5-98%) on Claude Haiku 4.5 and Sonnet 4.6
  • Multi-pass strategy comparison (Brutal, Incremental, Frozen, FrozenRanked)
    on a 5M-token conversation with 4-6 replicates per cell
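To make "compacted zone" vs "remaining zone" concrete, here is a minimal sketch of one point of a single-pass sweep. It assumes (my simplification, not necessarily the paper's exact protocol) that a compaction ratio r means "replace the oldest r fraction of tokens with a summary and keep the rest verbatim". `compact`, `zone_of` and `SWEEP` are hypothetical names, and `summarize` stands in for whatever LLM call produces the summary.

```python
from typing import Callable

# Hypothetical sweep points inside the 5-98% range mentioned above.
SWEEP = [0.05, 0.25, 0.50, 0.75, 0.98]


def compact(
    tokens: list[str],
    ratio: float,
    summarize: Callable[[list[str]], list[str]],
) -> tuple[list[str], int]:
    """Summarize the oldest `ratio` fraction of tokens; keep the rest verbatim.

    Returns the new context and the split index separating the two zones.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    split = round(len(tokens) * ratio)
    head = summarize(tokens[:split]) if split else []
    return head + tokens[split:], split


def zone_of(fact_position: int, split: int) -> str:
    """Classify a fact by its token position in the original conversation."""
    return "compacted" if fact_position < split else "remaining"
```

Scoring recall separately per zone is what lets you see both effects: facts in the compacted zone disappearing, and facts in the remaining zone suffering even though their text was never touched.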

:money_bag: Independent, self-funded research (out of pocket, no institutional
affiliation, no doctorate). Happy to answer technical questions about the
methodology, the strategies, or the follow-up directions I’m considering
(verbatim store + on-demand expansion, structured frozen graphs).


P.S. — While I’m here: if anyone has 3+ recent cs.CL papers on arXiv and
would consider endorsing my submission, I’d be very grateful. HAL France
rejected the deposit on credential grounds, so arXiv via personal
endorsement is the route I’m exploring. I’d send the endorsement URL by DM
after we connect, per arXiv’s one-to-one sharing policy.

Thanks for reading,
Olivier
