Transcription summaries and actions

What is currently the best small model for summarising transcripts and extracting actions?

I’m looking at the <5B parameter class, or maybe <10B.

Transcripts will be produced by Whisper + pyannote diarization.

Audio clips will be at least 1 hour long, possibly as long as 6 hours in rare cases, so we can expect large transcripts.
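
To put a rough number on "large", here is a back-of-envelope sketch. The speaking rate (150 words per minute) and the tokens-per-word ratio (1.3) are assumptions picked for illustration, not measurements from these recordings, and the real values depend on the speakers and on the model's tokenizer.

```python
def estimate_tokens(hours: float,
                    words_per_minute: float = 150.0,  # assumed speaking rate
                    tokens_per_word: float = 1.3) -> int:  # assumed tokenizer ratio
    """Rough token count for a transcript of the given audio length."""
    words = hours * 60 * words_per_minute
    return round(words * tokens_per_word)


for hours in (1, 6):
    print(f"{hours} h of audio: roughly {estimate_tokens(hours)} tokens")
```

Under these assumptions, even the shortest clips run to several thousand tokens, and diarization labels and timestamps add more on top.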

For smaller models, I think the Llama 3.2 or Qwen 2.5 series are safe choices, but there may be more specific benchmarks on the leaderboard. The URL below is for the long-context-support version of Qwen.

Thanks for this, I wasn't aware of Qwen's long context model :slight_smile:

Any thoughts on whether it would be better to use long context and try to summarise in one go, compared to chunking the input into intermediate summaries?

Having the model summarize the whole transcript directly would probably be more accurate, but processing such a long context at once would likely need a huge amount of VRAM and add a lot of latency :sweat_smile:, so it would probably be smarter to process it in chunks. Summarizing short chunks should also be easier for a small model.
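
A minimal sketch of the chunked approach, assuming the transcript is already a list of diarized lines such as "SPEAKER_00: ...". The `summarize` argument is a hypothetical callable that wraps whatever model you end up using (it takes text and returns a summary); it is not part of any library. Chunk size is counted in words here as a crude stand-in for tokens.

```python
from typing import Callable, List


def chunk_lines(lines: List[str], max_words: int) -> List[str]:
    """Group transcript lines into chunks of at most max_words words.

    Speaker turns are never split, so a single line longer than
    max_words ends up as its own oversized chunk.
    """
    chunks, current, count = [], [], 0
    for line in lines:
        n = len(line.split())
        # Start a new chunk if adding this line would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_transcript(lines: List[str],
                         summarize: Callable[[str], str],  # hypothetical model wrapper
                         max_words: int = 1500) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    partials = [summarize(chunk) for chunk in chunk_lines(lines, max_words)]
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))
```

For a 6 hour recording the joined partial summaries may themselves be too long for one pass, in which case the same function could be applied again to the list of partials. Action items could be extracted the same way by passing a different prompt inside `summarize`.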