arXiv cs.CR endorsement request: three preprints on LLM security

I’m an independent researcher writing to ask whether you’d be willing to endorse me for arXiv’s cs.CR category. I have three small preprints ready to upload, all in the LLM-security / honeypot-measurement / safety-classifier-calibration space, adjacent to your work on [SPECIFIC PAPER OR TOPIC OF THEIRS].

The most surprising of the three is a 14-page safety-research note documenting a frontier-LLM safety classifier (Claude Opus 4.7) that refuses to score one specific student-LLM output on a CTI-style task, while the seven other judges on a cross-vendor panel (Sonnet, Haiku, Gemma, foundation-sec, qwen, Llama-4, gpt-oss) all engage with the same record. The note also includes a self-correction: I initially reported a 53% refusal rate, then established that 15 of the 16 apparent refusals were upstream API credit-balance errors, leaving 1 genuine refusal with cleaner properties. Reproducibility artefacts (data, code, analyses) are released on Zenodo with DOIs.

Zenodo: 10.5281/zenodo.20383617 (the safety-note, ~14 pages)

Companion Paper 2 (Qwen2.5-7B QLoRA distillation, 20 pages): 10.5281/zenodo.20383612

Paper 1 (honeypot measurement, 38 pages) is not yet on Zenodo but I’m happy to send the PDF.

arXiv’s endorsement is per-subject and one-time: once you’ve endorsed me for cs.CR, I can submit all three. The endorsement code below is the 6-character string arXiv issued for my request; you would enter it on arXiv’s endorsement page. No reading commitment is expected; a 30-second skim of the safety-note abstract should be enough to decide.

Endorsement code: SZYPXN

3-bullet TL;DR for the safety-note (the strongest hook)

  1. Claude Opus 4.7 deterministically refuses to score 1 specific student-LLM output (chunk_idx=2 ttp_summary). The refusal reproduces in 5/5 stochasticity-check repeats and in 7+ trials across two production eval runs.

  2. The refusal does NOT generalise to content-class-similar synthetic records: a 24-record probe varying defensive-infrastructure entity attribution (CISA, NIST, FBI IC3, MS-ISAC, CERT-EU, BSI, NCSC-UK, JPCERT, plus Mandiant / CrowdStrike / SentinelOne) yields 0/192 refusals (24 records × 8 judges) across an 8-judge cross-vendor panel.

  3. Two distinct refusal trigger modes surfaced: student-content-driven on the original record, and prompt-context-conditioned on an unrelated CDN-attacker record paired with the CISA MAR PDF prompt. Methodology-correction arc: an initial 53% refusal claim turned out to be 15 upstream API errors plus 1 genuine refusal; the corrected finding is narrower but cleaner.

Thanks,
fiskkrok