arXiv cs.CR endorsement request: three preprints on LLM/security

Klinkerlag · May 26, 2026, 6:18pm

I’m an independent researcher writing to ask whether you’d be willing to endorse me for arXiv’s cs.CR category. I have three short preprints ready to upload, all in the LLM-security / honeypot-measurement / safety-classifier-calibration space, adjacent to your work on [SPECIFIC PAPER OR TOPIC OF THEIRS].

The most surprising of the three is a 14-page safety-research note documenting a frontier-LLM safety classifier (Claude Opus 4.7) that refuses to score one specific student-LLM output on a CTI-style (cyber threat intelligence) task. The refusal is not a general property of that content: every judge in a 7-judge cross-vendor panel (Sonnet, Haiku, Gemma, foundation-sec, qwen, Llama-4, gpt-oss) engages with the same output. The note also includes a self-correction: I initially reported a 53% refusal rate, then established that 15 of the 16 “refusals” were upstream API credit-balance errors, leaving 1 genuine refusal with cleaner properties. Reproducibility artefacts (data, code, and analyses) are released on Zenodo with DOIs.
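The correction comes down to classifying each judge call before counting it, so that an upstream API failure is never scored as a refusal. A minimal sketch, assuming a hypothetical per-call record with an `error` field and a `refused` flag; the `JudgeCall` schema and function names are illustrative, not the paper's actual pipeline. Excluding API errors from the denominator is one reasonable choice, since those calls never produced a verdict.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeCall:
    # Hypothetical record of one judge invocation; not the paper's schema.
    error: Optional[str] = None  # upstream API error (e.g. credit balance), if any
    refused: bool = False        # True if the judge model itself declined to score


def classify(call: JudgeCall) -> str:
    """Bucket one call as 'api_error', 'refusal', or 'scored'."""
    # An upstream error means the model never produced a verdict,
    # so it must not be counted as a refusal.
    if call.error is not None:
        return "api_error"
    return "refusal" if call.refused else "scored"


def refusal_rates(calls: list[JudgeCall]) -> dict[str, float]:
    """Compare the naive rate (every non-score counted as a refusal)
    with the corrected rate (API errors excluded from the denominator)."""
    counts = Counter(classify(c) for c in calls)
    total = len(calls)
    valid = total - counts["api_error"]
    naive = (counts["refusal"] + counts["api_error"]) / total if total else 0.0
    corrected = counts["refusal"] / valid if valid else 0.0
    return {"naive": naive, "corrected": corrected}
```

With toy data of 2 API errors, 1 genuine refusal and 7 scored calls, the naive rate is 3/10 = 0.3 while the corrected rate is 1/8 = 0.125.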

Zenodo: 10.5281/zenodo.20383617 (the safety-note, ~14 pages)

Companion Paper 2 (Qwen2.5-7B QLoRA distillation, 20 pages): 10.5281/zenodo.20383612

Paper 1 (honeypot measurement, 38 pages) is not yet on Zenodo but I’m happy to send the PDF.

arXiv endorsement is per subject category and one-time: once you’ve endorsed me for cs.CR, I can submit all three. The endorsement code below is a 6-character string generated by arXiv’s UI; you would enter it on arXiv’s endorsement page. No reading commitment is expected; a 30-second skim of the safety-note abstract should be enough to decide.

Endorsement code: SZYPXN

3-bullet TL;DR for the safety-note (the strongest hook)

- Claude Opus 4.7 deterministically refuses to score 1 specific student-LLM output (chunk_idx=2 ttp_summary); the refusal reproduced in 5/5 stochasticity trials and in 7+ trials across two production eval runs.

- The refusal does NOT generalise to synthetic records of a similar content class: a 24-record probe varying defensive-infrastructure entity attribution (CISA, NIST, FBI IC3, MS-ISAC, CERT-EU, BSI, NCSC-UK, JPCERT, plus Mandiant / CrowdStrike / SentinelOne) gets 0/192 refusals across an 8-judge cross-vendor panel.

- Two distinct refusal trigger modes surfaced: student-content-driven on the original record, and prompt-context-conditioned on an unrelated CDN-attacker record paired with the CISA MAR PDF prompt. Methodology-correction arc: the initial 53% refusal claim turned out to be 15 upstream API errors plus 1 genuine refusal; the corrected finding is narrower but cleaner.
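The 0/192 figure is consistent with a full grid in which each of the 24 synthetic records is scored once by each of the 8 judges (24 × 8 = 192). A minimal sketch of that tally, assuming a hypothetical `judge_fn(judge, record)` that returns either "refusal" or "scored"; the record and judge names are placeholders, not the probe's actual identifiers.

```python
from itertools import product
from typing import Callable

# Hypothetical placeholders: 24 synthetic records and an 8-judge panel.
# The names are illustrative only.
RECORDS = [f"record_{i:02d}" for i in range(24)]
JUDGES = [f"judge_{j}" for j in range(8)]


def run_probe(judge_fn: Callable[[str, str], str]) -> tuple[int, int]:
    """Send every record to every judge exactly once (a full grid)
    and return (number_of_judgments, number_of_refusals)."""
    cells = list(product(RECORDS, JUDGES))
    refusals = sum(1 for record, judge in cells if judge_fn(judge, record) == "refusal")
    return len(cells), refusals
```

For example, a stub judge that always scores, `run_probe(lambda judge, record: "scored")`, returns `(192, 0)`.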

Thanks,
fiskkrok
