(논문 요약) One Token to Fool LLM-as-a-Judge | Jaemin’s Arxiv

(논문 요약) One Token to Fool LLM-as-a-Judge (paper)

핵심 내용

false posivie rewards
- non-word symbols (e.g.,”:” or “.”)
- reasoning openers
  - “Thought process:”
  - “Let’s solve this problem step by step.”

Qwen2.5-7B-Instruct 모델에 SFT -> Master-RM
- 학습데이터: 160k public data + FP 방지 목적 aug-data
- FP 방지 목적 aug-data: GPT-4o-mini 답변의 첫번째 문장만 가져와서 ‘No’ 로 레이블 (20k)
  - 예시: “To solve the problem, we need to find the mode, median, and average of the donation amounts from the students.”