See reviewer disagreement as structure, not a single number
Reviewer output is a structured verdict: axes, dissent, rupture typing, and evidence grounding—not one merge score.
Pain. A scalar score answers “merge?” convincingly even when the axes disagree: task fit can be high while evidence grounding is weak, or a logged minority objection may not have changed the headline. Operators optimize for the top-line number, and both the calibration literature and the debate-style automation literature document dissent becoming invisible when the UI does not render it.
What changed. The reviewer pill emits a structured verdict: task fit against CI and diff claims, calibration-adjacent signals, rupture typing when work left the green path, affected surfaces, a recorded minority objection when the score is high but a counter-case existed, and evidence grounding that splits repo-anchored claims from inference. One pill and one schema keep review events comparable instead of fragmenting into competing scores.
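The shape described above can be sketched as a small set of Python dataclasses. This is an illustration only, not the shipped schema: every class name, field name, and rupture category below is an assumption made for the example, and the real payload may be named and nested differently.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Optional


class RuptureType(str, Enum):
    # Hypothetical categories; the real rupture types are not listed in this note.
    NONE = "none"
    CI_FAILURE = "ci_failure"
    SCOPE_DRIFT = "scope_drift"


@dataclass
class EvidenceGrounding:
    # Claims tied to files, diffs, or CI output.
    repo_anchored: list[str] = field(default_factory=list)
    # Claims the reviewer inferred without a repo anchor.
    inferred: list[str] = field(default_factory=list)


@dataclass
class ReviewerVerdict:
    task_fit: float                        # fit against CI and diff claims (assumed 0..1)
    calibration_signals: dict[str, float]  # calibration-adjacent signals, keyed by name
    rupture_type: RuptureType              # set when work left the green path
    affected_surfaces: list[str]
    evidence: EvidenceGrounding
    minority_objection: Optional[str] = None  # recorded counter-case, if any


verdict = ReviewerVerdict(
    task_fit=0.92,
    calibration_signals={"self_reported_confidence": 0.95},
    rupture_type=RuptureType.SCOPE_DRIFT,
    affected_surfaces=["api/handlers"],
    evidence=EvidenceGrounding(
        repo_anchored=["CI run is green on the changed tests"],
        inferred=["no callers outside api/ depend on the old signature"],
    ),
    minority_objection="Diff touches a file the card did not mention.",
)

# One serializable record per review event keeps events comparable.
payload = json.dumps(asdict(verdict), indent=2)
```

Keeping the objection and the grounding split as first-class fields, rather than folding them into `task_fit`, is what lets a high score and a recorded counter-case coexist in the same record.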
Why it matters. Thin agreement becomes visible before merge: high task scores with grounded evidence still deserve a read, while high scores with thin grounding or a logged objection deserve a targeted re-check. The payload is also the instrumentation substrate for trust-calibration and false-completion questions, independent of model cheerleading.
How to try it
- Complete a reviewer pass on a real card and open the stored verdict payload.
- Confirm that the multi-axis fields (task, calibration signals, rupture type, evidence grounding) are populated as the schema specifies.
- Before merging on a high score, read the minority-objection line when present.
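The pre-merge check in the last step could be scripted roughly as follows. This is a minimal sketch that assumes the stored verdict loads as a dictionary; the key names (`task_fit`, `evidence`, `repo_anchored`, `inferred`, `minority_objection`), the thresholds, and the returned labels are all hypothetical and should be replaced with whatever the actual schema defines.

```python
def triage(payload: dict, high_score: float = 0.8, min_grounded_share: float = 0.5) -> str:
    """Suggest how closely to look at a verdict before merging.

    Thresholds and return labels are illustrative assumptions,
    not values taken from the reviewer pill.
    """
    if payload["task_fit"] < high_score:
        return "full-review"

    evidence = payload.get("evidence", {})
    anchored = len(evidence.get("repo_anchored", []))
    inferred = len(evidence.get("inferred", []))
    total = anchored + inferred
    # With no claims at all, treat grounding as thin rather than dividing by zero.
    grounded_share = anchored / total if total else 0.0

    # High score plus a logged objection or thin grounding: targeted re-check.
    if payload.get("minority_objection") or grounded_share < min_grounded_share:
        return "targeted-recheck"

    # High score with grounded evidence still deserves a read.
    return "read"


example = {
    "task_fit": 0.92,
    "evidence": {"repo_anchored": ["CI green"], "inferred": []},
    "minority_objection": "Diff touches a file the card did not mention.",
}
result = triage(example)
```

Here `result` is `"targeted-recheck"`: the score clears the assumed threshold and the evidence is fully repo-anchored, but the recorded objection alone is enough to route the card to a second look.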
Shipped in 86537db — feat(DP-68): multi-axis reviewer pill output schema (#33)