See reviewer disagreement as structure, not a single number

Reviewer output is a structured verdict covering axes, dissent, rupture typing, and evidence grounding, rather than a single merge score.

Pain. A scalar score answers “merge?” convincingly even when the axes disagree: task fit can be high while evidence grounding is weak, or a minority objection can be logged without changing the headline. Operators optimize for the top-line number, and both the calibration literature and the debate-style automation literature document dissent becoming invisible when the UI does not render it.

What changed. The reviewer pill emits a structured verdict: task fit against CI and diff claims, calibration-adjacent signals, rupture typing when work left the green path, affected surfaces, a recorded minority objection when the score is high but a counter-case existed, and evidence grounding that splits repo-anchored claims from inference. One pill and one schema keep review events comparable instead of fragmenting into competing scores.
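The fields listed above could be modeled roughly as follows. This is a minimal sketch, not the shipped schema: every class name, field name, and rupture category here (`ReviewerVerdict`, `EvidenceGrounding`, `RuptureType`, `anchored_ratio`, and so on) is a hypothetical stand-in chosen for illustration, and the sample values are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class RuptureType(Enum):
    # Hypothetical categories; the note does not enumerate the real ones.
    NONE = "none"
    CI_FAILURE = "ci_failure"
    SCOPE_DRIFT = "scope_drift"


@dataclass
class EvidenceGrounding:
    # Claims that point at something in the repo (a file, a CI run, a diff hunk).
    repo_anchored: list[str] = field(default_factory=list)
    # Claims the reviewer inferred without a repo anchor.
    inferred: list[str] = field(default_factory=list)

    def anchored_ratio(self) -> float:
        """Share of claims that are repo-anchored; 0.0 when there are no claims."""
        total = len(self.repo_anchored) + len(self.inferred)
        return len(self.repo_anchored) / total if total else 0.0


@dataclass
class ReviewerVerdict:
    task_fit: float                       # fit against CI and diff claims (assumed 0..1)
    calibration_signals: dict[str, float]
    rupture: RuptureType                  # set when work left the green path
    affected_surfaces: list[str]
    minority_objection: Optional[str]     # recorded counter-case, if any
    grounding: EvidenceGrounding


# Invented example: a high headline score with thin grounding and a logged objection.
verdict = ReviewerVerdict(
    task_fit=0.92,
    calibration_signals={"self_consistency": 0.7},
    rupture=RuptureType.NONE,
    affected_surfaces=["api/handlers.py"],
    minority_objection="Retry path is untested.",
    grounding=EvidenceGrounding(
        repo_anchored=["CI run passed"],
        inferred=["No callers depend on old signature", "Perf unchanged"],
    ),
)
```

Keeping the objection and the grounding split as first-class fields, rather than folding them into `task_fit`, is what lets a consumer see that a 0.92 verdict rests on one anchored claim out of three.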

Why it matters. Thin agreement becomes visible before merge. A high task score with grounded evidence still deserves a read; a high score with thin grounding or a logged objection deserves a targeted re-check. The payload also serves as the instrumentation substrate for trust-calibration and false-completion questions, independent of model cheerleading.
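The reading rule described here can be sketched as a small triage function. The thresholds, the function name `triage`, the returned labels, and the handling of low task-fit scores are all assumptions made for illustration; the note itself only describes the two high-score cases.

```python
def triage(
    task_fit: float,
    anchored_ratio: float,
    has_objection: bool,
    high_fit: float = 0.8,       # hypothetical cutoff for a "high" task score
    min_anchored: float = 0.5,   # hypothetical cutoff for "thin" grounding
) -> str:
    """Map a structured verdict to a suggested reviewer action."""
    if task_fit < high_fit:
        # Assumption: the note does not cover low scores; treat them as a full review.
        return "full-review"
    if has_objection or anchored_ratio < min_anchored:
        # High score, but thin grounding or a logged minority objection.
        return "targeted-recheck"
    # High score with grounded evidence still gets a read before merge.
    return "read"
```

For example, `triage(0.92, 1 / 3, True)` returns `"targeted-recheck"`, whereas the same score with well-anchored evidence and no objection returns `"read"`.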

How to try it

Shipped in 86537db — feat(DP-68): multi-axis reviewer pill output schema (#33)

Origin: Agent-drafted feature note; review before citing externally.