Score claim calibration against the diff that actually shipped

Empirical ratio of session-report claims that hold against the merged PR diff: a cheap pressure metric for sycophancy.

Pain. Session reports and agent prose make factual claims about the repo. If nobody checks those claims against the merged PR, a sycophantic or confabulated summary reads exactly like an accurate one.

What changed. calibration_score compares session-report claims to the merged PR diff and reports an empirical ratio: claims that hold out of claims made. The score is surfaced on reviewer-adjacent rollups.
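As a rough illustration of the ratio, the sketch below counts file-level claims that match the merged diff. It is not the shipped implementation: the Claim type, the calibration_ratio function, the change labels, and the choice to return None when no claims were made are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Claim:
    """Hypothetical file-level claim taken from a session report."""
    path: str
    change: str  # assumed labels: "added", "modified", or "deleted"


def calibration_ratio(
    claims: Sequence[Claim], diff_status: Mapping[str, str]
) -> Optional[float]:
    """Return claims that hold divided by claims made.

    diff_status maps each path in the merged diff to its change label.
    Returns None when no claims were made (an assumption of this sketch),
    so an empty report is not scored as perfectly calibrated.
    """
    if not claims:
        return None
    held = sum(1 for c in claims if diff_status.get(c.path) == c.change)
    return held / len(claims)


# Illustrative data only; these paths are invented for the example.
claims = [
    Claim("src/score.py", "modified"),
    Claim("tests/test_score.py", "added"),
    Claim("README.md", "modified"),  # not in the diff at all
    Claim("docs/old.md", "deleted"),  # in the diff, but as "modified"
]
diff_status = {
    "src/score.py": "modified",
    "tests/test_score.py": "added",
    "docs/old.md": "modified",
}
ratio = calibration_ratio(claims, diff_status)
print(ratio)  # 0.5
```

Two of the four claims match the diff, so the ratio is 0.5. Matching on exact path and change label is the simplest possible rule; a real scorer may need looser matching for renames or partial claims.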

Why it matters. This is a cheap, automatable pressure metric. It does not replace human audit, but it gives calibration diagrams a numerator and denominator tied to ground truth in git.

How to try it

Complete a session report with explicit file-level claims, merge the PR, and inspect the calibration rollup for that assignment lineage.
Compare scores across models or prompt templates when running controlled repeats.
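For the comparison step, one possible approach is to average scores per model or prompt template across repeats. The sketch below assumes scores have already been exported as (label, score) pairs; the mean_score_by_group function and the labels are hypothetical and not part of the shipped rollup.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def mean_score_by_group(
    runs: Iterable[Tuple[str, Optional[float]]]
) -> Dict[str, float]:
    """Average calibration scores per label, skipping unscored runs (None)."""
    grouped = defaultdict(list)
    for label, score in runs:
        if score is not None:
            grouped[label].append(score)
    return {label: mean(scores) for label, scores in grouped.items()}


# Illustrative data only; the labels and scores are invented.
runs = [
    ("model-a", 0.5),
    ("model-a", 1.0),
    ("model-b", 0.25),
    ("model-b", 0.75),
    ("model-b", None),  # a run with no claims, left out of the mean
]
summary = mean_score_by_group(runs)
print(summary)  # {'model-a': 0.75, 'model-b': 0.5}
```

With only a few repeats per group the means are noisy, so differences between groups are better read as a prompt for closer review than as a ranking.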

Shipped in 1dad922 — feat(DP-75): empirical calibration_score from session-report claims vs PR diff (#52)