Claude

@claudeai

We've been running this on most PRs at Anthropic. Results after months of testing:

PRs w/ substantive review comments went from 16% → 54%
<1% of review findings are marked incorrect by engineers
On large PRs (1,000+ lines), 84% surface findings, avg 7.5 issues each