Artifacts from the blog post: Replicating “Agentic Code Reasoning” — and Shipping a Tool From It
Agentic Code Reasoning — Shubham Ugare, Satish Chandra (Meta, 2026)
A `verify_patch` utility that runs semi-formal analysis on code diffs, with outcome tracking for calibration.

| File | Description |
|---|---|
| `verify_patch.py` | The verification utility; runs semi-formal analysis on a patch via a Claude sub-agent |
| `run_experiment.py` | The fault localization experiment harness (3 bugs × 2 modes × N runs) |
| `semiformal_templates.md` | The semi-formal reasoning templates used in our experiments |
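The harness structure (3 bugs × 2 modes × N runs, with per-cell success rates) can be sketched roughly as follows. This is an illustrative outline, not the actual `run_experiment.py`; `run_one` is a hypothetical stand-in for a single agent invocation.

```python
from itertools import product

def run_one(bug: str, mode: str) -> bool:
    """Hypothetical stand-in: True if the agent localized the fault this run."""
    return True  # replace with a real agent invocation

def run_experiment(bugs, modes=("standard", "semiformal"), n_runs=5):
    """Aggregate a success rate for every (bug, mode) cell."""
    results = {}
    for bug, mode in product(bugs, modes):
        successes = sum(run_one(bug, mode) for _ in range(n_runs))
        results[(bug, mode)] = successes / n_runs
    return results

rates = run_experiment(["bug-1", "bug-2", "bug-3"])  # 3 bugs x 2 modes = 6 cells
```

Each cell's rate is what feeds the Standard vs. Semi-formal columns in the results table below.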
Semi-formal templates are cognitive forcing functions. They change what the agent considers before concluding, not just how it formats the answer. The value concentrates on bugs that require non-local reasoning — scope issues, name shadowing, cross-file dependencies.
Model-tier effect (postscript finding): Template value is model-capability-dependent. On CVE-2026-29000 (pac4j-jwt, CVSS 10.0), Haiku 4.5 gained +20pp with templates (80%→100%) while Sonnet 4.6 lost 20pp (100%→80%). Practical implication: Haiku + template ≈ Sonnet standard at ~1/10th cost.
The templates are packaged as a reusable Claude skill: reasoning-semiformally. Includes model-tier guidance and a decision framework for when to apply.
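The skill's decision framework isn't reproduced here, but the findings above suggest a simple policy: always apply templates on smaller models, and on stronger models reserve them for bugs that need non-local reasoning. A hypothetical sketch of that policy (function and tier names are ours, not the skill's):

```python
def should_use_template(model_tier: str, bug_is_nonlocal: bool) -> bool:
    """Illustrative policy derived from the results above, not the skill itself."""
    if model_tier == "haiku":
        # Smaller model: templates added +20pp on CVE-2026-29000.
        return True
    # Stronger model: templates only paid off for non-local reasoning;
    # on bugs it already solved, the template cost 20pp.
    return bug_is_nonlocal

should_use_template("haiku", bug_is_nonlocal=False)   # True
should_use_template("sonnet", bug_is_nonlocal=False)  # False
```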
| Experiment | Standard | Semi-formal | Delta |
|---|---|---|---|
| Django repo, patch equivalence (N=3) | 0% | 100% | +100pp |
| Own repos, fault localization (N=9) | 89% | 100% | +11pp |
| CVE-2026-29000, Haiku 4.5 (N=5) | 80% | 100% | +20pp |
| CVE-2026-29000, Sonnet 4.6 (N=5) | 100% | 80% | −20pp |
```python
from verify_patch import verify_patch

result = verify_patch(
    patch="<your diff here>",
    context="<surrounding code for reference>",
    description="What the patch should do",
)

print(result["verdict"])     # CORRECT / LIKELY_CORRECT / CONCERNS / BUGGY
print(result["confidence"])  # high / medium / low
print(result["summary"])     # One-sentence assessment
```
Requires the Anthropic Python SDK and a valid API key.
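One natural consumer of the `verdict` and `confidence` fields is a merge gate. A minimal sketch, assuming the field values documented above; the pass/fail policy itself is an illustrative choice, not part of the tool:

```python
# Verdicts we treat as approvals; CONCERNS and BUGGY always block.
PASSING = {"CORRECT", "LIKELY_CORRECT"}

def gate(result: dict) -> bool:
    """Return True if the patch should be allowed to merge."""
    if result["verdict"] in PASSING:
        # Hypothetical policy: low-confidence approvals still need human review.
        return result["confidence"] != "low"
    return False

gate({"verdict": "CORRECT", "confidence": "high"})   # True
gate({"verdict": "CONCERNS", "confidence": "high"})  # False
```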