Case study: an imprecise null read as a finding

A walkthrough of a real, de-identified RigorMD report on a multi-hospital colectomy cohort. The study is careful and its arithmetic is clean — the finding is about how confidently a wide-interval null gets stated.

§01 The study

The manuscript is a retrospective cohort of 5,831 colectomies performed by 194 surgeons across 31 hospitals, asking what an enterprise “value” dashboard can honestly tell a service line. Its central message is a cautious negative: the routine measures — operating time, supply cost, complications, length of stay, conversion — do not move together, and higher disposable supply cost is not associated with fewer short-term adverse events. It is a well-hedged thesis, matched to an observational design, and the numbers are internally consistent throughout.

The engine graded it moderate overall — not because anything is fabricated or miscalculated, but because the central conclusion is stated a little more confidently than the evidence supports. This is the most instructive kind of report: the gap is framing, not error.

§02 The central finding: an imprecise null read as a demonstrated dissociation

The load-bearing negative rests on results like reoperation OR 1.03 (95% CI 0.76–1.38, p = 0.87). That interval is wide: it still admits clinically meaningful effects in both directions. Reported with no pre-specified equivalence margin, “not associated” overstates what is really a “none detected.” The study did not show that cost and outcomes are independent; it failed to detect a link it may have been underpowered to find.
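How wide that interval is can be put in numbers. The sketch below is a rough post-hoc approximation, not the report's own calculation: it assumes the interval is a Wald interval on the log-odds scale, backs out the standard error, and converts it to the smallest odds ratio the study could detect with 80% power at a two-sided alpha of 0.05. The function names and the 0.80 to 1.25 equivalence margin are hypothetical; the manuscript states no margin.

```python
from math import exp, log
from statistics import NormalDist

NORMAL = NormalDist()  # standard normal distribution


def se_from_ci(lower, upper, level=0.95):
    """Standard error of log(OR) implied by a Wald confidence interval."""
    z_crit = NORMAL.inv_cdf(0.5 + level / 2)
    return (log(upper) - log(lower)) / (2 * z_crit)


def minimum_detectable_or(se, alpha=0.05, power=0.80):
    """Approximate smallest OR above 1 detectable at two-sided alpha with the given power."""
    return exp((NORMAL.inv_cdf(1 - alpha / 2) + NORMAL.inv_cdf(power)) * se)


def ci_within_margin(lower, upper, margin_low, margin_high):
    """Equivalence can be claimed only if the whole CI sits inside the margin."""
    return margin_low <= lower and upper <= margin_high


se = se_from_ci(0.76, 1.38)  # reoperation interval quoted in the report
mde = minimum_detectable_or(se)
# ASSUMPTION: 0.80 to 1.25 is a hypothetical margin, not one the authors stated.
equivalent = ci_within_margin(0.76, 1.38, 0.80, 1.25)

print(f"SE of log(OR): {se:.3f}")
print(f"Minimum detectable OR at 80% power: about {mde:.2f} (or {1 / mde:.2f})")
print(f"CI inside hypothetical 0.80-1.25 margin: {equivalent}")
```

Under these assumptions the study could reliably detect only odds ratios of roughly 1.5 or larger (or about 0.65 or smaller), and the interval does not fit inside the illustrative margin. Checking a 95% interval against the margin is the conservative version of the usual two one-sided tests procedure.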

This is the single most common pattern in careful observational work — reading absence of evidence as evidence of absence — and the fix is wording plus a stated minimum detectable effect, not a re-run. We cover the general case in why manuscripts get rejected for statistics.

§03 The second finding: a cost–outcome claim without case-complexity adjustment

The cost–outcome comparison adjusts for age, sex, BMI, stoma, division, and year, but diagnosis, disease severity, ASA class, and comorbidity were unavailable. Disposable cost is a proxy for how complex a case is, so the costlier cases are plausibly the harder ones, and harder cases carry more adverse events regardless of what was spent. That is confounding by indication. It inflates the apparent risk in the high-cost group, so a genuinely protective effect of higher spend would be pulled toward the null and could be hidden entirely. The authors disclose the limitation; the engine's point is that it bears directly on the headline negative. This is exactly the STROBE item on unmeasured confounders, worked through in the STROBE walkthrough.
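A toy stratified table makes the masking mechanism concrete. Every count below is invented for illustration and none comes from the colectomy cohort. The sketch assumes a single binary complexity indicator that the real dataset did not have, and uses the standard Mantel-Haenszel pooled odds ratio to show how a protective association within each stratum can collapse to a crude odds ratio of 1.0.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table.

    a, b = adverse events / no events among high-cost cases
    c, d = adverse events / no events among low-cost cases
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR across strata, each stratum given as (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# HYPOTHETICAL counts: complex cases are mostly high-cost and have more events.
simple_cases = (12, 188, 80, 720)     # 200 high-cost, 800 low-cost
complex_cases = (128, 672, 60, 140)   # 800 high-cost, 200 low-cost
strata = [simple_cases, complex_cases]

# Collapsing over complexity is what an analysis without that variable sees.
crude_table = tuple(sum(cells) for cells in zip(*strata))

crude_or = odds_ratio(*crude_table)
simple_or = odds_ratio(*simple_cases)
complex_or = odds_ratio(*complex_cases)
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR (complexity ignored): {crude_or:.2f}")
print(f"OR within simple cases:        {simple_or:.2f}")
print(f"OR within complex cases:       {complex_or:.2f}")
print(f"Mantel-Haenszel adjusted OR:   {adjusted_or:.2f}")
```

In this constructed example the crude comparison shows no association at all, while higher cost is associated with fewer events inside both strata. Whether anything similar holds in the real cohort cannot be known without the missing complexity variables.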

§04 What the deterministic layer found: nothing

The forensic pass recomputed the reported odds-ratio, CI, and p-value triplets and reconciled the cohort counts — the operative-approach subgroups (robotic 1,799, laparoscopic 2,216, converted 313, open 1,503) sum exactly to the analytic N of 5,831. No numerical inconsistencies. This is the point worth sitting with: a study can pass every arithmetic check cleanly and still earn a moderate grade, because consistency and calibration are different questions. The machine that recomputes the numbers is described in how journals catch statistical errors.
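The two checks described here are simple enough to sketch. What follows is an illustration of the idea, not RigorMD's implementation: it assumes Wald intervals on the log-odds scale, allows for each reported value having been rounded to two decimals, and uses hypothetical function names. The counts and the reoperation triplet are the ones quoted in this walkthrough.

```python
from itertools import product
from math import log
from statistics import NormalDist

NORMAL = NormalDist()  # standard normal distribution


def wald_p(or_, lower, upper, level=0.95):
    """Two-sided p-value implied by an odds ratio and its Wald CI."""
    z_crit = NORMAL.inv_cdf(0.5 + level / 2)
    se = (log(upper) - log(lower)) / (2 * z_crit)
    return 2 * (1 - NORMAL.cdf(abs(log(or_) / se)))


def triplet_consistent(or_, lower, upper, p, decimals=2):
    """True if the reported p is reachable once rounding of every value is allowed for."""
    h = 0.5 * 10 ** -decimals  # half a unit in the last reported digit
    corners = product(*[(v - h, v + h) for v in (or_, lower, upper)])
    ps = [wald_p(*corner) for corner in corners]
    # If the rounded OR could be exactly 1, the implied p can reach 1.
    p_max = 1.0 if or_ - h <= 1.0 <= or_ + h else max(ps)
    return min(ps) <= p + h and p_max >= p - h


# Operative-approach subgroups and analytic N as reported.
subgroups = {"robotic": 1799, "laparoscopic": 2216, "converted": 313, "open": 1503}
counts_reconcile = sum(subgroups.values()) == 5831

# Reoperation: OR 1.03 (95% CI 0.76-1.38), p = 0.87.
reoperation_ok = triplet_consistent(1.03, 0.76, 1.38, 0.87)

print(f"Subgroups sum to analytic N: {counts_reconcile}")
print(f"Reoperation triplet consistent within rounding: {reoperation_ok}")
```

Taking the rounded values at face value gives a p-value near 0.85 rather than 0.87, which is why the check evaluates the whole rounding range instead of a single point; a strict comparison would raise a false alarm on a clean result.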

§05 Why moderate, and not more

A tool that shouted “critical” at a careful, well-disclosed study would be miscalibrated. The conclusions are hedged, the limitations are named, and the fixes are mostly wording and reporting: reframe the null, add a minimum detectable effect, add a sensitivity analysis for the missing length-of-stay and readmission data, and report intervals for the surgeon-level estimates. The severity reflects the size of the gap between what was claimed and what the design can carry, and here that gap is real but modest. Sizing a finding correctly is as much of the job as catching it.

Read the full report (scorecard, forensic checks, and before/after language), see how the engine works, or review pricing: the pre-submission review is $25. For the contrasting case where a finding is genuinely serious, see the PE risk-model case study.

How to read this. This walkthrough describes a de-identified report for demonstration. RigorMD flags methodological and statistical issues for your judgment; it does not certify a manuscript, replace peer review, or replace a statistician's input on study design.