Sample report · real appraisal, de-identified manuscript · for demonstration only
RigorMD Validation Report

A multidimensional dynamic pulmonary embolism risk model for surgical patients: development and temporal validation

A clinical prediction-model study in a national surgical registry (4.8 million operations)
Design · Prediction model (development + temporal validation)
Claim · Prediction
Guideline · TRIPOD / PROBAST
RM-2026-0191
generated 2026-06-23
report v1.0
2 engines + deterministic
overall · SERIOUS
Conclusion calibration

A strong, well-reported model whose central claim outruns the evidence

The authors state the dynamic-reassessment claim with moderate confidence; the evidence warrants low certainty (GRADE). The headline accuracy gain (AUC 0.811 → 0.892) is substantially reverse causation — unplanned readmission and reoperation are often the occasion on which the embolism is diagnosed — and a public calculator is deployed on temporal-only validation. The model is rigorously built and exceptionally well reported; the gap is in the central inference, not the execution.

01 Design / claim fit · Serious
02 Results / conclusion alignment · Moderate
03 Statistical appropriateness · Moderate
04 Reporting guideline adherence · Exemplary
05 Numerical / statistical consistency · Clean
06 Clinical interpretability / verdict · Serious

§00 Executive summary

This manuscript develops a calculator to tell surgeons which patient, at which moment, is at high enough risk of pulmonary embolism (PE) to justify extended blood-thinner prophylaxis — and to update that risk after an unplanned readmission or reoperation. Built and temporally validated in 4.8 million operations, the discharge model discriminates well (AUC 0.811) with excellent calibration, and the reporting is exemplary (full TRIPOD, released coefficients and code, a working calculator). The numbers reconcile throughout.

The serious rating is confined to the central claim. The headline jump to AUC 0.892 from adding unplanned readmission and reoperation is substantially reverse causation: those returns are frequently when the PE is diagnosed (suspected-clot readmissions were 68% PE), and the registry cannot order the events. This is compounded by a calculator deployed on temporal-only validation (no external validation or impact study) and a model that predicts risk under current mixed prophylaxis, not the untreated risk a “treat more?” decision needs. All three are disclosed, and pre-specified mitigations bound them — which is why this is serious, not fatal.

§01 Claim map

What the manuscript states, against what the evidence can support.

Stated claim
“An integrated dynamic calculator estimates PE risk at discharge and updates it after an unplanned return, offering a practical strategy to target extended prophylaxis while prevention is still possible.”
Supportable claim
“The model identifies higher-risk patients across broad surgery, and an unplanned return marks an elevated-risk state worth reassessing; much of the dynamic accuracy gain reflects PE being diagnosed at the return, so it is a risk marker, not a validated prevention strategy.”

§02 Domain severity scorecard

Six-domain assessment · consensus of 2 engines
Domain | Severity | Principal finding
01 Design / claim fit | Serious | Dynamic-model accuracy gain is substantially reverse causation; deployed on temporal-only validation.
02 Results / conclusion alignment | Moderate | A modest absolute yield (1 PE per 34 flagged) framed as a large “dynamic” advance.
03 Statistical appropriateness | Moderate | Predicts risk under unmeasured mixed prophylaxis, not the untreated risk a treatment decision needs.
04 Reporting guideline adherence | Exemplary | Full TRIPOD: locked pipeline, calibration, decision curve, released coefficients and code.
05 Numerical / statistical consistency | Clean | Reported CIs, cohort counts, and the reclassification table all reconcile.
06 Clinical interpretability / verdict | Serious | Strong, well-reported model; the central dynamic-reassessment claim outruns the evidence.
Overall severity | Serious | 2 central design findings · strong model, well reported

§03 Major findings

Language calibration: 1 must-change wording · 1 precision polish. Analytic work: 2 need source-data or analytic work · 1 claim limitation.

Severity | Domain | Finding | Author action | Evidence | Locus
Serious | 01 · Design | The dynamic model’s accuracy gain (AUC 0.811 → 0.892) is substantially reverse causation: unplanned readmission/reoperation are often the occasion of PE diagnosis (suspected-clot returns 68% PE), with no temporal ordering in the registry. | New analysis needed | Quote | Central
Moderate | 01 · Design | A public calculator is deployed for clinical use on temporal-only (internal) validation, with no external validation in independent data and no prospective impact study. | New analysis needed | Quote | Central
Moderate | 02 · Alignment | The “multidimensional dynamic … practical strategy” framing overstates a modest absolute yield (PPV 2.92%; about 1 PE per 34 flagged) under a usual-care estimand. | Must-change wording | Quote | Central
Moderate | 03 · Statistics | The registry records no anticoagulant exposure, so the model predicts PE under current mixed prophylaxis, not the untreated risk relevant to deciding who needs more. | Claim limitation | Quote | Central
Mild | 03 · Statistics | Complete-case analysis excludes operations missing a core complexity variable (wRVU) without imputation; the excluded count is only in the supplement. | Statistical precision | Quote | Peripheral

§04 Detailed domain review

01

Design / claim fit

Serious
Finding
The headline advance — discrimination rising from AUC 0.811 to 0.892 when unplanned readmission and reoperation are added — is substantially driven by reverse causation. Those returns are frequently the occasion on which the PE is diagnosed (suspected-clot readmissions were 68% PE), and the registry records the return and the PE in the same 30-day window without ordering them.
Bias direction
Inflates the incremental accuracy and apparent preventive value of dynamic reassessment — concurrent ascertainment of PE at the return makes the second stage look more predictive than it is.
Evidence
“adding unplanned readmission and reoperation increased the AUC to 0.892” (Results)

The big accuracy jump when a readmission or reoperation is added is partly circular: those returns are often when the clot is found. So a return flags higher measured risk, but it does not guarantee a clean window to prevent the clot, because for some flagged patients the clot is already there. Clinically, treat any unplanned return as a prompt to re-check prophylaxis, but do not read 0.892 as proof of a prevention opportunity.

For your statistician

Reverse causation / outcome-concurrent predictor (PROBAST: outcome). The dynamic stage adds returns recorded in the same 30-day window as PE, with no guaranteed precedence. The authors’ mitigations (removing the suspected-clot subgroup; re-anchoring follow-up, HR 1.08 [0.81–1.44]; discarding PEs within 1–7 days of the event, AUC 0.866 → 0.845) show a real residual signal but cannot order events in the registry.

Adversarial self-check: upheld at serious — the residual signal argues against “fatal,” but the deployable headline remains 0.892 and the increment is the paper’s central contribution.

Named bias
reverse causation · PROBAST: outcome / predictor
GRADE
risk of bias · remedy: re-estimate on PEs diagnosed strictly after the return
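The lag-exclusion sensitivity analysis described for the statistician can be sketched in Python: compute the AUC as a concordance probability, then drop patients whose PE was diagnosed within a chosen window after the return and recompute. This is a minimal sketch under two assumptions not stated in the manuscript: each event carries a day-level interval from return to PE diagnosis (`days_to_pe`, a hypothetical field, and exactly the ordering the report says the registry cannot guarantee), and "discard" means dropping those patients rather than recoding them as non-events. The toy inputs are made up to show the mechanics only.

```python
from typing import Optional, Sequence


def auc(scores: Sequence[float], events: Sequence[bool]) -> float:
    """AUC as a concordance probability: P(score of an event > score of a non-event), ties count 1/2."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, not for millions of rows
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_excluding_early_pe(
    scores: Sequence[float],
    events: Sequence[bool],
    days_to_pe: Sequence[Optional[int]],
    window_days: int,
) -> float:
    """Recompute AUC after dropping patients whose PE was diagnosed within `window_days` of the return.

    days_to_pe[i] is the hypothetical interval from return to PE diagnosis, or None for
    non-events; events with unknown timing (None) are kept.
    """
    keep = [
        i for i, (e, d) in enumerate(zip(events, days_to_pe))
        if not (e and d is not None and d <= window_days)
    ]
    return auc([scores[i] for i in keep], [events[i] for i in keep])


if __name__ == "__main__":
    # Toy, made-up values purely to show the mechanics.
    scores = [0.9, 0.8, 0.3, 0.2, 0.1]
    events = [True, True, False, True, False]
    days = [1, 20, None, 30, None]
    print(auc(scores, events))                              # 0.8333333333333334
    print(auc_excluding_early_pe(scores, events, days, 7))  # 0.75
```

Dropping the single day-1 PE lowers the toy AUC from 5/6 to 0.75, the same direction as the authors' 0.866 → 0.845; a real reanalysis would also report how many events each window removes.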
03

Statistical appropriateness

Moderate
Finding
The registry captures no anticoagulant exposure, and roughly a third of comparable patients receive extended prophylaxis, so the model predicts PE under current mixed treatment — not the untreated risk that a “should I treat?” decision turns on.
Bias direction
Estimand mismatch — the flagged group’s observed PE rate is already partly post-prophylaxis, so the counterfactual benefit of adding prophylaxis is not identified.
Evidence
“…estimates PE risk under contemporary real-world care rather than untreated biologic risk.” (Methods)

The database does not record who already got blood thinners, so the risk numbers reflect care as delivered, not an untreated patient. Clinically, the score tells you who is high-risk now, not how much an extra dose would help them.

For your statistician

Prevalent-treatment / usual-care estimand (PROBAST: predictors/outcome). No anticoagulant capture; ~31% background prophylaxis; predicted P(PE | current care) ≠ untreated risk or treatment benefit. Disclosed; causal claims limited.

Named bias
prevalent-treatment estimand · usual-care vs counterfactual
GRADE
indirectness · remedy: exposure-aware data or a trial for the counterfactual
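A worked illustration of the estimand gap, under a deliberately simple assumption the report does not make: prophylaxis reaches a fraction t of patients independently of their risk and cuts PE risk by a constant relative amount e, so the observed risk is r_obs = r_untreated × (1 − t·e). Taking the report's ~31% background prophylaxis and ~60% efficacy as illustrative inputs, an observed 2.92% would correspond to an untreated risk of about 3.6%. The function names are hypothetical. If prophylaxis is in fact targeted at higher-risk patients, the correction varies by patient and cannot be recovered without exposure data, which is the substance of the finding.

```python
def observed_risk(untreated_risk: float, treated_fraction: float, rrr: float) -> float:
    """PE risk under mixed care, assuming treatment is independent of risk with a constant relative risk reduction."""
    if not (0.0 <= treated_fraction <= 1.0 and 0.0 <= rrr <= 1.0):
        raise ValueError("treated_fraction and rrr must lie in [0, 1]")
    return untreated_risk * (1.0 - treated_fraction * rrr)


def implied_untreated_risk(observed: float, treated_fraction: float, rrr: float) -> float:
    """Invert observed_risk under the same (strong) assumptions."""
    factor = observed_risk(1.0, treated_fraction, rrr)  # equals 1 - t*e
    if factor == 0.0:
        raise ValueError("everyone treated with a fully effective drug: untreated risk is not identified")
    return observed / factor


if __name__ == "__main__":
    # Illustrative inputs taken from the report: ~31% background prophylaxis, ~60% efficacy, PPV 2.92%.
    print(round(implied_untreated_risk(0.0292, 0.31, 0.60), 4))  # 0.0359
```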

§05 Forensic checks

Recomputed directly from the manuscript’s reported values — this study passed.

Quoted — Results (timing analysis)
“…early versus late events no longer differed in PE risk (hazard ratio 1.08; 95% CI, 0.81–1.44; p = 0.58).”
Hazard ratio with 95% CI and p-value
Recomputed (CI ↔ p)
implied z: 0.52
reported p: 0.58
recomputed p: ≈ 0.60
CI symmetry: consistent (log scale)
Consistent — CI, p-value, and point estimate agree
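The CI ↔ p recomputation can be reproduced with the Python standard library. This sketch assumes a Wald-type interval that is symmetric on the log scale, the usual construction for a hazard ratio; it illustrates the check and is not RigorMD's own code.

```python
import math
from typing import NamedTuple

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


class RatioCheck(NamedTuple):
    se_log: float              # standard error of log(ratio) implied by the CI width
    z: float                   # log(estimate) / se_log
    p_two_sided: float         # two-sided p-value from that z
    geometric_midpoint: float  # sqrt(lower * upper); equals the estimate if the CI is log-symmetric


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def check_ratio_ci(estimate: float, lower: float, upper: float, z_crit: float = Z_95) -> RatioCheck:
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se_log
    p = 2.0 * normal_upper_tail(abs(z))
    return RatioCheck(se_log, z, p, math.sqrt(lower * upper))


if __name__ == "__main__":
    r = check_ratio_ci(1.08, 0.81, 1.44)
    print(f"z = {r.z:.2f}, p = {r.p_two_sided:.2f}, midpoint = {r.geometric_midpoint:.2f}")
    # z = 0.52, p = 0.60, midpoint = 1.08
```

The gap to the reported p = 0.58 is within rounding: HR 1.084 with bounds 0.814–1.436, all of which round to the printed two-decimal values, gives p ≈ 0.58 by the same calculation.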
Quoted — Table 2 (reclassification)
Reclassification groups sum to 57,840 unplanned-return patients; PE events sum to 1,742.
Concordant low + reclassified up/down + concordant high
Table arithmetic
row sum (n): 57,840
row sum (PE): 1,742
NNE = 1/PPV: 34.2 ✓ (PPV 2.92%)
Consistent — cells, counts, rates, and NNE reconcile
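A sketch of the NNE line in Python, assuming PPV is reported as a percentage rounded to two decimals. Besides the point value it returns the NNE interval implied by that rounding, which shows the ✓ does not hinge on an unreported third decimal. The function names are illustrative.

```python
def nne(ppv: float) -> float:
    """Number needed to evaluate (flag) per true PE: the reciprocal of PPV."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("PPV must be a proportion in (0, 1]")
    return 1.0 / ppv


def nne_rounding_interval(ppv_pct: float, decimals: int = 2) -> tuple[float, float]:
    """NNE range consistent with a PPV percentage printed to `decimals` places."""
    half_step = 0.5 * 10 ** (-decimals)
    lo_ppv = (ppv_pct - half_step) / 100
    hi_ppv = (ppv_pct + half_step) / 100
    return nne(hi_ppv), nne(lo_ppv)  # higher PPV gives the smaller NNE


if __name__ == "__main__":
    print(f"{nne(0.0292):.1f}")  # 34.2
    lo, hi = nne_rounding_interval(2.92)
    print(f"{lo:.2f} to {hi:.2f}")  # 34.19 to 34.31
```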

§06 Revision priority

Listed in priority order. The central design finding governs the headline.

  1. Re-estimate the dynamic model’s added value using only PEs diagnosed strictly after the unplanned return (not concurrent with it), and report that ordered figure as the deployable one. (Serious)
  2. Validate the model externally in an independent health system, and label the public calculator investigational pending a prospective impact study of prophylaxis use, PE, and bleeding. (Moderate)
  3. State that the predicted quantity is PE risk under current mixed prophylaxis (not untreated risk), and keep all targeting language hypothesis-generating. (Moderate)
  4. Report the absolute prevented-PE trade-off (given ~60% prophylaxis efficacy and bleeding cost) next to the headline, and temper the “multidimensional dynamic … practical strategy” branding. (Moderate)
  5. Report the excluded-case counts in the main text with a multiple-imputation or missing-indicator sensitivity check for the missing complexity variable. (Mild)
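For revision item 4, a back-of-envelope sketch of the absolute trade-off. It assumes the ~60% efficacy is a constant relative risk reduction applied to the observed PPV of 2.92%, which is optimistic on two counts the report raises: the observed rate is already partly post-prophylaxis, and some PEs flagged at a return are already present and therefore not preventable. The bleeding side is left as an input (`excess_bleed_risk`, a hypothetical parameter) because the report gives no figure for it.

```python
def prevented_per_1000(ppv: float, rrr: float) -> float:
    """Expected PEs prevented per 1,000 flagged patients who receive added prophylaxis."""
    return 1000.0 * ppv * rrr


def number_needed_to_treat(ppv: float, rrr: float) -> float:
    """Flagged patients treated per PE prevented: 1 / absolute risk reduction."""
    return 1.0 / (ppv * rrr)


def number_needed_to_harm(excess_bleed_risk: float) -> float:
    """Patients treated per extra major bleed; the excess risk must come from the authors' data."""
    return 1.0 / excess_bleed_risk


if __name__ == "__main__":
    print(f"{prevented_per_1000(0.0292, 0.60):.1f} prevented per 1,000 flagged")  # 17.5
    print(f"NNT = {number_needed_to_treat(0.0292, 0.60):.0f}")  # 57
```

Under these assumptions about 57 flagged patients would need added prophylaxis to prevent one PE, against the "1 PE per 34 flagged" headline; setting that beside the number needed to harm is the comparison item 4 asks for.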

§07 Language calibration

Suggested wording is triaged by author action. Some wording overstates the evidence and should change; some is recommended risk reduction; some is precision polish; some is left to author discretion.

As written

“Dynamic reassessment offers a practical strategy to reduce missed opportunities for prevention after discharge.”

Must-change wording

“An unplanned readmission or reoperation marks a higher-risk state warranting reassessment; because returns are often when PE is diagnosed, the model identifies risk rather than a quantified prevention opportunity.”

The current wording makes a claim the design or results cannot support.
As written

“A multidimensional dynamic calculator.”

Recommended wording

“A two-stage risk model: a discharge estimate plus an update after an unplanned readmission or reoperation.”

The wording is directionally defensible, but softer wording would reduce reviewer risk.
As written

“Reassessment moved 42% of unplanned-return patients above the treatment threshold.”

Statistical precision

“Reassessment moved 42% of unplanned-return patients above the treatment threshold (PPV 2.92%; about one PE per 34 flagged).” Also state the absolute prevented-PE trade-off given ~60% prophylaxis efficacy and bleeding risk.

The sentence is acceptable, but could be made more statistically exact.
As written

“The model is publicly accessible as a calculator.”

Author discretion

“The model is available as an investigational calculator pending external validation.”

A conservative phrasing option that flags the framing while external evidence is pending; the current wording is defensible.

§08 Journal compliance

Checked against RigorMD’s journal registry. Compliance items reflect the journal’s formatting and submission rules; they do not affect the methodological severity grade above.

§09 Reference identifiers

Cited DOI and PMID identifiers, resolved against the public registries (Crossref, the DOI handle registry, and PubMed) as of 2026-06-23. A ✓ means the registry record exists and is consistent with the citation as printed; it does not assess whether the cited work supports the claim it is attached to. An identifier the check could not reach is listed as not checked, never assumed to resolve. Problems found here also appear as findings above. Of 5 cited identifiers, 4 were checked (3 resolve; 1 resolves to a different work) and 1 could not be checked.

Identifier | Outcome | Registry | Notes
DOI 10.1097/SLA.0000000000005821 | ✗ Resolves to a different work | Crossref | Resolves at Crossref to "Extended prophylaxis after pelvic surgery" (2021), which does not match the citation as printed (checked 2026-06-23).
DOI 10.1016/j.jvsv.2022.05.214 | — Not checked | | The registry could not be reached.
DOI 10.1056/NEJMoa012385 | ✓ Resolves | Crossref |
PMID 31626288 | ✓ Resolves | PubMed |
DOI 10.1101/2020.06.21.20136432 | ✓ Resolves | Crossref |
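A minimal sketch of one identifier check, for readers who want to reproduce it: fetch the registered title for a DOI from the public Crossref REST API (`https://api.crossref.org/works/{doi}`) and compare it with the title as printed. The normalization and the 0.9 similarity threshold are assumptions, not RigorMD's rule. Consistent with the policy stated for this section, an unreachable registry yields "not checked" rather than a guess; PubMed and handle-registry lookups would follow the same pattern.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request
from difflib import SequenceMatcher
from typing import Optional

CROSSREF_WORKS = "https://api.crossref.org/works/"  # public Crossref REST endpoint
MATCH_THRESHOLD = 0.9  # assumed similarity cut-off, not a published standard


def fetch_crossref_title(doi: str, timeout: float = 10.0) -> Optional[str]:
    """Return the first registered title for a DOI, or None if Crossref has no record (HTTP 404).

    Other network or HTTP failures are re-raised so the caller can report "not checked".
    """
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            record = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
    titles = record.get("message", {}).get("title") or []
    return titles[0] if titles else None


def normalize_title(title: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def titles_match(printed: str, registered: str, threshold: float = MATCH_THRESHOLD) -> bool:
    """Fuzzy comparison so minor casing or punctuation differences still count as a match."""
    ratio = SequenceMatcher(None, normalize_title(printed), normalize_title(registered)).ratio()
    return ratio >= threshold


def check_doi(doi: str, printed_title: str) -> str:
    """Classify one DOI the way the table reports it; never assume resolution on failure."""
    try:
        registered = fetch_crossref_title(doi)
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return "not checked"
    if registered is None:
        return "does not resolve"
    if titles_match(printed_title, registered):
        return "resolves"
    return "resolves to a different work"
```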

§10 Technical appendix

What could be checked from the submitted files — and what could not.

Checked from submitted files

  • ✓ passed Reported CI / p-value consistency (timing HR, early-event OR)
  • ✓ passed Cohort split and PE incidence (0.35%) reconcile
  • ✓ passed Table 2 reclassification cells, PE counts, rates, and NNE
  • ✓ passed Abstract / results numeric agreement
  • ⚑ flag Incremental AUC sensitive to reverse causation (see §04)

Not checkable from submitted files

  • — n/a Temporal sequence of the unplanned return vs the PE diagnosis (not in the registry file)
  • — n/a External validation in an independent health system
  • — n/a Counterfactual benefit of added prophylaxis (no anticoagulant exposure recorded)
  • — n/a Patient-level model re-fit (no dataset provided)
Scope. This report provides methodological and statistical guidance based on the submitted materials. It does not guarantee publication, replace peer review, certify research validity, or provide clinical treatment advice. Findings marked deterministic are recomputed from the manuscript's own reported values; findings marked quote are traceable to the quoted text. This sample is a real RigorMD appraisal of a de-identified manuscript; the journal-compliance and reference-identifier sections are illustrative of those checks. Machine-readable outputs (consensus JSON, statistics-check JSON) accompany the full report.