Sample report · synthetic manuscript · for demonstration only
RigorMD Validation Report
Thirty-day outcomes after sleeve gastrectomy versus Roux-en-Y gastric bypass in patients with BMI ≥ 50
A single-center retrospective cohort study
Design · Retrospective cohort
Claim · Comparative (equivalence-leaning)
Guideline · STROBE
RM-2026-0142
generated 2026-06-08
report v1.0
2 engines + deterministic
overall · CRITICAL

§00 Executive summary

This manuscript compares 30-day outcomes between sleeve gastrectomy and Roux-en-Y gastric bypass and concludes the two procedures are “equally safe.” That conclusion is not supported by the design. The study is a single-center retrospective cohort with non-randomized allocation; it can describe an association but cannot establish safety equivalence, and no equivalence framework or margin is specified.

Two issues are central to the claim and should be resolved before submission: a statistical-consistency error (a reported significant p-value that does not reconcile with the data, and Table 2 denominators that exceed the analytic N), and a design–claim mismatch (equivalence language from a study not designed or powered for equivalence). Four further findings are summarized below.

§01 Claim map

What the manuscript states, against what the evidence can support.

Stated claim
“Sleeve gastrectomy is equally safe as gastric bypass in the super-obese.”
Supportable claim
“In this single center, 30-day complication rates were similar; residual confounding precludes a safety-equivalence conclusion.”

§02 Domain severity scorecard

Six-domain assessment · consensus of 2 engines
Domain | Severity | Principal finding
01 Design / claim fit | Serious | Equivalence language from a non-randomized retrospective cohort.
02 Results / conclusion alignment | Moderate | Abstract reframes a null secondary endpoint as “comparable.”
03 Statistical appropriateness | Serious | No multiplicity adjustment; no a-priori equivalence power.
04 Reporting guideline adherence | Mild | STROBE items 9 and 12c partially addressed.
05 Numerical / statistical consistency | Critical | Table 2 denominators exceed analytic N; p-value discordant.
06 Clinical interpretability / verdict | Moderate | Association plausible; safety equivalence is not.
Overall severity · Critical · 2 central findings · revise before submission

§03 Major findings

Severity | Domain | Finding | Evidence | Locus
Critical | 05 · Consistency | Reported p = 0.04 does not reconcile with the table (recomputed 0.078); Table 2 denominators exceed the stated analytic N. | Deterministic | Central
Serious | 01 · Design | Safety-equivalence claim made from a non-randomized retrospective cohort with no equivalence margin. | Quote | Central
Serious | 03 · Statistics | Six co-primary comparisons tested without multiplicity control; study not powered for equivalence. | Quote | Central
Moderate | 02 · Alignment | Abstract describes a non-significant secondary endpoint as “comparable,” implying a positive finding. | Quote | Peripheral
Mild | 04 · Reporting | Missing-data handling and sources of bias not described per STROBE 12c / 9. | Checklist | Peripheral
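
The multiplicity finding can be made concrete with a short calculation. This is a minimal sketch, assuming a conventional α of 0.05 per test and independent comparisons; neither assumption is stated in the manuscript, and correlated endpoints would give a different family-wise rate. Bonferroni is shown only as one simple form of multiplicity control, not as the method the authors must use.

```python
# Assumed per-test significance level (conventional, not stated in the manuscript).
alpha = 0.05
# Six co-primary comparisons, as listed in the finding above.
n_comparisons = 6

# Probability of at least one false positive when all six null hypotheses
# are true and the tests are independent (an illustrative assumption).
familywise_error = 1 - (1 - alpha) ** n_comparisons

# Bonferroni correction: each comparison is tested at alpha / m.
bonferroni_alpha = alpha / n_comparisons

print(f"family-wise error rate ≈ {familywise_error:.3f}")
print(f"Bonferroni per-test threshold ≈ {bonferroni_alpha:.4f}")
```

Under these assumptions, roughly one in four such six-test analyses would show at least one nominally significant result by chance alone.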

§04 Detailed domain review

01 · Design / claim fit · Serious
Finding
A safety-equivalence conclusion is drawn from a single-center retrospective cohort with treatment selected by surgeon and patient. No equivalence margin is pre-specified.
Bias direction
Confounding by indication — higher-risk patients are plausibly steered toward sleeve, biasing toward apparent equivalence.
Clinical consequence
A reader could conclude the procedures are interchangeable for BMI ≥ 50 when the data cannot support that.
Evidence
“…demonstrating that sleeve gastrectomy is equally safe…” (Discussion ¶1)
05 · Numerical / statistical consistency · Critical
Finding
The reported readmission p-value (0.04) does not reconcile with the cell counts (recomputed 0.078). Separately, the Table 2 row denominators (n=214 and n=205) sum to 419, exceeding the analytic cohort (N=410).
Bias direction
Toward overstating significance and inflating subgroup sizes.
Clinical consequence
The central comparison may be reported as significant when it is not; subgroup rates are not trustworthy as printed.
Evidence
See §05 — Forensic checks.

§05 Forensic checks

Recomputed directly from the manuscript's reported values.

Quoted — Results ¶3
“…no significant difference in 30-day readmission (p = 0.04).”
Table 3 · sleeve (n=212) vs bypass (n=198)
Recomputed
test: Pearson χ²
statistic: 3.10
df: 1
reported p: 0.04
recomputed p: 0.078
Discordant — and prose says “no difference”
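
The recomputed p-value can be reproduced from the reported statistic alone. This is a minimal sketch, assuming the Pearson χ² of 3.10 on 1 degree of freedom shown above; the function name and the 0.005 tolerance are illustrative choices, not a description of the report's deterministic engine.

```python
import math

def chi2_sf_df1(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With df = 1 the chi-square variable is the square of a standard normal,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))

reported_p = 0.04                   # as quoted from Results ¶3
recomputed_p = chi2_sf_df1(3.10)    # statistic as listed above

# Illustrative tolerance: flag when the gap is larger than rounding could explain.
discordant = abs(recomputed_p - reported_p) > 0.005

print(f"recomputed p = {recomputed_p:.3f}, discordant = {discordant}")
```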
Quoted — Table 2
Complication rates by group: 214 / 205 patients across two arms.
Analytic cohort stated as N = 410
Denominator check
stated N: 410
Table 2 sum: 419
difference: +9
Denominators exceed analytic N
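
The denominator check is plain arithmetic on the quoted values. A minimal sketch follows; the helper name is hypothetical, and the Table 3 arm sizes are included only to show that they do sum to the stated N.

```python
stated_n = 410
table2_denominators = [214, 205]   # Table 2 row denominators, as quoted
table3_denominators = [212, 198]   # Table 3 arm sizes, as quoted

def excess_over_n(denominators, analytic_n):
    """Return how far the summed denominators overshoot the analytic N."""
    return sum(denominators) - analytic_n

table2_excess = excess_over_n(table2_denominators, stated_n)
table3_excess = excess_over_n(table3_denominators, stated_n)

print(f"Table 2 excess: {table2_excess:+d}")
print(f"Table 3 excess: {table3_excess:+d}")
```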

§06 Revision priority

In order. Address the central findings first.

  1. Reconcile the readmission p-value and the Table 2 denominators against the source data. (Critical)
  2. Reframe the conclusion from “equally safe” to a confounding-limited similarity, or add a pre-specified equivalence analysis. (Serious)
  3. Declare a primary endpoint and apply multiplicity control across the remaining comparisons. (Serious)
  4. Revise the abstract so the secondary endpoint is described as non-significant. (Moderate)
  5. Add STROBE-compliant missing-data and bias reporting. (Mild)
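
Item 2 mentions a pre-specified equivalence analysis. One common form is two one-sided tests (TOST) on the risk difference against a margin fixed in advance. The sketch below assumes a Wald normal approximation; the function names, the margin, and the example counts are all hypothetical and are not taken from the manuscript. It would not address confounding by indication, which needs separate handling.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tost_risk_difference(x1, n1, x2, n2, margin, alpha=0.05):
    """TOST for equivalence of two proportions (Wald approximation).

    Returns (risk difference, TOST p-value, equivalence declared).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        raise ValueError("standard error is zero; Wald test is undefined")
    # H0: diff <= -margin, rejected when the shifted z is large.
    p_lower = 1.0 - normal_cdf((diff + margin) / se)
    # H0: diff >= +margin, rejected when the shifted z is small.
    p_upper = normal_cdf((diff - margin) / se)
    p_tost = max(p_lower, p_upper)
    return diff, p_tost, p_tost < alpha

# Hypothetical counts for illustration only, not from the manuscript.
large = tost_risk_difference(50, 1000, 50, 1000, margin=0.05)
small = tost_risk_difference(5, 100, 5, 100, margin=0.03)
print(large[2], small[2])
```

In this illustration the same observed rates support equivalence only in the larger sample, which is why a non-significant difference in an underpowered cohort cannot be read as equivalence.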

§07 Suggested language revisions

Calibrated to what the design supports — your wording, corrected.

✕ As written

“This study demonstrates that sleeve gastrectomy is equally safe as gastric bypass.”

✓ Supportable

“In this single-center cohort, 30-day complication rates were similar between procedures; residual confounding limits causal interpretation.”

✕ As written

“Readmission was comparable between groups.”

✓ Supportable

“Readmission did not differ significantly; the study was not powered to establish equivalence.”

§08 Technical appendix

What could be checked from the submitted files — and what could not.

Checked from submitted files

  • ✓ passed Reference list internal consistency
  • ⚑ flag Reported p-value vs recomputed
  • ⚑ flag Table denominators vs analytic N
  • ✓ passed Abstract / results numeric agreement (primary)
  • ⚑ flag Multiplicity across comparisons

Not checkable from submitted files

  • — n/a Patient-level reanalysis (no dataset provided)
  • — n/a Propensity overlap / balance (no covariate table)
  • — n/a Missing-data mechanism (not reported)
  • — n/a Figure source values (raster only)
Scope. This report provides methodological and statistical guidance based on the submitted materials. It does not guarantee publication, replace peer review, certify research validity, or provide clinical treatment advice. Findings marked deterministic are recomputed from the manuscript's own reported values; findings marked quote are traceable to the quoted text. Machine-readable outputs (consensus JSON, statistics-check JSON) accompany this report.