The authors state the counseling-utility claim with moderate confidence; the evidence warrants low certainty (GRADE) — the benchmarks come from one health system, are internally validated only, and rest on a follow-up-captured cohort (55% by three years) whose non-returners were younger, heavier, and less often female, so the absolute bands probably run optimistic. The study is careful, well hedged, and every deterministic check is clean; the gap is one of reach, not error.
This manuscript takes on the one question every bariatric patient asks — how much weight will I lose? Across 2,730 primary sleeve, bypass, and duodenal switch/SADI patients in a five-campus system, it replaces the usual single average with percentile benchmarks — the median, the middle half, and the 10th-to-90th band — for each operation out to three years, using quantile regression with bootstrap intervals. The message (counsel with a range, not a number) is well matched to the design, the procedure ordering is clinically coherent, and every reported number is internally clean.
Two issues bear on how far the absolute benchmarks can be pushed. First, they rest on a follow-up-captured cohort — capture falls to 55% by three years, and the patients who dropped out were younger, heavier, and less often female (the profile that loses less), so the two- and three-year bands probably run optimistic; the authors weighting-tested the between-procedure differences but not the absolute bands. Second, because each patient contributes one weight, the time curve is stitched across calendar eras during the GLP-1 boom — an effect disclosed but not quantified. A seventh domain (contribution and literature positioning) separately flags that the percentile-benchmark method itself is not new — retrieved prior studies build procedure-adjusted weight-loss percentile charts by quantile regression, uncited here — though the DS/SADI-inclusive, contemporary US contribution is real. The deterministic forensic layer found no numerical inconsistencies, and all fifteen references resolve.
What the manuscript states, against what the evidence can support.
| # | Domain | Severity | Principal finding |
|---|---|---|---|
| 01 | Design / claim fit | Moderate | Weight-loss “time” is modeled between patients across 2021–2026 eras; the GLP-1-era shift is disclosed but not quantified. |
| 02 | Results / conclusion alignment | Moderate | Counseling-ready framing sits in the abstract; the external-validation caveat is confined to the Limitations. |
| 03 | Statistical appropriateness | Moderate | Informative loss to follow-up (82%→55%); only the between-procedure differences were weighting-tested, not the absolute bands. |
| 04 | Reporting guideline adherence | Mild | STROBE flow complete; the procedure-by-time interaction F is reported without degrees of freedom. |
| 05 | Numerical / statistical consistency | No findings | 14 deterministic checks pass; subgroup counts, percentages, and capture proportions reconcile; all 15 references resolve. |
| 06 | Clinical interpretability / verdict | Moderate | Sound, well-hedged benchmarking; the residual is optimistic absolute bands pending attrition-weighting and external validation. |
| 07 | Contribution & literature positioning | Moderate | The percentile-benchmark method itself is not new — retrieved prior quantile-regression percentile-chart studies go uncited; the DS/SADI-inclusive contribution is real, but the primacy framing outruns the retrieved record. |
Language calibration: 1 must-change wording · 3 precision polish. Analytic work: 2 need source-data or analytic work.
| Severity | Domain | Finding | Author action | Evidence | Locus |
|---|---|---|---|---|---|
| Moderate | 03 · Statistics | Benchmarks condition on patients still in follow-up (capture 82%→55% by three years; non-returners younger, heavier, less often female); only the between-procedure differences were weighting-tested, so the absolute bands may run optimistic. | New analysis needed | Quote | Central |
| Moderate | 01 · Design | One weight per patient means the “time” curve is built between patients across 2021–2026 eras (later points from earlier surgeries), confounding postoperative time with the GLP-1-era secular trend the authors could not adjust for. | New analysis needed | Quote | Central |
| Moderate | 03 · Statistics | Duodenal switch and SADI — two different operations — are pooled into the smallest group (271; 79 SADI), giving the widest, least-stable bands and a non-significant early separation from sleeve. | Statistical precision | Quote | Peripheral |
| Moderate | 02 · Alignment | The counseling-ready framing (“what weight loss really looks like”) sits in the abstract while the single-system / external-validation caveat is confined to the Limitations. | Must-change wording | Quote | Peripheral |
| Mild | 03 · Statistics | Many between-procedure contrasts, percentiles, and thresholds are tested without multiplicity adjustment; the large gaps hold, but borderline contrasts (12-mo DS/SADI vs RYGB, P=0.028) are exploratory. | Statistical precision | Quote | Peripheral |
| Mild | 04 · Reporting | The procedure-by-time interaction that licenses the curves is reported as F=24.11, P<0.001 without its degrees of freedom, so the test cannot be reconstructed. | Statistical precision | Quote | Peripheral |
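The multiplicity finding in the table above can be made concrete with a small sketch. It applies a Holm step-down adjustment to a family of contrasts. Only P = 0.028 (12-month DS/SADI vs RYGB) comes from the report; the other p-values and the family size of six are hypothetical, since the manuscript's full set of contrasts is not listed here. Holm is one reasonable choice, not a method the report prescribes.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical family of six contrasts; only 0.028 is taken from the report.
raw_p = [0.001, 0.004, 0.028, 0.20, 0.45, 0.60]
adjusted_p = holm_adjust(raw_p)

for raw, adj in zip(raw_p, adjusted_p):
    verdict = "holds" if adj < 0.05 else "exploratory"
    print(f"raw P = {raw:.3f} -> Holm P = {adj:.3f} ({verdict})")
```

In this illustrative family the large gaps stay below 0.05 after adjustment, while the borderline contrast does not, which is the pattern the finding describes. The actual result depends on how many contrasts the authors treat as one family.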
These curves aren’t one patient followed for three years — they stitch together different patients measured at different times, and the three-year points come from people operated on years earlier, during the run-up in weight-loss drugs. Clinically, read the curve as a snapshot of the program’s experience by era, not a promise of how one patient will track; the later-year bands especially carry an era effect that is acknowledged but not quantified.
Single-weight cross-section: postoperative time is a between-patient covariate confounded with operative-year era (right-censoring by the April-2026 download). The unadjusted secular trend (GLP-1/GIP uptake) loads onto the time axis. ROBINS-I: confounding. The authors disclose this; the residual concern is that the size of the era effect on the time axis is not quantified.
Fix is analytic, not fatal: stratify or adjust by operative year and show whether the procedure-by-time curves and the 36-month bands move.
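A minimal sketch of the stratified check, assuming one record per patient with an operative year, a follow-up window in months, and a percent total body weight loss. The records, field layout, and function name are hypothetical, and raw medians stand in for the authors' adjusted quantile regression.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (operative_year, months_since_surgery, pct_tbwl).
records = [
    (2021, 36, 27.0), (2021, 36, 31.0), (2021, 24, 29.0),
    (2022, 36, 30.0), (2022, 24, 30.0), (2022, 24, 34.0),
    (2023, 24, 33.0), (2023, 12, 28.0), (2023, 12, 32.0),
    (2024, 12, 31.0), (2024, 12, 35.0),
]


def median_by_era(rows):
    """Median %TBWL for each (follow-up window, operative year) cell."""
    cells = defaultdict(list)
    for year, months, tbwl in rows:
        cells[(months, year)].append(tbwl)
    return {key: median(vals) for key, vals in sorted(cells.items())}


era_medians = median_by_era(records)
for (months, year), value in era_medians.items():
    print(f"{months:>2} months, operated {year}: median %TBWL = {value:.1f}")
```

If the medians within a follow-up window drift with operative year, part of the apparent time trend is an era effect. A fuller version would add operative year as a covariate in the quantile model and compare the 36-month bands with and without it.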
About half of patients had no weight recorded at three years, and the ones who skipped follow-up were the ones who tend to lose less — so the benchmark you’d quote a patient at two or three years is probably a little rosier than the truth, and the gap is widest exactly where follow-up is thinnest. Clinically, this is the number that goes into counseling, so treat the later-year bands as best-case pending a follow-up-weighted re-analysis.
Informative loss to follow-up (differential attrition: capture 82.1%→55.1%; non-returners younger / higher-BMI / less-female). IPW sensitivity is reported for the contrasts only; the marginal conditional-quantile benchmarks are not shown robust to missingness. ROBINS-I: selection of participants. This is the dominant finding.
Fix is real analytic work: re-estimate the absolute percentile bands under inverse-probability-of-follow-up weighting and report how far the 24- and 36-month medians and deciles shift.
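The re-weighting idea can be sketched as follows, assuming each returning patient already has an estimated probability of being captured at follow-up. The weight-loss values, the probabilities, and the function name are hypothetical, and an unconditional weighted quantile stands in for the manuscript's adjusted quantile regression.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches fraction q of the total."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


# Hypothetical %TBWL among patients who returned, with the estimated
# probability that a patient like them is captured at follow-up.
# Lower probabilities are assigned to patients who lost less weight.
pct_tbwl = [18, 22, 25, 28, 31, 34, 37]
p_followup = [0.4, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8]

# Inverse-probability weights: each returner also represents similar
# patients who did not return.
ipw = [1.0 / p for p in p_followup]

naive_median = weighted_quantile(pct_tbwl, [1.0] * len(pct_tbwl), 0.5)
ipw_median = weighted_quantile(pct_tbwl, ipw, 0.5)
print(f"unweighted median: {naive_median}, IPW median: {ipw_median}")
```

In this toy data the weighted median is lower than the unweighted one, because the patients least likely to return are up-weighted. The size of that shift at 24 and 36 months in the real cohort is what the requested re-analysis would report.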
Recomputed directly from the manuscript’s reported values — no numerical inconsistencies were found.
| Reported value | What was checked |
|---|---|
| RYGB − SG at 6 months: 3.0 percentage points (95% CI, 1.2 to 4.8; P = 0.001). | Difference in predicted median %TBWL with 95% CI and p-value |
| Procedure counts: sleeve 1,796, bypass 663, DS/SADI 271. | Cohort stated as N = 2,730 |
| 36 months, 698 of 1,267 (55.1%). | Reported capture proportion |
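The three recomputations above can be reproduced with a few lines of arithmetic. This is one plausible way to run them, not necessarily how the report's deterministic layer works. The p-value check assumes a symmetric normal-approximation interval; the manuscript's intervals are bootstrap-based, so agreement after rounding is the most that can be expected.

```python
import math


def two_sided_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Back out a two-sided p-value from a 95% CI, assuming a normal approximation."""
    standard_error = (upper - lower) / (2 * z_crit)
    z = estimate / standard_error
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Values as quoted from the manuscript.
procedure_counts = {"sleeve": 1796, "bypass": 663, "ds_sadi": 271}
counts_reconcile = sum(procedure_counts.values()) == 2730

capture_pct = round(100 * 698 / 1267, 1)

p_value = two_sided_p_from_ci(estimate=3.0, lower=1.2, upper=4.8)

print(f"counts sum to stated N: {counts_reconcile}")
print(f"36-month capture: {capture_pct}%")
print(f"implied p-value for RYGB - SG at 6 months: {p_value:.4f}")
```

All three reconcile with the reported figures: the counts sum to 2,730, the capture proportion rounds to 55.1%, and the implied p-value rounds to 0.001.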
In order. The two text edits are cheap; the follow-up-weighting re-analysis speaks to the study’s central question.
Suggested wording is triaged by author action. Some wording overstates the evidence and should change; some is recommended risk reduction; some is precision polish; some is left to author discretion.
As written: “Percentile benchmarks allow a surgeon to show patients what weight loss really looks like after each operation.”
Suggested: “Within this five-campus system, percentile benchmarks describe the distribution of weight loss for each operation; because they are internally validated only and rest on a follow-up-captured cohort, they should be recalibrated and externally validated before use elsewhere.”
As written: “…no study has provided percentile-based weight-loss benchmarks across SG, RYGB, and DS/SADI in a single adjusted model.”
Suggested: “Procedure-adjusted weight-loss percentile charts have been reported for sleeve and bypass; we extend that approach to DS/SADI in a contemporary multi-site US cohort through 36 months.”
As written: “DS/SADI led SG by 8.9 and 12.5 points.”
Suggested: “DS/SADI led SG by 8.9 and 12.5 points, on the smallest group (271; 79 SADI) with the widest intervals; interpret the DS/SADI bands, especially the tails, as approximate.”
As written: “…the ordering held across sensitivity analyses.”
Suggested: “…the procedure ordering held across sensitivity analyses.” The sensitivity analyses confirm the between-procedure differences, not the absolute bands. Well-judged as written; this is a clarity note, not a required change.
Items observable from the extract. The full SOARD pre-submission gate was not run for this sample; these do not affect the methodological grade above.
Cited DOI and PMID identifiers, resolved against the public registries — Crossref, the DOI handle registry, and PubMed — as of 2026-07-03. A ✓ means the registry record exists and is consistent with the citation as printed; it does not assess whether the cited work supports the claim it is attached to. An identifier the check could not reach is listed as not checked, never assumed to resolve. Problems found here also appear as findings above. All 15 cited identifiers were checked: 15 resolve.
| Identifier | Outcome | Registry | Notes |
|---|---|---|---|
| DOI 10.7326/M17-2786 | ✓ Resolves | Crossref | |
| DOI 10.1001/jamasurg.2020.5666 | ✓ Resolves | Crossref | |
| DOI 10.1056/NEJMoa2206038 | ✓ Resolves | Crossref | |
| DOI 10.1007/s00464-025-12170-w | ✓ Resolves | Crossref | |
| DOI 10.1016/S2213-8587(25)00226-8 | ✓ Resolves | Crossref | |
Findings — see below
The prior literature RigorMD retrieved into an evidence pack and compared with this manuscript's positioning as of 2026-07-03. This is a positioning-risk check, not a novelty score: it flags where a claim may overlap, understate, or be contradicted by retrieved prior work. It never certifies that a contribution is novel or first — a clean result means only that no directly overlapping prior study was found in this evidence pack. The retrieval is bounded and time-stamped; treat it as a starting point for your own literature review, not a replacement for it. Positioning risks found here also appear as findings above. Retrieved prior studies build procedure-adjusted weight-loss percentile charts by quantile regression — the manuscript's own core method — but are not engaged; the framing that percentile-based benchmarks were previously unavailable is stronger than the retrieved record supports.
Priors we compared you against. The prior work RigorMD retrieved and compared this manuscript against — disclosed so you can see the evidence pack behind the assessment. Listing a work here is not an instruction to cite it; it is the basis on which the positioning was checked.
| Prior work | Year | Identifier | In your references |
|---|---|---|---|
| Centile Charts for Monitoring of Weight Loss Trajectories After Bariatric Surgery in Asian Patients | 2021 | PMID 34363141 | Not in your references |
| Prediction Model for Chronological Weight Loss After Bariatric Surgery in Korean Patients | 2024 | PMID 38974892 | Not in your references |
| Weight-Independent Percentile Chart of 2880 Gastric Bypass Patients: a New Look at Bariatric Weight Loss Results | 2016 | PMID 27138602 | Not in your references |
The manuscript positions procedure-specific percentile weight-loss benchmarks as previously unavailable, but RigorMD retrieved prior studies that build procedure-adjusted weight-loss percentile charts by quantile regression — the same core method — for sleeve and bypass, plus earlier single-procedure percentile charts. The genuine, defensible contribution is the addition of duodenal switch/SADI and a contemporary multi-site US cohort through 36 months; the claim of primacy for percentile-based benchmarking itself is stronger than the retrieved record supports, and none of these methodological priors are cited.
“Longitudinal centile lines were plotted using the post-estimation predictions of quantile regression models, adjusted for type of procedure, sex, ethnicity, and baseline BMI.” (From the prior work RigorMD retrieved and compared.)
Literature assessed as of 2026-07-03. Bounded PubMed retrieval on the manuscript's own concept pair, not a systematic review; a work not surfaced here was not necessarily absent from the literature. Listing a prior is disclosure of what was compared, not an instruction to cite it.
What could be checked from the submitted files, and what could not.