
How journals catch statistical errors

A manuscript passes through several independent screens before it is accepted. Knowing what each one looks for lets you run them on yourself first — while the errors are still cheap to fix.

§01 The editorial screen (minutes)

The first check is the fastest and the one most authors underestimate. Before a manuscript reaches a reviewer, an editor reads the abstract and skims the methods, asking a few blunt questions: Does the design support the claim? Is the reporting guideline satisfied? Is the primary endpoint stated? Do the headline numbers look internally plausible? Journals publish lists of this kind: one editorial, “top 10 reasons your manuscript may be rejected without review” ↗, puts inadequate design and incomplete methodological reporting near the top. A statistical problem visible in the abstract is caught here, in minutes, with no reviewer involved. We cover this screen in detail in “why editors desk-reject before peer review”.
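As a rough illustration of what "internally plausible" can mean for headline numbers, the sketch below checks two things an editor can verify at a glance: that a reported percentage matches its count and denominator, and that a point estimate lies inside its own confidence interval. The function names and the example figures are hypothetical, chosen only for illustration; they are not taken from any journal's procedure.

```python
def percent_matches_count(events: int, total: int,
                          reported_percent: float, decimals: int = 1) -> bool:
    """True if events/total, as a percentage, rounds to the reported value."""
    if total <= 0 or not 0 <= events <= total:
        return False
    # Allow half a unit in the last reported decimal place for rounding.
    tolerance = 0.5 * 10 ** -decimals
    return abs(100 * events / total - reported_percent) <= tolerance + 1e-9


def estimate_inside_interval(estimate: float, lower: float, upper: float) -> bool:
    """True if the point estimate lies within its reported interval."""
    return lower <= estimate <= upper


# Hypothetical abstract: "45 of 120 patients (37.5%) ...; HR 1.42 (95% CI 1.10 to 1.83)"
print(percent_matches_count(45, 120, 37.5))
print(estimate_inside_interval(1.42, 1.10, 1.83))
```

Neither check says anything about whether the analysis was appropriate; both only catch numbers that cannot all be true at once.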

§02 Automated consistency checkers (seconds)

A growing number of journals run manuscripts through software that recomputes reported numbers. Two tools are well known. statcheck re-derives each reported p-value from its test statistic and degrees of freedom and flags any that disagree; when it was run across eight psychology journals over 1985–2013, about half the papers had at least one inconsistent p-value, and one in eight had a gross inconsistency, one large enough to change the significance conclusion ↗. GRIM checks whether a reported mean of integer data is arithmetically possible for the stated sample size; in its founding study, about half of the analyzable articles contained at least one impossible mean ↗.
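The p-value recomputation can be sketched in a few lines. This is a minimal illustration of the idea for a two-sided t-test only, not statcheck's actual code: the function names, the rounding tolerances, and the numerical integration of the t density (used here to stay within the Python standard library) are all assumptions made for the example.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t: float, df: float, steps: int = 20000) -> float:
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def check_reported_p(t: float, df: float, reported_p: float,
                     t_decimals: int = 2, p_decimals: int = 3,
                     alpha: float = 0.05) -> dict:
    """Compare a reported 'p = ...' with the value implied by t and df."""
    # The unrounded statistic could lie anywhere within half a unit
    # of the last reported decimal place.
    half = 0.5 * 10 ** -t_decimals
    p_high = t_two_sided_p(max(abs(t) - half, 0.0), df)
    p_low = t_two_sided_p(abs(t) + half, df)
    tol = 0.5 * 10 ** -p_decimals  # rounding of the reported p-value itself
    consistent = (p_low - tol) <= reported_p <= (p_high + tol)
    recomputed = t_two_sided_p(t, df)
    # "Gross" here means the mismatch flips the significance decision.
    gross = (not consistent) and ((reported_p < alpha) != (recomputed < alpha))
    return {"recomputed_p": recomputed, "consistent": consistent, "gross": gross}


# Hypothetical report: "t(28) = 1.80, p = .03"
print(check_reported_p(1.80, 28, 0.03))
```

The sketch handles only exact reports of the form "p = value". One-sided tests, inequalities such as "p < .05", and other test families (F, chi-square, r, z) would each need their own handling.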

These checks are deterministic: they pass or they fail, and there is no arguing with the arithmetic. They are also blind to everything except consistency: they do not know whether the right test was chosen or the claim was earned. We break down each one in “when a p-value doesn't match its test statistic” and “the GRIM test”.
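The GRIM arithmetic is simple enough to run by hand or in a few lines of code. The sketch below is a minimal version under stated assumptions: the function name and the `items` parameter (for scales averaged over several integer items) are hypothetical, and the rounding tolerance is deliberately lenient so that either rounding convention passes.

```python
import math


def grim_consistent(reported_mean: float, n: int,
                    decimals: int = 2, items: int = 1) -> bool:
    """True if some integer total could yield the reported mean after rounding."""
    grains = n * items  # number of integer-valued observations being averaged
    tolerance = 0.5 * 10 ** -decimals
    approx_total = reported_mean * grains
    # Only the two integers nearest the implied total can possibly match.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if abs(total / grains - reported_mean) <= tolerance + 1e-9:
            return True
    return False


# With n = 28, the possible means near 5.19 are 145/28 = 5.18 and 146/28 = 5.21.
print(grim_consistent(5.19, 28))
print(grim_consistent(5.18, 28))
```

The test is only informative when the number of observations is smaller than the precision of the reported mean allows, for example fewer than 100 integer observations for a mean reported to two decimals. Beyond that, every reported mean is achievable and the check passes trivially.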

§03 Reporting-guideline audits (item by item)

Many journals require a completed reporting checklist at submission and check the manuscript against it: CONSORT for randomized trials, STROBE for observational studies, TRIPOD for prediction models, PRISMA for systematic reviews. A missing flow diagram, an unaccounted-for dropout, an unstated handling of missing data — each is a concrete, citable reason to return the paper, and because the checklist is published in advance, it is the most preventable rejection there is. Walk your design's checklist before submission: our STROBE, CONSORT, TRIPOD, and PRISMA walkthroughs take them item by item.

§04 Reviewer scrutiny and post-publication forensics

A methodological reviewer goes past consistency to appropriateness: power and precision, multiplicity, missing-data handling, confounding, whether the conclusion outruns the result. And the scrutiny does not stop at acceptance. Statistical detectives audit the published record too: John Carlisle's survey of baseline data in 5,087 randomized trials ↗ found a small percentage whose baseline distributions were too perfect or too divergent to be consistent with real randomization, and that method helped prompt the reanalysis of a landmark trial. Errors caught at this stage cost corrections and retractions; biomedical retractions have roughly quadrupled over twenty years ↗. We walk through one such case in “catching what the record already knew”.
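The logic behind a baseline audit can be illustrated with a simplified sketch. Under genuine randomization, p-values for between-arm comparisons of baseline variables should be roughly uniform between 0 and 1; combining them into one trial-level value shows whether the arms look too different (value near 0) or too similar (value near 1). The code below uses Stouffer's z-score combination and assumes the baseline variables are independent, which real baseline variables often are not. The function name and example values are hypothetical, and this is not a reproduction of Carlisle's published procedure.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_trial_p(baseline_p_values: list[float]) -> float:
    """Combine per-variable baseline p-values into one trial-level value.

    Near 0: arms differ more than chance allocation would suggest.
    Near 1: arms are more alike than chance allocation would suggest.
    """
    std_normal = NormalDist()
    eps = 1e-12  # keep inputs strictly inside (0, 1) so inv_cdf is defined
    z_scores = [std_normal.inv_cdf(min(max(p, eps), 1 - eps))
                for p in baseline_p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return std_normal.cdf(z_combined)


# Hypothetical baseline table in which every comparison is suspiciously well balanced.
print(stouffer_trial_p([0.98, 0.97, 0.99, 0.96]))
```

An extreme value from a check like this is a prompt for closer reading, not proof of a problem: correlated variables, rounding in the published table, and stratified randomization can all push the result toward either end.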

§05 Running every screen before you submit

Each of these screens is something you can run on your own manuscript first. The editorial questions you can ask yourself; the consistency checks you can recompute; the reporting checklist is published; the appropriateness questions are the ones a careful reviewer asks. Doing it before submission turns a months-long rejection round into an afternoon.

A structured pre-submission review runs them together. RigorMD appraises a manuscript with two independent engines and a deterministic forensic layer that recomputes statistics the way statcheck and GRIM do, checks the design's reporting guideline item by item, and returns a severity-scored report grounded in quotes from your own manuscript across all six domains, from numerical consistency to whether the conclusion is calibrated to the evidence. It flags; it does not certify, and it is not a substitute for peer review or a statistician's input on design. See a full sample report, read how the engine works, or review pricing; the pre-submission review is $25.

How to read this. These are the screens a manuscript passes through, described so you can run them yourself. RigorMD flags methodological and statistical issues for your judgment; it does not certify a manuscript, replace peer review, or replace a statistician's input on study design.