Article · Published June 2026

Why manuscripts get rejected for statistics

The methodological and statistical patterns that draw a rejection — what a reviewer or editor actually sees, a clearly hypothetical example for each, and how to catch it before you submit.

RigorMD Editorial. Reviewed by RigorMD's founding editor, a practicing academic surgeon and surgical journal editor with 61 peer-reviewed publications and extensive editorial-review experience.

§01 Which rejection do most authors never see coming?

Most authors picture rejection as a verdict that arrives after peer review. A large share of manuscripts never gets that far. In one journal's own accounting, nearly two in five submissions were rejected before review at all — desk-rejected by an editor in minutes, often on grounds the author could have addressed in an afternoon. The editorial “Top 10 reasons your manuscript may be rejected without review” ↗ lists inadequate study design and incomplete methodological reporting among them.

And the cost of a miss compounds. Across biomedical journals, submission to publication runs from roughly 70 to 558 days ↗; every rejection round resets that clock. A statistical flaw caught before submission costs you an afternoon. The same flaw caught by reviewer #2 costs you months.

The patterns below are the ones that recur. None of them require fraud or incompetence — they are ordinary, and they are common. For each, here is what a reviewer sees, a clearly hypothetical example, and a self-check you can run before you submit. A note on what follows: RigorMD flags these patterns; it does not prove their absence. Read them as the questions a careful reviewer asks, not a checklist that guarantees acceptance.

§02 Does the design actually support the claim?

What a reviewer sees. A conclusion that reaches past the study design. An observational cohort that ends in causal language. A single-arm series that implies superiority. An equivalence claim resting on a non-significant p-value from a study never powered to show equivalence. The data may be fine; the claim is simply larger than the design can carry.

A hypothetical case. (Clearly hypothetical.) A retrospective review of 140 patients reports that a new anticoagulation protocol “reduced bleeding,” with no concurrent control group and only a comparison to historical rates. The verb “reduced” asserts cause; the design supports only an association.

Self-check. State your design and your strongest defensible claim in one sentence each, then confirm the second never outruns the first. “Associated with,” “consistent with,” and “in this cohort” are not hedging — they are accuracy. If you need a causal claim, you need a design that earns it.

§03 Was the study underpowered, then overclaimed?

What a reviewer sees. A study too small to detect the effect it discusses, presented as if absence of evidence were evidence of absence. A non-significant secondary endpoint described as “comparable” or “no difference.” Wide confidence intervals that span both meaningful benefit and meaningful harm, summarized as null.

A hypothetical case. (Clearly hypothetical.) A two-arm pilot of 30 patients finds 18% versus 11% complication rates, p = 0.42, and concludes the techniques are “equally safe.” With this sample size, the confidence interval around that difference is wide enough to hide a doubling of risk. The study did not show equivalence; it failed to show anything.
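To make the width of that interval concrete, here is a minimal sketch of a normal-approximation (Wald) confidence interval for a difference in two proportions. The case above gives only the percentages and the total of 30 patients, so the arm-level counts used here (2 of 11 versus 2 of 19) are assumptions chosen to roughly match 18% and 11%. The function name is hypothetical, and the Wald method is itself only a rough guide at this sample size.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Wald (normal-approximation) interval for a difference in proportions.

    Returns (difference, lower bound, upper bound) as fractions.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Assumed counts (not from the case itself): 2/11 is about 18%, 2/19 about 11%.
diff, lower, upper = risk_difference_ci(2, 11, 2, 19)
print(f"difference = {diff:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

Under these assumed counts the interval runs from well below zero to well above 30 percentage points, so it is compatible with either technique being markedly safer. An exact or score-based interval would be the better choice in a real analysis with counts this small.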

Self-check. For every “no difference,” ask whether you were powered to detect one. Report the confidence interval, not just the p-value, and read it aloud: if the interval includes effects you would not call equivalent, do not call them equivalent. A null result from an underpowered study is a question, not an answer.
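The "were you powered?" question can be roughed out with the standard normal-approximation sample-size formula for comparing two proportions. This is a sketch under stated assumptions: a two-sided α of 0.05, 80% power, equal arm sizes, and the 18% and 11% rates from the hypothetical pilot taken as the true rates. The function name `n_per_arm` is illustrative, not part of any library.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p1 versus p2.

    Uses the unpooled normal-approximation formula for two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


needed = n_per_arm(0.18, 0.11)
print(f"about {needed} patients per arm")
```

Under these assumptions the requirement comes out in the hundreds per arm, against roughly 15 per arm in the pilot. That gap is the reason a p-value of 0.42 from the pilot says little about safety. A statistician would refine this with the design's actual allocation ratio and expected dropout.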

§04 Do the numbers reconcile?

What a reviewer sees. Internal arithmetic that does not hold. Percentages that do not match their denominators. A p-value that does not follow from the test statistic and group sizes reported beside it. Subgroup counts that sum to more or less than the total. Numbers in the abstract that differ from the same numbers in a table. Reviewers notice, because a single inconsistency makes them re-check everything.

A hypothetical case. (Clearly hypothetical.) A table reports 47 events in a group described as n = 312, labeled “18.9%.” But 47 of 312 is 15.1%. Either the count, the denominator, or the percentage is wrong — and the reader cannot tell which, so the whole table loses credibility. This is not a rare failure: in one audit of orthopaedic journals, 17% of papers contained a statistical error capable of changing the conclusion ↗.

Self-check. Recompute your headline numbers from raw counts — every proportion from its numerator and denominator, every p-value against its test. Reconcile the abstract against the tables, digit by digit. This is mechanical work, and it is exactly the layer a machine checks fastest.
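The recomputation described above is simple enough to script. The sketch below recomputes a percentage from its numerator and denominator and compares it with the reported figure, using the 47 of 312 example. The function names and the one-decimal rounding convention are assumptions for illustration.

```python
def check_percentage(events, n, reported_pct, tolerance=0.05):
    """Recompute a percentage from raw counts and compare it with the reported value.

    Returns (recomputed percentage rounded to one decimal, whether it matches).
    """
    recomputed = round(100 * events / n, 1)
    matches = abs(recomputed - reported_pct) <= tolerance
    return recomputed, matches


def subgroups_sum_to_total(subgroup_counts, total):
    """Check that subgroup counts add up to the stated total."""
    return sum(subgroup_counts) == total


recomputed, ok = check_percentage(events=47, n=312, reported_pct=18.9)
print(f"recomputed {recomputed}%, matches reported value: {ok}")
```

A mismatch only says that the count, the denominator, and the percentage disagree. Deciding which of the three is wrong still requires going back to the source data.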

§05 Are any reporting-guideline items missing?

What a reviewer sees. A study that omits the items its reporting guideline requires — a CONSORT flow diagram absent from a trial, a STROBE-mandated account of how confounders were handled missing from a cohort study, no mention of a registered protocol, eligibility criteria, or how missing data were treated. Many journals now check against these guidelines explicitly, so a missing item is a concrete, citable reason to return the manuscript.

A hypothetical case. (Clearly hypothetical.) A randomized trial reports outcomes for 84 patients but never accounts for how many were randomized, excluded, or lost to follow-up. A reviewer cannot reconstruct the denominator, and CONSORT requires that they be able to. The science may be sound; the report is not checkable.

Self-check. Identify your design's reporting guideline — CONSORT for trials, STROBE for observational studies, PRISMA for systematic reviews, and so on — and walk the checklist line by line before submission, not after revision. Most are freely available. A missing item is the easiest rejection reason to prevent, because the list is published in advance.

§06 Does the conclusion drift from the results (spin)?

What a reviewer sees. A results section that says one thing and a discussion or abstract that says something more flattering. A primary endpoint that was not met, with the narrative pivoting to a favorable secondary outcome as if it were the main finding. Hedged results that lose their hedges by the time they reach the abstract. This is “spin,” and experienced reviewers are trained to look for it.

A hypothetical case. (Clearly hypothetical.) A trial's primary endpoint — 30-day readmission — shows no significant difference. The abstract leads instead with a significant reduction in length of stay, a secondary endpoint, and concludes the intervention is “effective.” A reader of the abstract alone would believe the primary endpoint was met. It was not.

Self-check. Put your abstract's conclusion next to your primary result and confirm they say the same thing. If your primary endpoint was negative, the abstract must say so plainly. Promoting a secondary endpoint to headline status is the most reliable way to lose a reviewer's trust — and, increasingly, an editor's.

§07 Are missing data and multiplicity left unaddressed?

What a reviewer sees. Silence where a method should be. A cohort that starts at 400 and ends at 280 with no explanation of the missing 120 or how they were handled. A paper reporting twenty comparisons and celebrating the three that reached p < 0.05, with no adjustment and no acknowledgment that some “findings” are expected by chance alone. Both are familiar reasons reviewers distrust a result.

A hypothetical case. (Clearly hypothetical.) A study tests an intervention against 15 secondary outcomes, finds two with p < 0.05, and frames them as discoveries. With 15 independent tests at α = 0.05, about 0.75 false positives are expected on average (15 × 0.05), and the chance of at least one is roughly 54% (1 − 0.95^15), even if nothing real is there. Without correction or pre-specification, the two “findings” cannot be distinguished from noise.

Self-check. Account for every patient who entered and did not finish, and state how missing data were handled rather than dropping them silently. Pre-specify your primary endpoint and analyses, and when you run many comparisons, say so and adjust — or label the rest exploratory. Naming a limitation is far stronger than having a reviewer find it unnamed.
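The multiplicity arithmetic in the hypothetical case can be sketched in a few lines. The example below computes the expected number of false positives and the family-wise error rate for 15 independent tests, then applies a Holm step-down adjustment to a list of p-values. The p-values are invented for illustration (two below 0.05, as in the case), and Holm is one common adjustment among several, not a method the article prescribes.

```python
def expected_false_positives(m, alpha=0.05):
    """Average number of significant results under the null across m tests."""
    return m * alpha


def family_wise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for 15 secondary outcomes; two fall below 0.05.
p_values = [0.012, 0.041, 0.09, 0.15, 0.22, 0.31, 0.38, 0.44,
            0.52, 0.58, 0.63, 0.71, 0.79, 0.86, 0.94]

print(f"expected false positives: {expected_false_positives(15):.2f}")
print(f"family-wise error rate:   {family_wise_error_rate(15):.1%}")
print(f"significant after Holm:   {sum(p < 0.05 for p in holm_adjust(p_values))}")
```

With these invented values, neither nominally significant outcome survives adjustment. Whether to adjust, and how, depends on what was pre-specified. Outcomes labeled exploratory in advance are usually reported unadjusted and described as such.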

§08 How do you run the check before a reviewer does?

None of these patterns requires misconduct. They are the ordinary failure modes of busy clinical research, and they are a preventable share of a problem that keeps growing: biomedical retractions have roughly quadrupled over twenty years ↗. The statistical and interpretive errors above are far cheaper to catch before submission than after publication.

A structured pre-submission read is one way to catch them. RigorMD appraises a manuscript with two independent engines, recomputes its statistics deterministically, and positions its novelty against retrieved prior work. It then returns a severity-scored report grounded in quotes from your own manuscript, covering design–claim fit, results–conclusion alignment, numerical consistency, reporting-guideline adherence, contribution and literature positioning, and the rest of its seven domains. The report is guidance you can verify, not a guarantee of acceptance, and not a substitute for peer review or for a statistician's input on study design.

If you are weighing how to get that read (a departmental biostatistician, a commercial review service, an automated checker, or an automated assessment like ours), see your options for statistical review before submission. Or read a full sample report to see exactly what a finding looks like, and review pricing: the pre-submission review is $30.

How to read this. These patterns are the questions a careful reviewer asks, not a guarantee. RigorMD flags methodological and statistical issues for your judgment; it is not a substitute for peer review or a statistician's input on study design. See also your options for statistical review and a sample report.