Methods

How RigorMD reads a manuscript

A structured, severity-scored read of the methods and statistics — anchored in GRADE certainty, reporting standards, and the bias frameworks (RoB2 / ROBINS-I) journals actually use. It flags concerns for you to weigh; it does not certify a paper.

§01 Six methodological domains

Every manuscript is appraised across six domains: design and claim fit, alignment of results with the stated conclusion, statistical appropriateness, reporting-guideline adherence (the EQUATOR family: STROBE, CONSORT, PRISMA, and peers), numerical and statistical consistency, and clinical interpretability. Each domain is severity-scored (mild, moderate, serious, or critical). The overall grade is weighted by how central a problem is to the headline claim, so a serious flaw confined to a peripheral analysis does not by itself sink the paper.
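As a rough illustration of centrality weighting, the sketch below scales each finding's severity by how central it is to the headline claim and reports the worst weighted result. The numeric scores, the 0-to-1 centrality scale, the rounding rule, and the names Finding and overall_grade are all assumptions made for this example; they are not RigorMD's actual scoring.

```python
import math
from dataclasses import dataclass

# Assumed numeric scale for the four severity labels (illustrative only).
SEVERITY_SCORE = {"mild": 1, "moderate": 2, "serious": 3, "critical": 4}
LABELS = ["mild", "moderate", "serious", "critical"]


@dataclass
class Finding:
    domain: str
    severity: str      # one of LABELS
    centrality: float  # assumed scale: 0.0 (peripheral) to 1.0 (headline claim)


def overall_grade(findings):
    """Return the worst centrality-weighted severity label, or 'none'."""
    worst = 0.0
    for f in findings:
        worst = max(worst, SEVERITY_SCORE[f.severity] * f.centrality)
    if worst <= 0:
        return "none"
    # Round the weighted score up to the next whole label, capped at 'critical'.
    index = min(len(LABELS), math.ceil(worst))
    return LABELS[index - 1]


findings = [
    # A serious flaw in a peripheral analysis: weighted score 0.9.
    Finding("statistical appropriateness", "serious", 0.3),
    # A moderate flaw in the primary analysis: weighted score 2.0.
    Finding("design and claim fit", "moderate", 1.0),
]
print(overall_grade(findings))  # moderate
```

In this toy weighting the peripheral serious flaw contributes less than the moderate flaw in the primary analysis, which is the behaviour the paragraph above describes.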

§02 GRADE certainty and the calibration headline

For the paper’s primary conclusion, RigorMD reads two things separately: how confidently the authors state the claim (their language), and how much certainty the evidence actually warrants on the GRADE scale (very low, low, moderate, high). That certainty is downgraded for risk of bias, inconsistency, indirectness, imprecision, and publication bias according to the GRADE method, not by a vote.

The headline you see is the gap between those two, computed deterministically — not judged by the model. A humble claim backed by limited evidence reads as supported; an over-stated claim on the same evidence reads as not supported or overly confident; a claim that runs against its own results reads as counter to the results. The same evidence can earn a different headline depending only on how strongly the authors phrased the conclusion — which is the point.
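A minimal sketch of how such a gap could be computed without model judgment. GRADE's four levels and five downgrade domains are as named above; the claim-strength labels, the gap thresholds, and the function names are hypothetical choices for this example, not RigorMD's published rules.

```python
GRADE_LEVELS = ["very low", "low", "moderate", "high"]
DOWNGRADE_DOMAINS = (
    "risk of bias", "inconsistency", "indirectness",
    "imprecision", "publication bias",
)

# Hypothetical scale for how strongly the authors phrase the conclusion,
# aligned index-for-index with GRADE_LEVELS.
CLAIM_STRENGTH = {"tentative": 0, "cautious": 1, "confident": 2, "definitive": 3}


def grade_certainty(start, downgrades):
    """Lower the starting level by the summed downgrade steps, floored at 'very low'."""
    unknown = set(downgrades) - set(DOWNGRADE_DOMAINS)
    if unknown:
        raise ValueError(f"unknown downgrade domains: {sorted(unknown)}")
    steps = sum(downgrades.values())
    index = max(0, GRADE_LEVELS.index(start) - steps)
    return GRADE_LEVELS[index]


def headline(claim, certainty, contradicts_results=False):
    """Map the claim-versus-certainty gap to a headline (thresholds are assumed)."""
    if contradicts_results:
        return "counter to the results"
    gap = CLAIM_STRENGTH[claim] - GRADE_LEVELS.index(certainty)
    if gap <= 0:
        return "supported"
    if gap == 1:
        return "overly confident"
    return "not supported"


# Evidence that starts at 'high' and loses one level each for two domains.
certainty = grade_certainty("high", {"risk of bias": 1, "imprecision": 1})
print(certainty)                          # low
print(headline("cautious", certainty))    # supported
print(headline("definitive", certainty))  # not supported
```

The last two calls use the same evidence and differ only in the phrasing of the claim, so the headline changes while the certainty rating does not.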

§03 Two voices: clinician and statistician

Each finding is written twice. The clinician spine is one or two plain sentences — what the study can and cannot support, and the clinical “so what” — with no jargon or effect-size arithmetic. Folded beneath it, a “For your statistician” panel carries the technical companion: the bias mechanism, its named RoB2 / ROBINS-I domain, the GRADE rationale, and the concrete remedy. Read the spine to decide; open the panel to defend it to a reviewer.
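One way to picture a two-voice finding is as a record with a plain-language spine and a collapsed technical panel. The class names, field names, and the sample wording below are invented for illustration and do not describe RigorMD's actual data model or output.

```python
from dataclasses import dataclass


@dataclass
class StatisticianPanel:
    bias_mechanism: str
    bias_domain: str      # a named RoB2 / ROBINS-I domain
    grade_rationale: str
    remedy: str


@dataclass
class TwoVoiceFinding:
    clinician_spine: str  # one or two plain sentences, no jargon
    panel: StatisticianPanel

    def render(self, expanded=False):
        """Return the spine alone, or the spine plus the technical panel."""
        lines = [self.clinician_spine]
        if expanded:
            lines += [
                "For your statistician:",
                f"  Mechanism: {self.panel.bias_mechanism}",
                f"  Domain: {self.panel.bias_domain}",
                f"  GRADE rationale: {self.panel.grade_rationale}",
                f"  Remedy: {self.panel.remedy}",
            ]
        return "\n".join(lines)


# Made-up example content, for illustration only.
finding = TwoVoiceFinding(
    clinician_spine="Many patients dropped out, so the benefit may be overstated.",
    panel=StatisticianPanel(
        bias_mechanism="Differential attrition between arms",
        bias_domain="RoB2: bias due to missing outcome data",
        grade_rationale="Downgrade one level for risk of bias",
        remedy="Report a sensitivity analysis under plausible missingness assumptions",
    ),
)
print(finding.render())               # spine only
print(finding.render(expanded=True))  # spine plus panel
```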

§04 How a finding earns its place

Two independent LLM engines appraise the manuscript blind, and the appraisal is repeated across several passes; only findings that recur across passes are graded, so a one-off observation does not become a verdict. A deterministic layer recomputes statistics where the reported numbers allow — a flag there is a calculation you can check — and every quote is verified against your own text before it is shown. Serious and critical findings are then adversarially verified: the engine tries to refute its own finding, and anything it cannot defend is withdrawn or right-sized before the report reaches you.
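Three of these checks are mechanical enough to sketch. The example below is an assumption-laden illustration, not RigorMD's code: the function names, the two-pass threshold, and the whitespace normalization are hypothetical, and the p-value check uses the standard normal approximation on the log scale for a ratio estimate reported with a 95% confidence interval.

```python
import math
import re
from collections import Counter


def recurring_findings(passes, min_passes=2):
    """Keep finding keys that appear in at least min_passes separate passes."""
    # set(p) ensures a finding is counted at most once per pass.
    counts = Counter(key for p in passes for key in set(p))
    return {key for key, n in counts.items() if n >= min_passes}


def p_from_ratio_ci(estimate, lower, upper):
    """Two-sided p-value implied by a ratio estimate and its 95% CI."""
    if not (0 < lower < upper) or estimate <= 0:
        raise ValueError("estimate and CI bounds must be positive, lower < upper")
    # Standard error on the log scale, recovered from the CI width.
    se = (math.log(upper) - math.log(lower)) / (2 * 1.959964)
    z = abs(math.log(estimate)) / se
    # For a standard normal, 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


def quote_in_text(quote, manuscript):
    """True if the quote occurs in the manuscript, ignoring case and whitespace runs."""
    def norm(s):
        return re.sub(r"\s+", " ", s).strip().lower()
    q = norm(quote)
    return bool(q) and q in norm(manuscript)


passes = [["unblinded outcome", "small sample"], ["unblinded outcome"],
          ["unblinded outcome", "selective reporting"]]
print(recurring_findings(passes))  # {'unblinded outcome'}

# Hypothetical reported result: hazard ratio 0.80, 95% CI 0.65 to 0.98.
p = p_from_ratio_ci(0.80, 0.65, 0.98)
print(round(p, 3))  # roughly 0.033; compare against the p-value the paper reports

print(quote_in_text("randomly  assigned", "Patients were randomly assigned to arms."))  # True
```

If a paper reported p = 0.20 alongside that interval, the mismatch is a calculation a reader can repeat by hand, which is the sense in which a deterministic flag can be checked.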

§05 Scope

RigorMD flags potential methodological and statistical concerns for the authors to review. It does not certify correctness, validity, or fitness for publication, and is not a substitute for peer review or a qualified biostatistician. See a worked example on the sample report.