A Friendly Roadmap For Reading Medical Research Articles
Plain‑language strategies for anyone who wants to analyze research studies without getting a PhD in statistics. This is how I kick the tires of a medical journal article.
UPDATED: July 28th, 2025
Welcome to the Healthy Aging Newsletter, a free publication translating trustworthy medical research into simple habits to age well, free of chronic disease. I’m Dr. Ashori, a family medicine doctor turned health coach.
News stories usually highlight the flashiest part of a study, not the whole picture. Use this guide or download my PDF with step‑by‑step AI prompts to review a research paper yourself. Oh, and remember, you need the full paper, not the abstract.
1. Leading or Biased Title
A fair title describes what the researchers actually did. Example: “Walking improves mood in adults with depression” is clearer than “Revolutionary exercise cure discovered.” The fair title also signals that the findings apply only to a specific group: adults with depression.
2. Clinical Endpoints vs. Surrogate Markers
Living longer or avoiding a stroke matters more than seeing a blood test improve. Cholesterol can drop, yet the person might not feel better, live longer, or avoid a heart attack or stroke (the real clinical endpoints).
We can improve someone’s blood pressure (surrogate marker) by keeping them perpetually dehydrated, but that won’t improve their risk of heart disease.
3. Do The Results Apply to You?
A trial done only with 30-year-old athletes doesn’t apply to sedentary 30-year-olds. Ask whether the participants, and how they were selected, resemble you. Oncology studies, for example, may recruit the healthiest participants, skewing towards more favorable results.
4. Trial Preregistration
Public preregistration locks in the goals before data collection begins, preventing authors from moving the goalposts when they realize the results won’t be favorable. It stops researchers from quietly switching the main question after they peek at the data.
5. Data Cherry Picking
Check if some outcomes mentioned in the methods vanish in the results. Disappearing data might be masking major problems or reflect sloppy research protocols.
Example: When paroxetine was promoted for treating depression in teenagers, the published study claimed it was safe and effective. The preregistered protocol (#4, above), however, listed different primary outcomes, and serious side effects such as suicidal thoughts were disguised under the vague label “emotional lability.”
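One quick way to run this check yourself: list the outcomes named in the preregistered protocol and the outcomes named in the results section, then compare. Here’s a minimal Python sketch; the outcome names are hypothetical stand-ins, not the actual paroxetine study lists:

```python
# Hypothetical outcome lists, as they might appear in a protocol and its paper
preregistered = {"depression score", "remission rate", "serious adverse events"}
reported      = {"depression score", "remission rate", "emotional lability"}

print("Vanished from results:", preregistered - reported)
print("Added after the fact: ", reported - preregistered)
```

Anything in either difference deserves an explanation from the authors.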
6. Control Group Selection
A study of a new pain pill should compare it with the current standard pill. Comparing it to a placebo alone may not even be appropriate when a well-established pain medication is the standard of care.
7. Internal and External Validity
Internal validity asks, “Was the study run cleanly?” External validity asks, “Will these findings hold up in other clinics or homes?” Subgroup analysis looks at whether, for example, women and men responded the same way.
Example: The well-known SPRINT trial had rock-solid internal validity (large randomized groups, careful blinding, a clear primary endpoint) but limited external validity because it excluded patients with diabetes, prior stroke, or symptomatic heart failure. Further reading.
8. High Participant Drop-out?
If 1/3 of research participants quit the study early, the final data and conclusions may be hard to trust.
Example: In the WHI study, by the time the trial was halted early at 5 years, 42% of the women in the active group and 38% in the placebo group had already stopped taking their assigned pills. Further reading.
9. Absolute Risk, Relative Risk & NNT
If a drug lowers heart attack risk from 2% to 1%, the absolute drop is 1 percentage point, called the absolute risk reduction. The same result can be reported in the paper or in the media as a 50% decrease in risk, the relative risk reduction. Either way, 100 people must take the drug for 1 person to benefit (the number needed to treat, NNT). Decide if that feels worthwhile.
Example: If you’re a woman in your 50s deciding whether to cut out alcohol to reduce your breast cancer risk, the relative risk change is about 10%. In absolute terms, assuming your baseline risk as a non-drinker is 2.4%, a daily drink raises it to 2.64%. Put as an NNT, 417 women would have to give up alcohol for the next 10 years to prevent 1 additional case of alcohol-related breast cancer. Further reading.
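If you want to run these numbers yourself, the arithmetic fits in a few lines. A minimal Python sketch using the two examples above (the rates come from this article, not from any trial’s code):

```python
def risk_summary(higher_risk: float, lower_risk: float) -> dict:
    """Turn two event rates into ARR, RRR, and NNT."""
    arr = higher_risk - lower_risk   # absolute risk reduction
    rrr = arr / higher_risk          # relative risk reduction
    nnt = 1 / arr                    # number needed to treat
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}

# Drug example: heart attack risk falls from 2% to 1%
print(risk_summary(0.02, 0.01))
# ARR = 0.01, RRR = 0.5 ("50% lower risk!"), yet NNT = 100

# Alcohol example: daily drinking moves breast cancer risk 2.4% -> 2.64%
print(risk_summary(0.0264, 0.024))
# ARR = 0.0024, so roughly 1/0.0024 ≈ 417 women must abstain to prevent 1 case
```

Notice how the same data sound dramatic as a relative risk and modest as an NNT.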
10. MCID
Minimal Clinically Important Difference (MCID) is the smallest change on a scale that patients can actually feel in daily life. It answers the question, “Was the improvement big enough for a real person to notice and care?”
Example: A study on chondroitin for osteoarthritis might show a 4-point improvement in pain score, but other validated studies tell us that anything under a 10-point difference is meaningless to patients. Papers should state the MCID clearly. Further reading.
11. No Power, No Evidence
If a study signs up only a handful of people, it can easily overlook a real benefit or harm. Too many participants and you risk spotting tiny, meaningless differences. That’s why researchers run a power calculation first, typically aiming for about 80% power, using how common the outcome is to decide how many participants to enroll.
Example: A study with only 36 participants isn’t adequately powered to show a meaningful signal when the prevalence of the disease being studied calls for a much larger sample size. Further reading.
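For a feel of where enrollment numbers come from, here’s a back-of-the-envelope Python sketch of the standard normal-approximation formula for comparing two proportions. The example event rates are invented, and real trials use more careful methods, but the shape of the calculation is the same:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Rough sample size per arm to detect event rates p1 vs p2, two-sided test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ≈ 1.96 for alpha = 0.05
    z_beta = z(power)            # ≈ 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# To detect a drop in event rate from 10% to 5%, you need roughly
# 432 participants per arm -- far beyond what a 36-person study can offer.
print(n_per_group(0.10, 0.05))
```

Rarer outcomes or smaller effects push the required sample size up fast.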
12. Sensitivity Analyses
To sanity-check the data, the authors might repeat the analysis leaving out one hospital, or using a different statistical model, to see if the result stays about the same.
Example: During an audit of the PREDIMED Mediterranean Diet trial, the authors realized that 1,588 participants weren’t truly randomized. So they ran a sensitivity analysis that removed those 1,588 people and then repeated every calculation, showing no major change from the original conclusion. Further reading.
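This isn’t PREDIMED’s actual analysis, but the leave-one-out idea is easy to see with toy numbers. A small Python sketch with invented per-site results; if dropping any single site moves the pooled estimate a lot, that site deserves a closer look:

```python
# Hypothetical per-site results: (site name, treatment mean, control mean)
sites = [
    ("Hospital A", 5.2, 6.1),
    ("Hospital B", 5.0, 6.3),
    ("Hospital C", 5.4, 5.9),
    ("Hospital D", 4.1, 6.8),  # suspiciously large effect
]

def pooled_effect(data):
    """Unweighted average treatment-vs-control difference across sites."""
    return sum(t - c for _, t, c in data) / len(data)

print(f"All sites: {pooled_effect(sites):+.2f}")
# Leave-one-out: does any single site drive the whole result?
for i, (name, *_rest) in enumerate(sites):
    remaining = sites[:i] + sites[i + 1:]
    print(f"Without {name}: {pooled_effect(remaining):+.2f}")
```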
13. Defining Outcomes Ahead of Time
The protocol or trial registry (#4) should spell out one primary outcome (sometimes two in rare cases) and a short, ranked list of secondary outcomes. Anything added later inflates the chance of a lucky positive answer. Good papers correct for this.
Example: The journal forced the authors of the ISIS-2 trial to slice the data into lots of subgroups. To mock the request, the team reported results by astrological sign, showing that aspirin ‘worked’ for all signs except Gemini and Libra.
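The zodiac stunt has simple math behind it: every extra outcome or subgroup you test is another roll of the dice. A one-line Python illustration, assuming independent tests at the usual p < 0.05 threshold:

```python
# Chance of at least one false positive across k independent tests at p < 0.05
for k in (1, 5, 12):
    print(f"{k:>2} tests: {1 - 0.95 ** k:.0%} chance of a fluke 'positive'")
# 1 test: 5%; 5 tests: 23%; 12 tests (one per zodiac sign): 46%
```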
14. Inclusion & Exclusion Rules
Leaving out anyone with diabetes might make a blood pressure drug look safer than it really is in everyday life. The rules for including and excluding participants should follow the logic of the trial design.
Example: Instead of selecting the ideal patients for the best possible trial outcomes (unethical), the authors of the ALLHAT antihypertensive trial opened its doors to more than 40,000 patients straight from primary care clinics, without cherry picking.
15. Trial Stopped Early
Ending a study at the first sign of benefit can exaggerate how well the treatment works over the long term. There are good reasons to stop a study early, such as clear harm, overwhelming benefit, or futility.
16. Making Biological Sense
Surprise results are possible, but they deserve extra scrutiny and later confirmation. If we know, for example, that the best lipid-lowering meds will at best decrease cardiovascular risk by 20%, and along comes pomegranate juice claiming a 30% drop … possible, but quite unlikely.
When in doubt, don’t dismiss the study; look for other supporting studies and try to understand the supposed mechanism (e.g., polyphenols?).
17. Author Biases?
Funding alone doesn’t negate a study, yet hidden ties can bias the write-up. Look for a disclosure statement. And I consider it a major red flag if ghostwriters were used.
Example: In the late 1990s Merck paid for the VIGOR Vioxx trial, hired a medical writing firm to draft the manuscript, and listed several respected academics as authors. Unfortunately, the drug raised cardiovascular risk in the enrolled participants, something the listed authors reportedly knew nothing about until later.
18. Clear Transparency
Good papers share data sets or at least offer them on request and include a CONSORT or STROBE checklist of what was reported. A paper is easier to trust when you can trace every step from plan to raw data. When an important study is underway, other researchers can follow along with the published data, run their own analyses, and vet the information.
19. Strength of the Publishing Journal
A journal’s name is not a guarantee of quality, but it does tell you how tough the peer‑review process probably was and whether red‑flag papers get cleaned up, either retracted or republished with stricter guidelines.
Example: In May 2020 NEJM published an observational study claiming common blood‑pressure drugs were perfectly safe in COVID‑19 patients. Within days outside data sleuths noticed odd country counts and missing ethics approvals. When the private company (Surgisphere) behind the database refused to open its files, NEJM retracted the paper and posted an apology for the editorial lapse.
20. Advertisement vs. Educational
Watch for language like “breakthrough” or “game changer” that aims to sway your opinion rather than offer objective information. A study never has to “sell” you anything; it should only add to the collective knowledge.
Example: The REDUCE‑IT investigators published a solid paper in NEJM showing prescription omega‑3 cut cardiovascular events. On the same day, the sponsor’s press release called the drug a “breakthrough” and a “paradigm shift” that lowered risk by 25%. The absolute risk reduction (#9) was 4.8%, and the mineral‑oil placebo raised eyebrows about fairness.
21. How Would You Design the Study?
Sometimes practical limits force a compromise. Still, ask if another design would answer the question more clearly. Specifically, imagine you’re the researcher handed this question: how would you design a study to answer it best?
Example: ACCORD pushed A1c below 6% in people with long‑standing type 2 diabetes and high heart risk, using heavy drug combos and rapid dose jumps. The intensive group ran into more deaths, so the study stopped early. A cleaner setup would have aimed for a gradual A1c drop, left out patients with advanced heart disease, and added a run‑in period to stabilize meds before randomization.
22. Honest Discussion & Conclusion Section
Balanced papers admit weaknesses and suggest future research instead of overstating certainty. They certainly shouldn’t make unjustified claims based on what the data didn’t show.
23. Missing Data & Lost-To-Follow-Up
Replacing gaps with reasonable estimates is preferable to ignoring or omitting them. If the healthiest folks drop out of the control arm, the treatment can look worse than it is. If the sickest drop out of the treatment arm, the pill can look like a miracle.
Example: The CATIE trial compared antipsychotics in adults with schizophrenia. By 18 months more than 70% of patients had stopped their assigned drug or left the study. The investigators reported intention‑to‑treat results and ran several sensitivity analyses (#12), but they still warned that the heavy dropout made firm conclusions tough. That honesty helps readers be aware of the limitations of this study. Further reading.
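To see how one-sided dropout distorts results, here’s a toy Python simulation with invented numbers (not CATIE data): a drug truly improves a symptom score by 1 point, but the sickest quarter of the treatment arm quits before follow-up:

```python
import random
from statistics import mean

random.seed(0)

# Toy model: symptom score, lower is better; the true drug effect is -1 point.
control = [random.gauss(10, 2) for _ in range(1000)]
treated = [random.gauss(9, 2) for _ in range(1000)]

# The sickest (highest-scoring) quarter of the treatment arm drops out.
completers = sorted(treated)[: int(len(treated) * 0.75)]

print(f"True effect:     {mean(treated) - mean(control):+.2f}")
print(f"Completers only: {mean(completers) - mean(control):+.2f}")
# The completers-only estimate exaggerates the benefit because the worst
# outcomes silently left the treatment arm.
```

Intention-to-treat analysis and the sensitivity checks in #12 exist precisely to guard against this.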
Free PDF Download
I hope you found this article useful. Download the free PDF and upload it to your favorite LLM along with the research paper you want analyzed; that should get you started.