Grading the quality of evidence and the strength of recommendations

Judgments about evidence and recommendations in healthcare are complex. For example, those deciding whether to recommend selective serotonin reuptake inhibitors (SSRIs) or tricyclics for the treatment of moderate depression must determine which outcomes to consider, which evidence to include for each outcome, how to assess the quality of that evidence, and whether SSRIs do more good than harm compared with tricyclics. Because resources are always limited and money that is allocated to treating depression cannot be spent on other worthwhile interventions, they may also need to decide whether any incremental health benefits are worth the additional costs.

Systematic reviews of the effects of healthcare provide essential, but not sufficient, information for making well-informed decisions. Reviewers and people who use reviews draw conclusions about the quality of the evidence, either implicitly or explicitly. Such judgments guide subsequent decisions. For example, clinical actions are likely to differ depending on whether one concludes that the evidence that warfarin reduces the risk of stroke in patients with atrial fibrillation is convincing (high quality) or that it is unconvincing (low quality).

Similarly, practice guidelines and people who use them draw conclusions about the strength of recommendations, either implicitly or explicitly. Using the same example, a guideline that recommends that patients with atrial fibrillation should be treated may suggest that all patients definitely should be treated or that patients should probably be treated, implying that treatment may not be warranted in all patients.

A systematic and explicit approach to making judgments such as these can help to prevent errors, facilitate critical appraisal of these judgments, and improve communication of this information. Since the 1970s a growing number of organizations have employed various systems to grade the quality (level) of evidence and the strength of recommendations. Unfortunately, different organizations use different systems to grade evidence and recommendations. The same evidence and recommendation could be graded as “II-2, B”, “C+, 1”, or “strong evidence, strongly recommended” depending on which system is used. This is confusing and impedes effective communication.