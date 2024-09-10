Note: This essay was prepared with the research assistance and ghostwriting of ChatGPT 4.0. No LLMAI were harmed in the process, although I felt inclined to threaten them from time to time.

Author's Preface

A few years back I prepared a fairly extensive bibliography of papers on randomized controlled trials (RCTs) and the evidence hierarchy; I think it runs to 128 references. I read some in detail and skimmed many more. For record keeping I used Zotero, which is free software and excellent for managing bibliographic information.

I have not stopped thinking about that issue. From time to time I encounter the claim that the only research standard worth worrying about is the “gold standard” of RCTs. Based on what I've read, including articles by very respectable scholars, I've always thought that claim is probably not correct. In my view, that opinion on RCTs is nonsense; there are all kinds of exceptions.

I studied research methods and statistics at university a long time ago, in the 1970s. I learned the research methods of experimental psychology: statistics and research design. I then went on to graduate school.

I don't remember the term “randomized controlled trials” being bandied about, though perhaps it is just one I have forgotten. We certainly learned the principles.

In any case, as part of my “Understanding Scholarship” series, I thought I'd address randomized controlled trials and the evidence hierarchy. I also felt the need to rant about the skeptics' claim that “data are not the plural of anecdote.” I know that's an inversion of the original saying, “data are the plural of anecdote.”

Introduction

The evidence hierarchy is a key framework in research that ranks types of evidence by their reliability and their ability to demonstrate causality. It has been used most widely in fields such as medicine, psychology, and public health. However, rigid application of the hierarchy is increasingly questioned, since its relevance depends on the nature of the field being studied. This essay explores the structure of the evidence hierarchy, its applicability, and its limitations across different academic domains. It also covers the methodology behind research studies, including the goal of determining causality, the handling of confounding factors, and the limitations posed by small effect sizes and large variability. Finally, it discusses how pseudo-skeptics have distorted the phrase “data are the plural of anecdote,” reversing its original meaning.

What is the Evidence Hierarchy?

Overview of the Evidence Hierarchy (with Criticisms Considered)

The evidence hierarchy is commonly used to rank research methods based on their perceived reliability and ability to establish causality. However, it has been criticized for being overly rigid and for not adequately reflecting the complexity of real-world research contexts. Below is a more nuanced description of each level, incorporating these criticisms.

1. Systematic Reviews & Meta-Analyses

Description: These synthesize data from multiple studies, aiming to provide comprehensive conclusions. However, they depend on the quality and consistency of the underlying studies, which may vary. Systematic reviews can also be prone to publication bias: studies with positive or statistically significant results are more likely to be published than studies with null results, so the literature available for review can overstate an effect.

Example: A review combining data from several drug trials to assess overall effectiveness.
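To make the pooling step concrete, here is a minimal Python sketch of one common approach, a fixed-effect meta-analysis with inverse-variance weights. The function name `fixed_effect_pool` and the three trial results are hypothetical. A fixed-effect model assumes every trial estimates one common true effect; when studies differ in design or population, a random-effects model is often preferred.

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Pool study estimates with inverse-variance weights (fixed-effect model).

    Returns the pooled estimate and its standard error.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    # Each study is weighted by 1 / SE^2, so more precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical results from three drug trials: (mean difference, standard error).
estimates = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.20, 0.05]
pooled, se = fixed_effect_pool(estimates, std_errors)
print(f"pooled = {pooled:.3f}, 95% CI = ({pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f})")
```

The most precise trial (standard error 0.05) carries most of the weight, which is why the pooled estimate lands close to its result.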

2. Randomized Controlled Trials (RCTs)

Description: RCTs are valued for their ability to reduce bias by randomly assigning participants to groups. However, they can be expensive, time-consuming, and not always ethically feasible. Additionally, real-world complexity can limit their external validity, meaning the findings may not apply to broader populations.

Example: A drug trial with participants randomly assigned to receive either the treatment or a placebo.
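A minimal sketch of simple randomization, assuming two equal-sized arms: shuffle the participant list with a seeded random generator and split it in half. The `randomize` helper, the participant IDs, and the seed are hypothetical; real trials usually add allocation concealment and often use blocked or stratified schemes.

```python
import random

def randomize(participants, seed=None):
    """Simple (complete) randomization into two equal-sized arms.

    Shuffles a copy of the participant list and splits it in half; with an odd
    count, the extra participant goes to the placebo arm.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

# Twenty hypothetical participants, P01 to P20.
arms = randomize([f"P{i:02d}" for i in range(1, 21)], seed=2024)
print(len(arms["treatment"]), len(arms["placebo"]))
```

Because assignment depends only on the shuffle, no characteristic of the participants (measured or not) can influence which arm they land in.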

3. Cohort Studies

Description: These observational studies track groups over time, offering insights into long-term effects. They are useful when RCTs are not feasible but are more vulnerable to confounding variables, which can distort results.

Example: A study following smokers and non-smokers over 20 years to observe lung health outcomes.
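A cohort design lets the researcher compare the risk of an outcome directly between exposed and unexposed groups. The sketch below computes a crude relative risk from hypothetical counts; the numbers are invented for illustration, and the estimate is unadjusted, so it remains open to confounding.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk (cumulative incidence) in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical 20-year cohort: 1,000 smokers and 1,000 non-smokers.
rr = relative_risk(exposed_cases=60, exposed_total=1000,
                   unexposed_cases=15, unexposed_total=1000)
print(f"relative risk = {rr:.1f}")  # prints "relative risk = 4.0" (0.06 / 0.015)
```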

4. Case-Control Studies

Description: These studies compare individuals with and without a condition to identify potential causes. They are useful for studying rare diseases but are limited by their retrospective nature and the risk of recall bias.

Example: A study comparing cancer patients to non-patients and analyzing their history of exposure to certain chemicals.
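Because a case-control study fixes the numbers of cases and controls in advance, the risk of disease cannot be computed directly; the usual measure of association is the odds ratio, which approximates the relative risk when the disease is rare. A minimal sketch with hypothetical counts:

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

# Hypothetical study: 200 cancer patients and 400 controls, asked about past chemical exposure.
or_ = odds_ratio(exposed_cases=50, unexposed_cases=150,
                 exposed_controls=40, unexposed_controls=360)
print(f"odds ratio = {or_:.2f}")  # prints "odds ratio = 3.00"
```

If exposure histories are misremembered differently by cases and controls (recall bias), the counts themselves are distorted and no calculation can correct for that.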

5. Cross-Sectional Studies

Description: These offer a snapshot of a population at a specific time, useful for identifying associations but not for establishing causality. They are efficient, but because exposure and outcome are measured at the same time, they usually cannot show which came first or how conditions change over time.

Example: A survey measuring the prevalence of diabetes in a population.
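Prevalence is simply the proportion of the sample with the condition at the time of the survey. This sketch uses hypothetical survey counts and a simple normal-approximation (Wald) interval; for small samples or proportions near 0 or 1, an interval such as the Wilson score interval behaves better.

```python
import math

def prevalence_with_ci(cases, sample_size, z=1.96):
    """Point prevalence with a normal-approximation (Wald) 95% confidence interval."""
    p = cases / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    # Clip the interval to the valid range of a proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical survey: 2,000 adults tested, 180 found to have diabetes.
p, low, high = prevalence_with_ci(180, 2000)
print(f"prevalence = {p:.1%} (95% CI {low:.1%} to {high:.1%})")
```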

6. Case Reports

Description: Case reports provide detailed descriptions of individual cases or small groups of cases, often highlighting rare or novel phenomena. They are useful for generating hypotheses but are not generalizable and lack rigorous controls.

Example: A physician’s report on a rare drug side effect observed in one patient.

7. Expert Opinions

Description: Expert opinions are based on professionals’ experience and knowledge. While valuable in contexts lacking empirical data, they are highly subjective and vary based on the expert’s background and biases.

Example: A clinician’s guidance on best practices for treating a rare disease.

8. Anecdotal Evidence

Description: Anecdotal evidence is based on individual experiences or personal accounts. While it can provide initial insights, it is highly subjective and unverified, making it the least reliable form of evidence.

Example: A patient’s account of how a specific diet improved their symptoms.

---

Adjustments Based on Criticisms

Each of these levels offers varying degrees of reliability, but the usefulness of the evidence hierarchy depends on the context in which it is applied. It is important to remain critical of its limitations and to integrate other methods when necessary, recognizing that no single approach can provide all the answers in complex real-world scenarios.

The evidence hierarchy categorizes research methods from most to least reliable, with the aim of offering a clear ranking of study designs based on their capacity to establish causality and minimize bias. At the top of the hierarchy, systematic reviews and meta-analyses combine data from multiple studies to provide a high level of reliability, while randomized controlled trials (RCTs) rank just below them as the most trusted individual study design. Lower on the hierarchy are cohort studies, case-control studies, cross-sectional studies, and case reports. At the bottom lie expert opinions and anecdotal evidence, which provide weaker forms of evidence that lack empirical rigor (Freemantle et al., 2003; Greenhalgh, 2014; Sackett et al., 1996).

Applicability or Inapplicability of the Evidence Hierarchy Across Scholarly Disciplines

The evidence hierarchy is particularly useful in fields where experimental control is feasible, such as medicine and public health (Guyatt et al., 1995; Sackett et al., 1996). However, its strict application is less relevant in disciplines like archaeology, history, and education, where observational methods or qualitative data are often the primary sources of evidence (Cartwright & Deaton, 2018). Critics argue that applying a rigid hierarchy can lead to the undervaluation of valuable research, particularly in areas that rely on narrative-based methods or contextual understanding (Greenhalgh, 2014).

The Evidence Hierarchy Levels: Pros and Cons

1. Systematic Reviews & Meta-Analyses

Pros:

Aggregates multiple studies, minimizing biases (Guyatt et al., 1995). Offers broad generalizability across populations (Freemantle et al., 2003). Can resolve conflicting findings from individual studies (Greenhalgh, 2014). Efficiently synthesizes a large body of research (Allen et al., 2015). Highlights gaps in existing research (Ioannidis, 2005).

Cons:

Dependent on the quality of the included studies (Ioannidis, 2005). Prone to publication bias (Sackett et al., 1996). Mixing studies with different methodologies can produce misleading results (Frieden, 2017). Conducting systematic reviews can be resource-intensive (Cartwright & Deaton, 2018). In some fields, a lack of high-quality studies makes meta-analysis difficult (Freemantle et al., 2003).
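Publication bias can be illustrated with a small simulation. The sketch below assumes, hypothetically, that many studies estimate the same small true effect and that only those reaching positive statistical significance are published; the parameter values are arbitrary. Averaging only the "published" studies overstates the effect, and a meta-analysis restricted to them would inherit that inflation.

```python
import random
import statistics

def simulate_publication_bias(true_effect=0.1, n_studies=2000, se=0.15, seed=1):
    """Return (mean of all estimates, mean of 'published' estimates, number published).

    Hypothetical model: every study estimates the same true effect with normal
    sampling error of standard deviation `se`, and only studies with z > 1.96
    (positive and statistically significant) are published.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    published = [est for est in estimates if est / se > 1.96]
    published_mean = statistics.mean(published) if published else float("nan")
    return statistics.mean(estimates), published_mean, len(published)

all_mean, published_mean, n_published = simulate_publication_bias()
print(f"all studies: mean {all_mean:.3f}")
print(f"{n_published} published studies: mean {published_mean:.3f}")
```

The full set of studies averages close to the true effect of 0.1, while the published subset averages several times higher.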

2. Randomized Controlled Trials (RCTs)

Pros:

Strong evidence for causality through randomization (Guyatt et al., 1995). Minimizes confounding variables (Freemantle et al., 2003). Provides reproducible results (Akobeng, 2005). Blinding reduces the risk of bias (Sackett et al., 1996). Allows precise control over experimental conditions (Greenhalgh, 2014).

Cons:

May lack external validity due to artificial settings (Bhatt & Mehta, 2016). Expensive and time-consuming to conduct (Bondemark & Ruf, 2015). Ethical challenges in withholding treatment (Cartwright & Deaton, 2018). Industry sponsorship can introduce bias (Ioannidis, 2005). Limited in representing diverse populations (Sørensen, 2006).

3. Cohort Studies

Pros:

Captures long-term outcomes (Euser et al., 2009). Reflects real-world conditions better than RCTs (Cook & Thigpen, 2019). Can study multiple outcomes from a single exposure (Frieden, 2017). Avoids the ethical problem of withholding treatment, because exposures are observed rather than assigned (Allen et al., 2015). Suitable for studying rare exposures (Bhatt & Mehta, 2016).

Cons:

Susceptible to confounding factors (Freemantle et al., 2003). Loss to follow-up can skew results (Euser et al., 2009). Expensive due to long timeframes (Guyatt et al., 1995). Selection bias may affect representativeness (Cook & Thigpen, 2019). Misclassification errors may occur (Ioannidis, 2005).

4. Case-Control Studies

Pros:

Efficient for studying rare diseases (Aronson & Hauben, 2006). Less resource-intensive than cohort studies (Akobeng, 2005). Quick to conduct retrospectively (Freemantle et al., 2003). Useful for identifying risk factors (Euser et al., 2009). No need for randomization (Horn & DeJong, 2005).

Cons:

Recall bias affects accuracy (Ioannidis, 2005). Cannot establish causality (Cleophas & Zwinderman, 2000). Prone to selection bias (Freemantle et al., 2003). Selecting and matching appropriate controls can be difficult (Guyatt et al., 1995). Confounding factors are hard to control (Horn & DeJong, 2005).

5. Cross-Sectional Studies

Pros:

Quick and cost-effective (Freemantle et al., 2003). Useful for estimating prevalence (Guyatt et al., 1995). Generates hypotheses for further research (Aronson & Hauben, 2006). Large sample sizes increase statistical power (Allen et al., 2015). Provides a snapshot of a population at a given time (Greenhalgh, 2014).

Cons:

Cannot determine causality (Sackett et al., 1996). Susceptible to recall bias (Frieden, 2017). Only captures data at one point in time (Guyatt et al., 2011). Misleading correlations may arise (Ioannidis, 2005). Confounding variables are not easily controlled (Sackett et al., 1996).

The Purpose of Research Studies: Determining Causality

A central goal of much research is to determine causality, that is, to understand whether an intervention or exposure leads to an outcome (Sackett et al., 1996). This is why methodologies like RCTs, with their control over variables, are favored. However, observational studies such as cohort and case-control designs also contribute by offering insights into real-world effects and long-term outcomes (Cartwright & Deaton, 2018).

The Role of Confounding Factors and Controlling for Their Effects

Controlling for confounding factors is critical in establishing valid research findings. Confounding occurs when an extraneous variable affects both the independent and dependent variables, skewing the results (Ioannidis, 2005). RCTs, with their random assignment, are designed to minimize the effects of confounders, while observational studies must rely on statistical adjustments (Freemantle et al., 2003).
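The difference between randomization and statistical adjustment can be shown with a toy simulation. In the sketch below, age (a hypothetical confounder) influences both who receives treatment and who recovers, while the treatment itself has no effect by construction. All probabilities are invented for illustration. The naive observational comparison makes the treatment look harmful; comparing within age groups (stratification, a simple form of statistical adjustment) or randomizing assignment removes the spurious difference.

```python
import random

def simulate(n=100_000, randomized=False, seed=7):
    """Generate a hypothetical population in which age confounds treatment and recovery."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        old = rng.random() < 0.5  # the confounder: half the population is older
        if randomized:
            treated = rng.random() < 0.5  # coin flip, independent of age
        else:
            treated = rng.random() < (0.8 if old else 0.2)  # older people seek treatment more
        # Recovery depends on age only; the treatment has no effect by construction.
        recovered = rng.random() < (0.4 if old else 0.8)
        people.append({"old": old, "treated": treated, "recovered": recovered})
    return people

def recovery_rate(people, treated):
    """Share of people in the given treatment group who recovered."""
    group = [p for p in people if p["treated"] == treated]
    return sum(p["recovered"] for p in group) / len(group)

def naive_difference(people):
    """Recovery rate among the treated minus recovery rate among the untreated."""
    return recovery_rate(people, True) - recovery_rate(people, False)

def age_adjusted_difference(people):
    """Average the within-age-group differences, weighting each group by its size."""
    adjusted = 0.0
    for old in (True, False):
        stratum = [p for p in people if p["old"] == old]
        adjusted += naive_difference(stratum) * len(stratum) / len(people)
    return adjusted

observational = simulate(randomized=False)
trial = simulate(randomized=True, seed=8)
print(f"observational, naive:        {naive_difference(observational):+.3f}")
print(f"observational, age-adjusted: {age_adjusted_difference(observational):+.3f}")
print(f"randomized trial, naive:     {naive_difference(trial):+.3f}")
```

The naive observational difference comes out near -0.24, while the other two are close to zero. The adjustment works here only because the confounder is known and measured; randomization also balances confounders nobody thought to record.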

The Use of Inferential Statistics to Generalize from a Sample to a Population

Research studies draw samples from populations and use inferential statistics to generalize from the sample to the larger population. However, generalizing results requires careful attention to variability and sample representativeness (Guyatt et al., 1995). In medical trials, for example, significance tests and confidence intervals indicate whether an observed difference is larger than chance sampling variation alone would be expected to produce; whether the findings hold for broader groups depends on how representative the sample is, and individual responses may still vary greatly (Greenhalgh, 2014).
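As a sketch of generalizing from a sample, the code below estimates the difference in mean outcomes between two arms with a large-sample 95% confidence interval. The data and the helper `mean_difference_ci` are hypothetical, and with only ten people per arm a t-based interval would be somewhat wider than this normal approximation. The interval describes uncertainty due to sampling from the population studied; it says nothing about whether the result transfers to other populations.

```python
import math
import statistics

def mean_difference_ci(group_a, group_b, z=1.96):
    """Difference in sample means with a large-sample (normal-approximation) 95% CI."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Standard error of the difference, allowing unequal variances in the two groups.
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff, diff - z * se, diff + z * se

# Hypothetical blood-pressure reductions (mmHg) in a small trial.
treatment = [12, 9, 15, 7, 11, 14, 8, 10, 13, 11]
placebo = [6, 8, 5, 9, 4, 7, 6, 10, 5, 6]
diff, low, high = mean_difference_ci(treatment, placebo)
print(f"difference = {diff:.1f} mmHg (95% CI {low:.1f} to {high:.1f})")
```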

The Problem with Small Effect Sizes and Large Variability

One significant challenge in research is dealing with small effect sizes and large variability. Small effects can be difficult to detect, especially when variability is high among participants. High variability dilutes the signal of an intervention, making it harder to distinguish the true impact from noise (Ioannidis, 2005). This problem is particularly acute in fields like medicine and psychology, where human variability can obscure findings, even in well-designed RCTs (Guyatt et al., 1995; Bhatt & Mehta, 2016).
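The cost of small effects and high variability can be quantified with a standard sample-size approximation for comparing two means. The standardized effect size d is the mean difference divided by the standard deviation, so greater variability shrinks d for the same underlying difference. This sketch uses the normal approximation with a two-sided alpha of 0.05 and 80% power; `n_per_group` is a hypothetical helper, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (mean difference / standard deviation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

With these settings the loop reports about 25, 63, and 393 participants per group. Because n scales with 1/d², halving the effect size (or doubling the standard deviation) roughly quadruples the required sample.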

How Averages Tell Us Little About Individuals

While research studies provide average outcomes for a population, these averages may tell us very little about individual responses. Each person has unique biological, environmental, and psychological factors that influence how they respond to treatments or exposures (Greenhalgh, 2014). As a result, treatments that appear effective on average may be less effective or even harmful for specific individuals (Mulder et al., 2018). This highlights the limitations of population-level generalizations, especially in fields like medicine and education, where individual variability plays a large role (Braun, 2019).
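A toy illustration of how an average can hide individual harm. The individual effects below are hypothetical; in a real parallel-group trial each person receives only one treatment, so individual effects are not observed directly, which is one reason trials report average effects.

```python
import statistics

# Hypothetical individual treatment effects (symptom-score improvement) for 10 people.
# Positive values mean the person improved; negative values mean they got worse.
individual_effects = [10, 10, 10, 10, 10, 10, 10, -5, -5, -5]

average_effect = statistics.mean(individual_effects)
share_harmed = sum(e < 0 for e in individual_effects) / len(individual_effects)
print(f"average effect: {average_effect:+.1f}; share harmed: {share_harmed:.0%}")
```

Here the average improvement is +5.5 points even though 3 of the 10 people got worse.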

The Distortion of “Data Are the Plural of Anecdote”

Pseudo-skeptics have distorted the phrase “data are the plural of anecdote” into “data are not the plural of anecdote” in order to dismiss anecdotal evidence entirely. Originally, the phrase reminded researchers that anecdotes can sometimes point to meaningful trends and that collected data often emerge from individual anecdotes (Aronson & Hauben, 2006). The inverted version flips this meaning, implying that individual experiences are always irrelevant in scientific discourse (Cartwright & Deaton, 2018).

The Original Phrase: “Data Are the Plural of Anecdote”

Contrary to its modern distortion, the original phrase was intended to highlight that data are, in fact, a collection of anecdotes, at least metaphorically. Each data point represents an individual instance, which, when aggregated, contributes to a broader understanding (Aronson & Hauben, 2006). The notion that anecdotal experiences can lead to valuable insights, especially when properly documented, remains relevant in areas like clinical practice and public health (Frieden, 2017).

Summary

The evidence hierarchy remains a useful framework for evaluating research, particularly in fields that rely on experimental control, such as medicine and public health. However, its rigid application can be problematic in disciplines that emphasize qualitative data or contextual factors. Research methods like RCTs are valuable for determining causality, but they are not without limitations, particularly when dealing with small effect sizes, high variability, and the complexities of real-world populations. Additionally, the inversion of “data are the plural of anecdote” into “data are not the plural of anecdote” shows how the misuse of a concept can undermine meaningful discourse. Ultimately, a balanced approach to evidence that recognizes both the strengths and limitations of each level in the hierarchy is essential for advancing knowledge across diverse fields.

