Statistical Power
Overview
This analysis performs both independent two-sample t-tests and a paired-sample t-test on R’s sleep dataset. Each test’s confidence interval, p-value, and statistical power will be used to assess the test’s quality and whether or not we can safely reject the null hypothesis.
Imports
library(pwr)  # provides pwr.t.test() for the power calculations
Data
The data used in this notebook is from R’s built-in sleep dataset. The data shows the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients. There are 3 variable fields:
extra: the amount of extra sleep a patient got
group: which drug they were given
ID: the patient ID
Additional information can be found at:
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/sleep.html
Relevant files: Gust_INET4061Lab3_R.Rmd and Gust_INET4061Lab3_R.html
sleep
(sleep_wide <- data.frame(
ID=1:10,
group1=sleep$extra[1:10],
group2=sleep$extra[11:20]
))
# Pooled standard deviation (equal group sizes, so the variances are simply averaged)
SDpooled <- sqrt((sd(sleep_wide$group1)^2 + sd(sleep_wide$group2)^2) / 2)
# Effect size (Cohen's d)
d <- (mean(sleep_wide$group1) - mean(sleep_wide$group2)) / SDpooled
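Written out, the two quantities computed above are as follows. The symbols are introduced here only for illustration: $s_1$ and $s_2$ are the group standard deviations, and $\bar{x}_1$ and $\bar{x}_2$ are the group means. The equal-weight average of the variances matches the code because both groups contain 10 observations.

```latex
s_{\text{pooled}} = \sqrt{\frac{s_1^2 + s_2^2}{2}},
\qquad
d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}}
```

Because group 1 has the smaller mean (0.75 versus 2.33), d is negative here. Its magnitude is the 0.8321811 that appears in the power calculations.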
Exploratory Data Analysis
The sleep dataset ships with R. As provided, it is in long format: the two groups are stacked on top of each other, so every ID appears twice. Traditionally, an ID field does not contain repeated values if it can be avoided.
In this instance, the repeated IDs refer to the same individual, so sleep_wide was created by splitting the data by group and giving each patient a single row. Depending on the function call, one variant may be less verbose than the other, so both are used.
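As a rough sketch, the same wide table could also be produced with base R's reshape() instead of manual indexing. The names long and wide are placeholders chosen for this example, and the column renaming assumes reshape() labels the new columns extra.1 and extra.2.

```r
# Long format: one row per (patient, drug) pair, so each ID appears twice.
long <- datasets::sleep

# Wide format: one row per patient, one column of extra sleep per drug group.
wide <- reshape(long, idvar = "ID", timevar = "group", direction = "wide")

# Rename extra.1 / extra.2 to group1 / group2 to mirror sleep_wide.
names(wide) <- sub("^extra\\.", "group", names(wide))

# Each patient should now appear exactly once.
stopifnot(!any(duplicated(wide$ID)))
print(wide)
```

Indexing by position, as done for sleep_wide, is shorter but relies on the rows already being ordered by group and then ID. Reshaping by ID does not depend on row order.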
Two Sample t-tests
# Welch t-test
t.test(extra ~ group, sleep) # implicitly assumed: alternative="two.sided", var.equal=FALSE
##
## Welch Two Sample t-test
##
## data: extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.3654832 0.2054832
## sample estimates:
## mean in group 1 mean in group 2
## 0.75 2.33
# Using the widened version produces the same result
# t.test(sleep_wide$group1, sleep_wide$group2)
t.test(extra ~ group, sleep, var.equal=TRUE)
##
## Two Sample t-test
##
## data: extra by group
## t = -1.8608, df = 18, p-value = 0.07919
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.363874 0.203874
## sample estimates:
## mean in group 1 mean in group 2
## 0.75 2.33
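The two outputs above share the same t statistic but differ in degrees of freedom (17.776 versus 18). A minimal sketch of where those numbers come from is given below, assuming the standard Welch-Satterthwaite approximation for the Welch test. All variable names are illustrative.

```r
g1 <- datasets::sleep$extra[datasets::sleep$group == "1"]
g2 <- datasets::sleep$extra[datasets::sleep$group == "2"]

# Squared standard error contributed by each group.
se2_1 <- var(g1) / length(g1)
se2_2 <- var(g2) / length(g2)

# With equal group sizes, Welch's and Student's t statistics coincide.
t_stat <- (mean(g1) - mean(g2)) / sqrt(se2_1 + se2_2)

# Welch-Satterthwaite degrees of freedom versus the pooled-variance df.
df_welch <- (se2_1 + se2_2)^2 /
  (se2_1^2 / (length(g1) - 1) + se2_2^2 / (length(g2) - 1))
df_student <- length(g1) + length(g2) - 2

# Two-sided p-values under each choice of degrees of freedom.
p_welch <- 2 * pt(-abs(t_stat), df = df_welch)
p_student <- 2 * pt(-abs(t_stat), df = df_student)

print(c(t = t_stat, df_welch = df_welch, p_welch = p_welch, p_student = p_student))
```

Since the sample variances of the two groups are similar, the Welch correction removes only a fraction of a degree of freedom, which is why the two p-values are nearly identical.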
pwr.t.test(n=10, d=d, type="two.sample")
##
## Two-sample t test power calculation
##
## n = 10
## d = 0.8321811
## sig.level = 0.05
## power = 0.4214399
## alternative = two.sided
##
## NOTE: n is number in *each* group
With both Welch’s unequal-variance t-test and Student’s t-test, we fail to reject the null hypothesis.
For both tests, the p-value was larger than our designated significance level of α = 0.05.
Welch’s: 0.07939, Student’s: 0.07919
Welch’s confidence interval: -3.3654832 0.2054832
Student’s confidence interval: -3.363874 0.203874
From the 95% confidence intervals of these tests, we can say only that we are 95% confident the difference between means lies between approximately -3.36 and 0.20. Since the interval contains 0, there is not sufficient evidence to claim a difference.
The statistical power is ~0.421. With power this low, the main conclusion to be drawn is that either our sample of n=10 per group is too small or our testing method is flawed. Since we are comparing two outcomes from the same individuals without using a paired t-test, the latter explanation makes sense.
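To put a number on "too small", the sketch below cross-checks the power with base R's power.t.test() and then solves for the per-group sample size that would reach 0.80. It assumes the observed effect size is the true effect, which is a strong assumption for a sample this small. Setting sd = 1 makes delta a standardized effect. By default power.t.test() counts only one rejection tail for two-sided tests, so its result may differ very slightly from pwr.t.test().

```r
d_abs <- 0.8321811  # magnitude of Cohen's d taken from the power output above

# Power of a two-sided, two-sample t-test with 10 patients per group.
achieved <- power.t.test(n = 10, delta = d_abs, sd = 1,
                         sig.level = 0.05, type = "two.sample")

# Per-group sample size needed for 0.80 power at the same effect size.
needed <- power.t.test(delta = d_abs, sd = 1, sig.level = 0.05,
                       power = 0.80, type = "two.sample")

print(achieved$power)
print(ceiling(needed$n))  # round up to whole patients per group
```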
Paired t-tests
# Sort by group then ID
sleep <- sleep[order(sleep$group, sleep$ID), ]
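# Paired t-test (recent R releases may reject paired=TRUE in the formula interface;
# the call on the per-patient differences further down gives the same result)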
t.test(extra ~ group, sleep, paired=TRUE)
##
## Paired t-test
##
## data: extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean of the differences
## -1.58
# A one-sample t-test on the per-patient differences is equivalent to a paired t-test
# (var.equal has no effect on a one-sample test, so it is omitted)
t.test(sleep_wide$group1 - sleep_wide$group2, mu=0)
##
## One Sample t-test
##
## data: sleep_wide$group1 - sleep_wide$group2
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean of x
## -1.58
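The equivalence shown by the two outputs above can be made explicit by computing the statistic by hand from the per-patient differences. This is a minimal sketch with illustrative variable names. It assumes the rows of sleep are ordered so that the i-th value of each group belongs to the same patient.

```r
g1 <- datasets::sleep$extra[datasets::sleep$group == "1"]
g2 <- datasets::sleep$extra[datasets::sleep$group == "2"]

diffs <- g1 - g2          # one difference per patient
n <- length(diffs)
se <- sd(diffs) / sqrt(n) # standard error of the mean difference

# t statistic, two-sided p-value, and 95% confidence interval on n - 1 df.
t_stat <- mean(diffs) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
ci <- mean(diffs) + c(-1, 1) * qt(0.975, df = n - 1) * se

print(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]))
```

The standard error here is built from the spread of the differences rather than from the spread within each group. The patient-to-patient variation that inflated the independent-sample tests cancels out, which is why the same mean difference of -1.58 now yields a much larger t statistic.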
pwr.t.test(n=10, d=d, type="paired", alternative="two.sided")
##
## Paired t test power calculation
##
## n = 10
## d = 0.8321811
## sig.level = 0.05
## power = 0.6500366
## alternative = two.sided
##
## NOTE: n is number of *pairs*
pwr.t.test(n=10, d=d, type="paired", alternative="less")
##
## Paired t test power calculation
##
## n = 10
## d = -0.8321811
## sig.level = 0.05
## power = 0.7828239
## alternative = less
##
## NOTE: n is number of *pairs*
With a paired t-test, we can now reject the null hypothesis.
The p-value is now 2.833e-3, well below the designated significance level of α = 0.05.
Confidence interval: -2.4598858 -0.7001142
From the 95% confidence interval, we can say we are 95% confident that the true mean difference lies between approximately -2.46 and -0.70. The interval does not contain 0, which is consistent with rejecting the null hypothesis.
The statistical power is ~0.650 with a two-sided alternative, an increase over the independent tests but still below the desired 0.80. With the one-sided alternative "less", the power rises to ~0.78; that is, if the null hypothesis is that the mean of group 1 is not less than the mean of group 2, the test has considerably more power. A larger sample size would increase the power further.
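As a hedged illustration of the sample-size point, the sketch below uses base R's power.t.test() to reproduce the two-sided paired power and then solve for the number of pairs needed to reach 0.80. It reuses the effect size from the calls above. In paired mode power.t.test() interprets sd as the standard deviation of the within-pair differences, so sd = 1 treats that effect size as already standardized on that scale. This assumption is carried over from the analysis above rather than verified here.

```r
d_abs <- 0.8321811  # effect size magnitude used in the paired power calls above

# Two-sided power of a paired t-test with 10 pairs.
achieved <- power.t.test(n = 10, delta = d_abs, sd = 1,
                         sig.level = 0.05, type = "paired")

# Number of pairs needed for 0.80 power at the same effect size.
needed <- power.t.test(delta = d_abs, sd = 1, sig.level = 0.05,
                       power = 0.80, type = "paired")

print(achieved$power)
print(ceiling(needed$n))  # round up to whole pairs
```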
# One-sample t-test on all 20 observations, pooled across both groups
t.test(sleep$extra, mu=0)
##
## One Sample t-test
##
## data: sleep$extra
## t = 3.413, df = 19, p-value = 0.002918
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.5955845 2.4844155
## sample estimates:
## mean of x
## 1.54
# Note: n=10 here, whereas the one-sample test above used all 20 observations (df = 19)
pwr.t.test(n=10, d=d, type="one.sample")
##
## One-sample t test power calculation
##
## n = 10
## d = 0.8321811
## sig.level = 0.05
## power = 0.6500366
## alternative = two.sided
Conclusions
This document conducted both independent two-sample t-tests and a paired t-test on R’s sleep dataset. From the analysis of p-values, confidence intervals, and power levels we were able to demonstrate how using independent tests on dependent data leads to flawed results.
Furthermore, we were able to conclude with a relatively high level of confidence that there is a statistically significant difference between group1 and group2, but to be more certain we would need a larger sample size.
Future work may involve a dataset with a larger sample size, or extending the analysis with ANOVA tests and their corresponding power analyses.
Tags: power, r, sleep, statistics, ttests
Categories: r, statistics