How to use G*Power
Why is this tutorial located at www.mormonsandscience.com? A few years ago I was looking for a place to store my instructions on G*Power so I could easily access them at work. Rather than create a new website, I put them on this preexisting site. Soon people began finding and using the guide. I've added an introduction, a disclaimer, and instructional videos on using G*Power.

Introduction
Hey, McFly! Those boards don’t work on water! Unless you’ve got power! This line from Griff's gang in Back to the Future Part II applies to the research domain as well. It goes something like this: “Hey McFly! That research won’t work on wishful thinking, unless you’ve got statistical power!”
Statistical power refers to the probability of correctly rejecting the null hypothesis of no effect. It is important to know statistical power before launching a research project. Low statistical power may lead one to conclude that a treatment has no effect when in fact it does (a Type II error), while an “overpowered” study may lead one to conclude that a statistically significant effect has practical or clinical significance when it does not.
Concern over statistical power is a relatively recent phenomenon. Research has shown that a few decades ago studies in the social and health sciences were underpowered. Many had only a 20-30% chance of correctly rejecting the null hypothesis. Underpowered studies may have led investigators to incorrectly conclude that there were no effects from manipulations of their predictor/independent variables. Nowadays many institutional review boards (IRBs) and granting agencies require power and sample size calculations. But where can researchers turn for help?
There are a number of commercial power and sample size programs available. PASS and SPSS Power are a few examples. These programs are very good and will cost you about $1000. There are also several freeware power and sample size calculators available online. Unfortunately many free programs are limited in the number of available power calculations.
I have used several power and sample size programs. My favorite is G*Power. G*Power was created by faculty at the Institute for Experimental Psychology in Düsseldorf, Germany. It offers a wide variety of calculations along with graphics and protocol statement outputs. Best of all, it is free! The developers released version 3.1.9 in 2014. Terms of use and a downloadable zip file are available here.
After downloading the program you may ask yourself, how do I use it? There are limited resources. The developers have a tutorial on using G*Power, but it is sparse in some places and may be difficult for some people to follow.
I created an easy-to-follow guide for using G*Power 3.x. The guide is included below. It is a work in progress and I will update it and add more analyses as time permits. Several of the G*Power examples on this page have been checked against power calculations in SPSS, nQuery, and PASS with good results.
Disclaimer: I cannot guarantee the completeness and correctness of this material. Users assume all risks associated with using the guide. If you have any comments or suggestions on improving the guide, let me know.
G*Power is the Queen of Free Power and Sample Size Software
Table of Contents
Exact Tests
1. Correlation: Bivariate normal model (Pearson r for two continuous variables)
2. Linear Multiple Regression: Random Model
3. Proportion: Difference from Constant (one-sample, binomial test)
4. Proportions: Inequality, 2 Dependent Groups (McNemar's test)
5. Proportions: Inequality, 2 Independent Groups (Fisher’s Exact test)
8. Proportion: sign test (Binomial test)
T-tests
10. Correlation: Point Biserial Model (one continuous and one dichotomous variable)
15. Means: Difference between 2 dependent means (matched/paired samples t-test)
16. Means: Difference between 2 independent means (between/independent samples t-test)
17. Means: Difference from constant (one sample t-test)
20. Means: Wilcoxon-Mann-Whitney test (Wilcoxon Rank-Sum or MWU test)
Chi-Square Tests
22. Goodness of Fit tests: Contingency Tables
Z-Tests
28. Correlation: 2 Independent Pearson r’s
29A. Logistic Regression (binary logistic regression with a continuous predictor)
29B. Logistic Regression (binary logistic regression with a dichotomous predictor)
30A. Poisson Regression (with a continuous predictor)
30B. Poisson Regression (with a dichotomous predictor)
31. Proportions: Difference between 2 independent proportions
F-Tests
34. ANOVA: Fixed Effects, omnibus, one-way (1-way ANOVA test of a between/fixed effects variable)
35. ANOVA: Fixed Effects, special main effects and interactions (2-way ANOVA)
36. ANOVA: Repeated measures, between factors
37. ANOVA: Repeated measures, within factors
38. ANOVA: Repeated measures, within-between variables interaction
39. Hotelling’s T-squared: One group mean vector (within group Hotelling’s T-squared test)
40. Hotelling’s T-squared: Two group mean vectors (between groups Hotelling’s T-squared test)
41. MANOVA: Global Effects
42. MANOVA: Special Effects and Interactions
46. Linear Multiple Regression: Fixed Model, R2 deviation from zero
RPower - Advanced power and sample size calculations that can be done in the R statistical platform.
50A. Stepped Wedge using SWSamp (for means)
50B. Stepped Wedge using manual approach (for means)
50C. Stepped Wedge using manual approach (for proportions)
51. Equivalence study (2 groups, proportions)
52A. Survival Analysis (2 group comparison with estimated probabilities)
52B. Survival Analysis (2 group comparison with preliminary data)
53. Cross Over Design (2 groups, 2 time periods, means)
54A. Linear Mixed Effects Model: manual approach (1 fixed effect predictor; simulate data)
54B. Linear Mixed Effects Model: using SIMR package (1 fixed effect predictor; simulate data)
54C. Linear Mixed Effects Model: using SIMR package (2 or more fixed effect predictors; pilot data)
Exact Tests
The main characteristic of exact methods is that the statistical tests are based on exact probability statements that are valid for any sample size, so exact power calculations may be used for any sample size. These calculations are especially useful when sample sizes are small and/or the assumptions behind approximate methods (e.g., equal variances, large-sample normality) are not met.
1. Correlation: Bivariate (2 continuous variables) Normal Model
Test whether an r value is statistically different from zero or another r value.
Example: Is a correlation of 0.70 between hours studied and test score significantly different from zero? Or, does my sample's r value of 0.70 differ from a population's r value of 0.50?
Tails = 1 or 2 (use 2-tailed if the correlation could be positive or negative; otherwise use 1-tailed)
Correlation ρ H1 (corr. value assuming H1) = 0.70 (note that r is the effect size)
Alpha = .05 (or .01)
Power = desired level (usually 0.80 [80%] or higher)
Correlation ρ H0 (corr. value assuming H0) = usually 0. However, you may enter any other r value if you want to compare a known null hypothesis population r value (e.g., 0.50) against your expected sample's r value (0.70).
Click here for video.
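If you have R handy, this example can be cross-checked with the pwr package (a rough sketch: pwr.r.test uses an approximate method rather than G*Power's exact bivariate-normal calculation, so the resulting N may differ slightly):

# Approximate cross-check of example 1 (r = 0.70 vs. 0, two-tailed)
library(pwr)
pwr.r.test(r = 0.70, sig.level = 0.05, power = 0.80, alternative = "two.sided")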
2. Linear Multiple Regression: Random Model (see also #46 below)
To test whether a group of predictors significantly predicts an outcome variable.
Example: Will a model with 4 predictors significantly predict an outcome variable?
Tails = 1 or 2
H1 ρ² = click “Determine” to estimate the population squared multiple correlation coefficient. Choose “From predictor correlations.” Enter the number of predictors. Click “Specify matrices” and enter each predictor's correlation with the outcome. Select the "Corr between predictors" tab and enter the expected correlations between the predictor variables. Accept the values, calculate the H1 ρ² value, calculate and transfer it to the main window, then close the side window.
H0 ρ² = the null hypothesis squared multiple correlation coefficient (usually 0)
α err prob: desired alpha level of 0.05 or 0.01 (probability of a type I error)
Power = enter desired power level
Number of predictors = enter number of predictor variables. 4 in this example.
Click here for video.
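For a rough cross-check in R, the pwr package's fixed-model F-test routine can be used (an approximation: the random-model exact calculation differs somewhat, and the ρ² value below is a hypothetical stand-in since it depends on your predictor correlations):

# Approximate N for a 4-predictor regression, assuming a hypothetical H1 rho^2 of 0.20
library(pwr)
R2 <- 0.20                          # hypothetical H1 rho^2
f2 <- R2 / (1 - R2)                 # Cohen's f^2 effect size
out <- pwr.f2.test(u = 4, f2 = f2, sig.level = 0.05, power = 0.80)
ceiling(out$v) + 4 + 1              # total N = denominator df + predictors + 1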
3. Proportion: Difference from Constant (binomial test, one sample case)
To test whether a sample proportion differs from a population proportion. The exact test is especially appropriate when n·p0·q0 < 5, or when either n·p0 or n·(1 − p0) is less than 5 (where p0 is the probability of the event occurring and q0 = 1 − p0 is the probability of the event not occurring).
Example: The prevalence of breast cancer among middle aged women in the general population is .02. The breast cancer rate among a sample of women who have a sister with breast cancer is .05. What sample size is needed to detect a significant difference between the population and sample proportions? (To claim that the rate of cancer among women with sister history is 2.5 times [.05/.02] higher than those without sister history?)
Tails = 2 or 1 (if the direction of difference from the Ho(P1) value is known, choose a 1-tail test)
Effect size g = Click “Determine.” Enter P1, the H0 proportion (.02), and P2, the H1 (alternative) proportion (.05). Choose one of the "Calc P2 from..." options (they all give the same effect size), sync the values, and then calculate effect size g. In this case, g = 0.03.
Alpha = .05
Power = .90
Constant proportion = Ho prop (.02) which is the same as P1
Click here for video.
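The exact binomial search can also be scripted directly in R, which makes it easy to see how the calculation works (a minimal sketch of the one-tailed case with the example's p0 = .02 and p1 = .05):

# Exact one-tailed binomial power: reject H0 when X >= k
exact_binom_power <- function(n, p0, p1, alpha = 0.05) {
  k <- qbinom(1 - alpha, n, p0) + 1          # smallest k with P(X >= k | p0) <= alpha
  pbinom(k - 1, n, p1, lower.tail = FALSE)   # power = P(X >= k | p1)
}
ns <- 10:1500
ns[which(sapply(ns, exact_binom_power, p0 = 0.02, p1 = 0.05) >= 0.90)[1]]
# note: exact power is saw-toothed in n, so inspect neighboring n values too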
4. Proportions: Inequality, 2 Dependent Groups (McNemar’s)
Compare 2 dependent proportions (people in both groups have been paired/matched).
Example: We test the speed of the innate immune system response while patients are not taking daily doses of Vitamin D. Then we test the speed of the innate immune system response in the same patients when they are taking daily doses of Vitamin D. The speed of the innate immune response is binary, either good or poor. How many patients are required to detect a significant difference in response rate? Here are the expected proportions for our patients.
                              Vitamin D
                           good      poor
No Vitamin D    good       0.70      0.02
                poor       0.13      0.15
Agreement occurs when both methods produce the same result in the same patients. There is agreement when both methods produce a good immune response (0.70 or 70%), and when both methods produce a poor immune response (0.15 or 15%). Discordance occurs when the methods produce different results (good & poor) in the same patients. In this example, there are a total of 0.02 + 0.13 = 0.15 (15%) discordant pairs. McNemar's focuses on these discordant pairs.
Tails = 1 or 2 (use 1-tail if the difference is expected to go in one direction).
OR = calculate the quotient of the discordant pairs: 0.02 / 0.13 = 0.1538 (Note: The opposite quotient 0.13 / 0.02 = 6.5 produces the same result.)
Alpha = 0.05 or 0.01 (probability of a type I error)
Power = desired level
Prop. Discordant Pairs = total proportion of expected discordant pairs is 0.02 + 0.13 = 0.15 (15%)
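As a rough check on the resulting N, McNemar power can be simulated in R using the exact conditional binomial test (a Monte Carlo sketch, not necessarily the algorithm G*Power uses; the candidate N of 150 is just a starting guess):

set.seed(1)
mcnemar_power <- function(N, p01 = 0.02, p10 = 0.13, alpha = 0.05, reps = 2000) {
  mean(replicate(reps, {
    nd <- rbinom(1, N, p01 + p10)             # number of discordant pairs
    x  <- rbinom(1, nd, p01 / (p01 + p10))    # count in one discordant cell
    nd > 0 && binom.test(x, nd, p = 0.5)$p.value < alpha
  }))
}
mcnemar_power(150)   # increase N until the estimated power is adequate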
5. Proportions: Inequality, 2 Independent Groups (Fisher’s Exact)
Compare 2 independent proportions.
Example: Based on previous data, the expected proportion of students passing a stats course taught by psychology teachers is 0.85. The expected proportion of students passing the same stats class taught by mathematics teachers is 0.95. How many participants are needed to detect a significant difference between the 2 proportions in a prospective study? (Note that this also works with retrospective studies where one wants to know how many cases to extract from a database for both groups.)
Tail = 1 or 2
Prop 1 = 0.85 (You do not have to click on “Determine”)
Prop 2 = 0.95
Alpha = .05
Power = choose your level.
Click here for video.
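A quick way to sanity-check the answer in R is to simulate Fisher's exact test at a candidate per-group n (a sketch; expect a percent or two of simulation error):

set.seed(1)
fisher_power <- function(n, p1 = 0.85, p2 = 0.95, alpha = 0.05, reps = 2000) {
  mean(replicate(reps, {
    x1 <- rbinom(1, n, p1); x2 <- rbinom(1, n, p2)
    tab <- matrix(c(x1, n - x1, x2, n - x2), nrow = 2)
    fisher.test(tab)$p.value < alpha
  }))
}
fisher_power(140)   # estimated power with 140 students per group (hypothetical n)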
6. Proportions: Inequality, 2 independent groups (unconditional)
Not sure about this one. It uses one proportion and an OR.
7. Proportions: Inequality, (offset) 2 independent groups (unconditional)
Not sure about this one. It uses conditional probabilities.
8. Proportion: sign test (binomial test)
(Test whether a single sample proportion is different from a null hypothesized proportion of 0.50 [50%]. This calculation can also be done with method #3 by specifying a null proportion equal to 0.50)
Example: How many students are needed to show that a new teaching technique results in significantly more than 50% of students passing a course? Let’s say we expect that 70% of students will pass the course with the new teaching technique.
Tails = 1 or 2 (select 2 if the difference can be more or less than 50%)
Effect size g = Calculate the expected effect size where g = (expected proportion – 0.50). So in this example, g = (0.70 – 0.50) = 0.20
Alpha = 0.05 or 0.01 (type I error rate)
Power = select a level usually 0.80 or higher
Click here for video.
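Note that G*Power's g is the raw difference from 0.50, while R's pwr package uses Cohen's arcsine-transformed h, so an R cross-check is only approximate (a sketch of the one-tailed case):

# Approximate cross-check with the arcsine effect size h rather than g
library(pwr)
h <- ES.h(0.70, 0.50)   # effect size for p1 = .70 vs. p0 = .50
pwr.p.test(h = h, sig.level = 0.05, power = 0.80, alternative = "greater")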
9. Generic Binomial Test
T-Tests
Cohen’s Effect Size Conventions for “d”
d = 0.20 (small)
d = 0.50 (medium)
d = 0.80 + (large)
10. Correlation: Point Biserial Model
(Tests whether a correlation coefficient is significantly different from zero when one variable is continuous and the other is dichotomous.)
Example: How many participants are needed to determine whether an expected r = 0.30 is significantly different from zero when correlating test scores (continuous) and gender (dichotomous)?
Tails = 1 or 2 (use 2-tail if the correlation could be pos or neg; otherwise use 1-tail)
Effect Size |ρ| = Click “Determine.” Square the expected correlation coefficient (r² = 0.30² = 0.09) and insert it in the window next to “Coefficient of determination ρ²”. Click “Calculate” and your expected correlation coefficient (r = 0.30) will appear in the window next to “Effect size |ρ|”. Calculate and transfer the effect size to the main window.
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = choose your power (usually 80% or higher)
Click here for video.
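Since the point-biserial coefficient is simply a Pearson r in which one variable is dichotomous, an approximate cross-check in R can reuse pwr.r.test (a sketch; G*Power's t-test-based routine may return a slightly different N):

library(pwr)
pwr.r.test(r = 0.30, sig.level = 0.05, power = 0.80, alternative = "two.sided")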
11. Linear Bivariate Regression: one group, size of slope
Determine whether the slope for a predictor variable is significantly different from 0.
12. Linear Bivariate Regression: 2 groups, difference between intercepts
13. Linear Bivariate Regression: 2 groups, difference between slopes
14. Linear Multiple Regression: Fixed model, single regression coefficient
15. Means: Difference between 2 dependent groups
(Compute power and sample size for a paired [within, correlated, dependent] or matched samples t-test.)
Example: How many participants are needed to detect a change in mood states for participants who are given an antidepressant? The expected average mood state in participants prior to taking an antidepressant is 46 (expected standard deviation = 5.1). The expected average mood state in the same participants after taking an antidepressant is 50 (expected standard deviation = 5.8). (Note: in a matched pair design where the participants in both groups are different, but they are paired on certain characteristics, the procedure is the same).
Tails = 1 or 2 tails (choose 2 tail if the post-treatment score could be higher or lower than the pre-treatment score, otherwise choose 1 tail).
Effect Size dz = Click “determine” and select “From group parameters”.
Enter expected means for groups 1 (46) and 2 (50).
Enter expected standard deviations for groups 1 (5.1) and 2 (5.8).
Enter expected correlation between the 2 sets of data. Should be moderate to high in a paired samples scenario (e.g., 0.5).
Calculate effect size and transfer to main window.
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = select a desired level (usually 80% or higher)
Click here for video.
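The dz value that G*Power derives from these group parameters can be reproduced in R, which makes it easy to see how the assumed correlation drives the sample size (a sketch using the example's numbers):

library(pwr)
m1 <- 46; m2 <- 50; sd1 <- 5.1; sd2 <- 5.8; rho <- 0.5
sd_diff <- sqrt(sd1^2 + sd2^2 - 2 * rho * sd1 * sd2)   # SD of the paired differences
dz <- (m2 - m1) / sd_diff                              # dz is about 0.73 here
pwr.t.test(d = dz, sig.level = 0.05, power = 0.80, type = "paired",
           alternative = "two.sided")                  # n = number of pairs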
16. Means: Difference between 2 independent groups
(Independent [between] groups t-test)
Example: What sample size is needed to find a significant difference between control and treatment groups? The expected mean for the control group is 100. The expected mean for the treatment group is 120. The expected standard deviation (SD) for both groups is 15.
Tails = 1 or 2
Effect size d = Click “Determine.” There are 2 methods.
Method 1: If the sample sizes in your treatment and control groups are not expected to be the same (unbalanced design), select the “n1 != n2” method. Enter the expected mean for group 1 (control mean = 100). Enter the expected mean for group 2 (treatment group mean = 120). Enter the expected “SD within each group” (SD = 15 for this example). Note that there is only one SD value to enter because the SDs (standard deviations) should be roughly the same for both groups (i.e., the t-test assumes that the variances [squared SDs] from both groups are equivalent). If you have preliminary data then you could compute the pooled variance for both groups, take the square root, and enter the pooled SD value. Click “Calculate and transfer to main window” then close.
Method 2: If the sample sizes in both groups are expected to be the same (balanced design), select the “n1 = n2” method. Enter the expected mean for group 1 (control mean = 100). Enter the expected mean for group 2 (treatment group mean = 120). Enter the expected SD for group 1 (control SD = 15) and group 2 (treatment SD = 15). The t-test assumes that the variances (squared SDs) from both groups are equivalent, so if you don’t have good reason to assume that the SDs are different, enter the same SD value in both windows. If preliminary data are available you could enter the SDs from your preliminary data. In this case the SD values will likely be different. Click “Calculate and transfer to main window.”
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = select a desired level (usually 80% or higher)
Allocation ratio N2/N1 = enter 1.0 if you expect the same number of participants per group. If the number of participants in both groups will not be the same, enter the ratio of sample sizes.
Click here for video.
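The same calculation can be cross-checked in R with the base power.t.test function (here d = (120 − 100)/15 ≈ 1.33):

power.t.test(delta = 20, sd = 15, sig.level = 0.05, power = 0.80,
             type = "two.sample", alternative = "two.sided")   # n per group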
17. Means: Difference from constant (one sample case)
(Single sample t-test for comparing a sample mean against a known population mean)
Example: The expected mean IQ for a random sample of graduate students is 110. How many students are needed to detect a significant difference from the population mean IQ of 100, where the standard deviation of the IQ test is 15?
Tails = 1 or 2
Effect size d = Click “Determine.” Enter the mean for H0 (100) and the mean for H1 (110). Enter 15 for the SD (sample estimate of the population standard deviation) or σ (the known pop standard deviation).
Note: By convention, a z-test is sometimes used instead of a t-test when the population σ is known and the sample size is greater than 30. Even in that situation this t-test calculation should give a reasonable power and sample size estimate, because as N increases beyond 30 the t-distribution approximates the normal z-distribution.
Click “Calculate and transfer to main window”.
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = select a desired level (usually 80% or higher)
Click here for video.
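A matching one-sample cross-check in R (d = (110 − 100)/15 ≈ 0.67):

power.t.test(delta = 10, sd = 15, sig.level = 0.05, power = 0.80,
             type = "one.sample", alternative = "two.sided")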
18. Wilcoxon Signed-Ranks Tests (matched pairs)
Non-parametric test for comparing 2 matched groups.
19. Wilcoxon Signed-Ranks Test (one/within sample case)
Non-parametric test for comparing within group data.
20. Wilcoxon Rank-Sum or MWU (2 independent groups)
(Non-parametric test for comparing 2 independent groups. This option is for continuous data that do not satisfy parametric assumptions; it is not for ordinal data. It is basically the same power and sample size calculation as the 2-group t-test, except that with this function the required sample size will usually be slightly higher.)
Example. Continuous test scores from two groups of participants will be compared with the MWU, Wilcoxon rank sum test to determine if the two groups differ in test performance.
Tail(s) = Select one or two
Parent Distribution = Choose a distribution
Determine (effect size) = Select n1 = n2 if the sample sizes are expected to be equivalent. Select n1 != n2 if they are not. Enter the expected means and SDs for both groups. Click "Calculate and transfer to main window."
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = select a desired level (usually 80% or higher)
Allocation ratio N2/N1 = enter 1.0 if you expect the same number of participants per group. If the number of participants in both groups will not be the same, enter the ratio of sample sizes.
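A common rule of thumb reproduces roughly what this option does when the parent distribution is normal: compute the t-test n and inflate it by the asymptotic relative efficiency 3/π ≈ 0.955 (a rough sketch only, not G*Power's method; the means and SD below are hypothetical):

library(pwr)
n_t <- pwr.t.test(d = (120 - 100) / 15, sig.level = 0.05, power = 0.80)$n
ceiling(n_t / (3 / pi))   # inflated n per group for the Wilcoxon-Mann-Whitney test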
21. Generic t-test
No a priori calculations
Chi-Square Tests
(G*Power does not offer the chi-square crosstabs test for independence)
Cohen’s Effect Size Conventions for “w”
w = 0.10 (small)
w = 0.30 (medium)
w = 0.50 (large)
22. Goodness-of-Fit tests: Contingency Tables
(Chi-Square test for Goodness of Fit. Null hypothesis is that observed frequencies equal expected frequencies)
Example: Do the observed number of people belonging to the following religious categories (A) Atheist, (B) Buddhist, (C) Christian, and (D) Don’t Affiliate differ from the expected number of people belonging to each religious category?
Effect Size w = Click “Determine.” Number of cells refers to # of categories, in this case 4 religious categories (A, B, C, D).
- P(Ho) is the column of expected proportions belonging to each category. Enter expected proportions for A(1), B(2), C(3), and D(4). We’ll assume that the expected proportions are equal for all groups, so click “Equal p[Ho]”. With 4 groups the expected proportions will be 0.25. If the expected proportions are not equal, enter different expected proportions for each category. Expected proportions should sum to 1.0 (100%).
- P(H1) is the column of observed proportions computed from the data collected in the study. Enter observed proportions for A(1), B(2), C(3), and D(4). Enter actual frequencies then click “Normalize p(H1)” or enter the observed proportions directly. We’ll assume the following proportions were observed in this study: A(0.10), B(0.20), C(0.30), D(0.40). Observed proportions should sum to 1.0 (100%).
- “Auto calc. last cell” computes final proportion for the last cell in a column (so the proportions add up to 1.0) in case you don’t want to compute the last proportion.
- Click “Calculate and transfer to main window”.
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = select a desired level (usually 80% or higher)
Df = Degrees of freedom is number of categories minus 1. In this example 4–1 = 3.
Click here for video.
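The effect size w that G*Power derives from the two proportion columns, and the resulting total N, can be reproduced in R (using the example's proportions):

library(pwr)
p0 <- rep(0.25, 4)                  # expected (H0) proportions
p1 <- c(0.10, 0.20, 0.30, 0.40)     # observed (H1) proportions
w  <- sqrt(sum((p1 - p0)^2 / p0))   # Cohen's w, about 0.447 here
pwr.chisq.test(w = w, df = 3, sig.level = 0.05, power = 0.80)   # solves for total N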
23. Variance: difference from constant (one case)
Not sure about this one
24. Generic X2 Test
No a priori calculations
Z Tests
Effect size conventions for the correlation coefficient r.
r = small 0.10
r = medium 0.30
r = large 0.50
25. Correlation: Tetrachoric Model
Correlate 2 artificially dichotomized variables
26. Correlation: 2 Dependent Pearson r’s (common index)
Correlate Pearson correlation coefficients from 2 dependent samples.
27. Correlation: 2 Dependent Pearson r’s (no common index)
(not sure how this differs from #2)
28. Correlation: 2 independent Pearson r’s
(Compare 2 Pearson correlation coefficients for 2 independent samples)
Example: Test whether the correlation between hours studied and test score for group A is statistically different from the correlation between hours studied and test score for group B.
Tails = 1 or 2
Effect size q = click “Determine” and enter the expected population correlations:
Correlation coefficient ρ1: Group A correlation is 0.54 for this example.
Correlation coefficient ρ2: Group B correlation is 0.40 for this example.
Click “Calculate and transfer to main window”
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = select desired level (usually 80% or higher)
Allocation ratio n2/n1 = enter ratio of participants: Group B/Group A (value should be 1.0 for balanced designs)
Click here for video.
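Cohen's q and the per-group n can also be computed by hand in R with the standard Fisher-z formula (a sketch for the balanced, two-tailed case):

r1 <- 0.54; r2 <- 0.40
q  <- atanh(r1) - atanh(r2)    # Cohen's q, about 0.18
n_per_group <- ceiling(3 + 2 * ((qnorm(0.975) + qnorm(0.80)) / q)^2)
n_per_group                    # about 485 per group at alpha .05, power .80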
29A. Logistic Regression for a continuous predictor
(Test whether a continuous predictor is a significant predictor of a binary outcome, with or without other covariates)
Example: Test whether body mass index (BMI) influences mortality (yes 1, no 0). This example includes what to do if covariates are included to control/account for the influence of other variables on the outcome.
Tails: 1 or 2
Click on Options tab at the bottom. There are 2 options for entering expected effect size. 1st option: Odds ratio. 2nd option: Two probabilities.
Two probabilities option
Pr(Y=1 | X=1) H1 = What is the probability of death (Y=1) when the main predictor (BMI) is one standard deviation (SD) unit (i.e., one z-score) above its mean, and all other covariates, if applicable, are set to their mean values. For this example the expected mean BMI = 30 and the expected SD = 3.0. So what is the probability of death for our sample when the BMI is 33? Let’s say that the probability of death = 0.25.
Prob(Y=1 | X=1) Ho = What is the probability of death (Y=1) when the main predictor (BMI) is at the mean, and all other covariates, if applicable, are set to their mean values. For this example the expected mean BMI = 30 and the expected SD = 3.0. So what is the probability of death for our sample when the BMI is 30? Let's say that the probability of death = 0.15.
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = select desired level (usually 80% or higher)
R-squared other X = Enter the expected squared multiple correlation coefficient (R-squared) between the main predictor variable (BMI) and all other covariates. R-squared represents the amount of variability in the main predictor (BMI) that is accounted for by the covariates. (If there are no other covariates, enter 0). The R-squared value can be found by regressing the main predictor onto data for all other covariates. In most cases the R-squared value must be estimated. In the current example if there are two covariates that are expected to have a low association with BMI (say R= 0.20), then enter 0.20^2 = 0.04. If the covariates are expected to have a moderate association with BMI (say R = 0.50), then enter 0.50^2 = 0.25. If the covariates are expected to have a strong association with BMI (say R = 0.90), then enter 0.90^2 = 0.81.
X-Distribution = select normal unless there are reasons to think that the main predictor is distributed differently.
X param mu = the z-score population mean of predictor X (BMI) = 0.
X param sigma = the z-score population SD of predictor X (BMI) = 1.
Odds ratio option (produces the same results as the 2 probability option above)
Click “Determine” and enter values for Pr(Y=1 | X=1) H1 and Pr(Y=1 | X=1) H0. These two values are explained above in the “two probabilities option.”
Click “calculate and transfer to main window.”
G*Power computes the odds ratio from the probabilities using the following formula: OR = (p1(1 − p2)) / (p2(1 − p1)). For this example (see the two probabilities option above) the OR = 1.89, and Pr(Y=1 | X=1) H0 = 0.15.
The rest of the steps are the same as the two probabilities option.
Click here for video.
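The odds-ratio arithmetic is easy to verify in R using the example's two probabilities:

p1 <- 0.25; p0 <- 0.15                # Pr(death) one SD above the mean vs. at the mean
(p1 * (1 - p0)) / (p0 * (1 - p1))     # about 1.89, the OR quoted above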
29B. Logistic Regression for a dichotomous predictor
(Test whether a dichotomous variable is a significant predictor of a binary outcome, with or without other covariates)
Example: Test whether smoking (yes vs. no) influences mortality. This example includes what to do if covariates are included to control/account for the influence of other variables on the outcome.
Tails = 1 or 2
Click on Options tab at the bottom. There are 2 options for entering expected effect size. 1st option: Odds ratio. 2nd option: Two probabilities.
Two probabilities option
Pr(Y=1 | X=1) H1 = What is the probability of death (Y=1) when someone is a smoker (X=1)? For this example let’s assume that the probability of death for patients that smoke is P1 = 0.18.
Prob(Y=1 | X=1) Ho = What is the probability of death (Y=1) when someone is a nonsmoker? [It might help to think of the X=1 in terms of X=0 for non-smoker]. For this example, let’s assume that the probability of death for patients that do not smoke is P0 = 0.06.
α err prob = choose a Type I error rate (0.05 or 0.01)
Power = select desired level (usually 80% or higher)
R-squared other X = Enter the expected squared multiple correlation coefficient (R-squared) between the main categorical predictor (smoking status) and all other covariates. R-squared represents the amount of variability in the main predictor (smoker status) that is accounted for by the covariates. (If there are no other covariates, enter 0). The R-squared value can be found by regressing the main predictor onto all other covariates using binary logistic regression. In most cases, however, the R-squared value can be estimated. In the current example if there are covariates that are expected to have a low association with smoking status (say R= 0.20), enter 0.20^2 = 0.04. If the covariates are expected to have a moderate association with smoking status (say R = 0.50), enter 0.50^2 = 0.25. If the covariates are expected to have a strong association with smoking status (say R = 0.90), enter 0.90^2 = 0.81, etc.
X-Distribution = enter “binomial” for the binomial predictor
X param ∏ = The proportion of cases who are smokers (X=1). If the number of smokers (X=1) and nonsmokers (X=0) is expected to be equal/balanced, enter 0.50. If 75% of the cases will be smokers, enter 0.75. If the number of smokers will be 45% of the total, enter 0.45, etc.
Odds ratio option (produces the same results as the 2 probability option above)
Click “Determine” and enter values for Pr(Y=1 | X=1) H1 and Pr(Y=1 | X=1) H0. These two values are explained above in the “two probabilities option.”
Click “calculate and transfer to main window.”
G*Power computes the odds ratio from the probabilities using the following formula: OR = (p1(1 − p2)) / (p2(1 − p1)). For this example (see the two probabilities option above) the OR = 3.44, and Pr(Y=1 | X=1) H0 = 0.06.
The rest of the steps are the same as the two probabilities option.
Click here for video.
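Because the predictor is binary, a simulation-based check is also straightforward in R (a Monte Carlo sketch with a single smoking predictor and no covariates; the candidate total n of 250 is hypothetical):

set.seed(1)
logit_power <- function(n, p0 = 0.06, p1 = 0.18, pi_x = 0.5, reps = 1000) {
  mean(replicate(reps, {
    x <- rbinom(n, 1, pi_x)                      # smoker (1) vs. nonsmoker (0)
    y <- rbinom(n, 1, ifelse(x == 1, p1, p0))    # death outcome
    fit <- glm(y ~ x, family = binomial)
    summary(fit)$coefficients["x", "Pr(>|z|)"] < 0.05
  }))
}
logit_power(250)   # estimated power at a total n of 250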
30A. Poisson Regression for a continuous predictor
(Find out whether a continuous predictor variable influences the rate of events over a set period of time, with or without covariates. Note that Poisson regression assumes independence of observations, meaning that the occurrence or absence of a previous event should not influence whether another event occurs. Subjects can have multiple events, as long as the events are independent.)
Example: Scenario 1: Study whether a change in drug dose increases the rate of preferred events.
Scenario 2: Study whether a change in drug dose decreases the rate of adverse events.
Tails = 1 or 2
Exp(B1) = Change in response rate with a 1-unit increase in the predictor variable (drug dose). (Note that more drastic expected changes in response rate increase the likelihood of significant results and lower the required sample size.)
Scenario 1: If we expect an increase in the response rate as a result of an increase in drug dose, enter the expected response rate increase above the base rate [Exp(Bo)] when there is a one unit increase in the predictor variable [information on Exp(Bo) shown below]. For example, if we expect a 25% response rate increase for every 1 mg increase in a drug dose, enter 1.25.
Scenario 2: If we anticipate a decrease in the response rate as a result of an increase in drug dose, enter the expected response rate decrease below the base rate [Exp(Bo)] when there is a one unit increase in the predictor variable. For example, if we expect a 20% response rate decrease in adverse events for every 1 mg increase in a drug dose, enter 0.80.
α err prob = choose a Type I error rate (0.05 or 0.01).
Power = select our level of power (usually 80% or higher).
Base rate Exp(Bo) = The base rate is the response rate we expect when there is no intervention. Finding the base rate is broken down into 3 steps.
1. The first step in selecting a baseline rate is to pick a unit of exposure. The unit of exposure is the window through which we will count events. Units of exposure may be time, distance, area, volume, sample size, and space. In healthcare studies the unit of exposure is often number of patient days.
2. The second step is to pick a size or length for our unit of exposure. If the unit of exposure is time, we may decide to use 30 days. If the unit of exposure is patient days, we may decide to use 100 patient days or 1000 patient days. (Note: this is not necessarily the duration of the study, just the denominator value for figuring a base rate.) Rare events may require longer units of exposure.
3. The third step is to choose the number of events that we expect per unit of exposure for persons who are not exposed to the continuous predictor (i.e., control subjects, general population). The number of expected events per unit of exposure is put over the length of unit of exposure (from step 2) and the ratio calculated. This ratio is our ‘Base rate Exp(Bo)’ value. For example, if we think there are about 13 events per 30 days in control subjects or the general population, the baseline rate will be 13/30 = 0.433. If we think there are 195 events per 1000 patient days in control subjects, the baseline rate will be 195/1000 = 0.195. (Note that the base rate can refer to death rate, survival rate, accident rate, or hazard rate etc.)
Mean exposure = enter a unit of exposure value which represents how long we want the study to last. Consider the following scenario:
Unit of exposure is ‘days’
Exp[B1] = 1.25, representing a 25% rate increase over the base rate
α err prob = 0.05
Power = 0.80
Base rate (Exp[Bo]) = 13 events / 30 days = 0.433
R^2 other X = 0
X distribution = Normal
X parm μ = 0
X parm σ = 1
Here are various Mean Exposure settings and how they change the sample size for the above scenario. (Note: the following study durations produce the same number of participant observation days.)
- If we want the required sample size for a study lasting one day, enter 1. We will need 280 participants.
(1 day x 280 = 280 participant observation days)
- If we want the required sample size for a study lasting 4 days, enter 4. We will need 70 participants.
(4 days x 70 = 280 participant observation days)
- If we want the required sample size for a study lasting 10 days, enter 10. We will need 28 participants.
(10 days x 28 = 280 participant observation days)
- And if we want the required sample size for a study lasting 28 days, enter 28. We will need 10 participants.
(28 days x 10 = 280 participant observation days)
R^2 other X = enter the expected squared multiple correlation (R-squared) between the main continuous predictor variable and other covariates. If there are no covariates, enter 0. If there are covariates and they are expected to have a low association with the main predictor (say R= 0.20), enter 0.20^2 = 0.04. If the covariates are expected to have a moderate association with the main predictor (say R = 0.50), enter 0.50^2 = 0.25. If the covariates are expected to have a strong association with the main predictor (say R = 0.90), enter 0.90^2 = 0.81, etc.
X Distribution = usually normal when the predictor is continuous. This represents the shape of the probability distribution for the main predictor variable.
X parm mu = 0
X parm sigma = 1
Click here for video.
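One of the mean-exposure scenarios above (4 days of exposure, 70 participants) can be sanity-checked with a Poisson-regression simulation in R (a rough sketch of the same scenario; a percent or two of simulation error is expected):

set.seed(1)
pois_power <- function(n, exposure = 4, base = 13 / 30, rr = 1.25, reps = 1000) {
  mean(replicate(reps, {
    x <- rnorm(n)                            # standardized predictor (mu = 0, sigma = 1)
    y <- rpois(n, exposure * base * rr^x)    # event counts; rate rises 25% per SD of x
    fit <- glm(y ~ x, family = poisson, offset = rep(log(exposure), n))
    summary(fit)$coefficients["x", "Pr(>|z|)"] < 0.05
  }))
}
pois_power(70)   # should land near the targeted 80% power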
30B. Poisson Regression for a dichotomous predictor
(Find out whether a dichotomous predictor variable influences the rate of events over a set period of time, with or without covariates. Note that Poisson regression assumes independence of observations, meaning that the occurrence or absence of a previous event does not influence whether another event occurs. Subjects can have multiple events, as long as the events are independent.)
Example: Does treatment (yes vs. no) increase a response rate? Or does treatment (yes vs. no) decrease a response rate?
Tails = 1 or 2
Exp(B1) = Enter the increase (or decrease) in the response rate beyond the base rate [Exp(Bo)] that you want to detect when going from the control to the treatment condition. The baseline response rate is the expected rate without the treatment (i.e., untreated or control cases). For example, if you want to detect an increase of 25% relative to non-treated subjects, enter 1.25. Or if you want to detect a decrease of 20% relative to non-treated subjects, enter 0.80. Note that more drastic expected changes in response rate increase the likelihood of significant results and lower the required sample size.
α err prob = choose a Type I error rate (usually 0.05 or 0.01).
Power = select a level of statistical power (usually 80% or higher).
Base rate Exp(Bo) = The base rate is the response rate we expect when there is no intervention. Finding the base rate is broken down into 3 steps.
1. The first step in selecting a baseline rate is to pick a unit of exposure. The unit of exposure is the window through which we will count events. Units of exposure may be time, distance, area, volume, sample size, or space. In healthcare studies the unit of exposure is often the number of patient days.
2. The second step is to pick a size or length for the unit of exposure. If the unit of exposure is time, we may decide to use 30 days. If the unit of exposure is patient days, we may decide to use 100 patient days or 1000 patient days. (Note: this is not necessarily the duration of the study, just the denominator value for figuring a base rate.) Rare events may require longer units of exposure.
3. The third step is to choose the number of events that we expect per length of the unit of exposure (from step 2) for persons who are not exposed to the treatment. The number of expected events for the non-treated group is put over the length of unit of exposure and the ratio calculated. This ratio is the ‘Base rate Exp(Bo)’ value. For example, when the unit of exposure is days (step 1) and the length of the unit exposure is 30 days (step 2), if we think there are 13 events per 30 days in non-treated subjects, the baseline rate will be 13/30 = 0.433. If we think there are 195 events per 1000 patient days in non-treated subjects, the baseline rate will be 195/1000 = 0.195. (Note that the base rate can refer to death rate, survival rate, accident rate, or hazard rate etc.)
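To make the step 3 arithmetic concrete, here is a minimal Python sketch of the base rate calculation, using the two example rates above (the base_rate helper is just for illustration):

```python
# Base rate Exp(Bo): expected events per one unit of exposure in
# non-treated (control) subjects.

def base_rate(expected_events, exposure_length):
    """Step 3: expected events divided by the length of the unit of exposure."""
    return expected_events / exposure_length

print(base_rate(13, 30))     # 13 events per 30 days -> 0.433...
print(base_rate(195, 1000))  # 195 events per 1000 patient days -> 0.195
```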
Mean exposure = enter the mean number of exposure units over which each participant will be observed, which represents how long the study will last. Assuming the unit of exposure is days, consider the following scenario:
Exp[B1] = 1.25, representing a 25% rate increase over the base rate
α err prob = 0.05
Power = 0.80
Base rate (Exp[Bo]) = 13 events/30 days = 0.433
R^2 other X = 0
X distribution = Binomial
X parm π = 0.50
The following Mean Exposure settings show how sample size requirements change for the above scenario. (Note: the following mean exposure values produce similar numbers of required participant observation days.)
- If we want the required sample size for a study lasting one day, enter 1. We will need 1028 participants.
(1 day x 1028 = 1028 participant observation days)
- If we want the required sample size for a study lasting 4 days, enter 4. We will need 257 participants.
(4 days x 257 = 1028 participant observation days)
- If we want the required sample size for a study lasting 10 days, enter 10. We will need 103 participants.
(10 days x 103 = 1030 participant observation days)
- And if we want the required sample size for a study lasting 28 days, enter 28. We will need 37 participants.
(28 days x 37 = 1036 participant observation days)
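Notice the pattern: the required sample size scales inversely with mean exposure, so the product of participants and exposure days stays roughly constant. A small sketch of that relationship, taking the one-day requirement of 1028 participants from above as the anchor:

```python
import math

n_one_day = 1028  # required N when Mean exposure = 1 (from G*Power, above)

for exposure in (1, 4, 10, 28):
    n = math.ceil(n_one_day / exposure)  # participants needed
    print(exposure, n, exposure * n)     # exposure, N, participant observation days
```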
R^2 other X = enter the expected squared multiple correlation (R-squared) between the main dichotomous predictor variable and other covariates. If there are no covariates, enter 0. If there are covariates and they are expected to have a low association with the main predictor (say R = 0.20), enter 0.20^2 = 0.04. If the covariates are expected to have a moderate association with the main predictor (say R = 0.50), enter 0.50^2 = 0.25. If the covariates are expected to have a strong association with the main predictor (say R = 0.90), enter 0.90^2 = 0.81, etc.
X Distribution = usually binomial when the predictor is dichotomous. This represents the shape of the probability distribution for the main predictor variable.
X parm π = the proportion of cases belonging to the treatment group. If sample sizes are equal, enter 0.50. If 75% of the cases are in the treatment group, enter 0.75. If the number of cases in the treatment group will be 45% of the total, enter 0.45, etc.
Click here for video.
31. Proportions: Difference between 2 independent proportions
(Compare 2 proportions from 2 independent groups)
Example: Is the proportion of divorces in Group I different from the proportion of divorces in Group II? The expected proportion of divorces in a sample of 100 people from Group I is 22/100 = 0.22. The expected proportion of divorces in a sample of 105 people from Group II is 42/105 = 0.40. How many participants are needed to detect a significant difference?
Tails = 1 or 2
Proportion 2 = enter Group II proportion, 0.40
Proportion 1 = enter Group I proportion, 0.22
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Allocation ratio n2/n1 = enter the expected ratio of sample sizes for the two groups. For this example, N2/N1 = 105/100 = 1.05. (Enter 1.0 if sample sizes are expected to be the same.)
Click here for video.
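If you want to sanity-check G*Power's answer outside the program, the statsmodels package can approximate this calculation. Note that statsmodels uses the arcsine-transformed effect size (Cohen's h) with a normal approximation, so its result may differ slightly from G*Power's z-test for two proportions; treat it as a rough cross-check, not a replacement.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Cohen's h for proportions 0.22 vs. 0.40 (two-sided test, alpha = 0.05).
h = abs(proportion_effectsize(0.22, 0.40))
n1 = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.80,
                                  ratio=1.05, alternative='two-sided')
print(round(n1))  # required n for Group I; Group II needs 1.05 times that
```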
32. Generic Z Test
F-Tests
Cohen’s univariate effect size conventions for “f”
f = 0.10 (small)
f = 0.25 (medium)
f = 0.40 (large)
33. ANCOVA: Fixed effects, main effects, and interactions
34. ANOVA: Fixed Effects, omnibus, one-way
(Test whether there is a significant difference between levels of a between [fixed effects] groups variable in a one-way ANOVA)
Example: Suppose we want to see if seating location in a classroom affects math test scores. A class is divided into 3 sections: front seats, middle seats, and back seats. Seat location is the between groups factor.
Determine Effect Size -> Select Procedure -> Effect size from means.
Number of groups = 3, representing the three levels of the seat location grouping variable in this example
SD within each group = enter the expected standard deviation. There is just one SD to enter because ANOVA assumes homogeneity of variance, so the SD should be the same in every group. For this example we’ll say SD = 3.5
Means = Enter the expected mean test scores for students in each location. The expected mean test score for students in the front is 35. The expected mean score for students in the middle is 32. The expected mean score for students in the back is 29.
Size = Enter the expected sample size for each group. In this example we expect to have about 10 students in each seating section, so enter 10 for each group.
Equal n = A shortcut for entering the same number of participants per group. Ignore this step if you entered group sizes above. For this example you could enter 10 and click “Equal n”.
Click “Calculate” and transfer to main window -> Close.
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Number of Groups = should already be filled in from the effect size calculation. If it’s not there, enter 3 for this example.
Click here for video.
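For reference, the effect size G*Power derives from these means can be reproduced by hand: with equal group sizes, Cohen's f is the standard deviation of the group means divided by the common within-group SD. The sketch below does that and, as an optional rough cross-check of the sample size, feeds f into statsmodels:

```python
import math
from statsmodels.stats.power import FTestAnovaPower

means = [35, 32, 29]  # expected group means (front, middle, back)
sd_within = 3.5       # common within-group SD
grand = sum(means) / len(means)
sigma_m = math.sqrt(sum((m - grand) ** 2 for m in means) / len(means))
f = sigma_m / sd_within  # Cohen's f, about 0.70 here (a large effect)

# Rough cross-check of the required total N across all 3 groups.
n_total = FTestAnovaPower().solve_power(effect_size=f, alpha=0.05,
                                        power=0.80, k_groups=3)
print(round(f, 2), math.ceil(n_total))
```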
35. ANOVA: Fixed effects, special, main effects and interactions
(2-way ANOVA with 2 predictor variables. This approach calculates power for predictor variable main effects and interactions)
Example: We want to see if there is a difference in student test scores based on gender (female vs. male) and seating location in class (front, middle, back).
Determine Effect Size f = click “Determine” -> Select Procedure -> Direct.
Partial eta squared (η²) = estimate the proportion of variance in the outcome variable (e.g., test scores) accounted for by a predictor variable (e.g., gender or seating location) or by the interaction between predictor variables. Approximate partial η² conventions: small = 0.02; medium = 0.06; large = 0.14.
Click “Calculate effect size” and transfer to main window -> click “Close”
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Numerator df = this is where you specify the effect you want to power. You can calculate power for a predictor variable main effect or an interaction. The “numerator df” value is found by taking the number of levels for the predictor variable or interaction that you want to power and subtracting one. For example,
- For powering the predictor variable seating, enter 3 (locations) – 1 = 2 df
- For powering the predictor variable gender, enter 2 (genders) – 1 = 1 df
- For powering the interaction, enter (3 – 1) * (2 – 1) = 2 df
Number of groups = found by multiplying the numbers of levels of the two predictor variables. In this example there are 2 genders and 3 seating locations, so 2 x 3 = 6
Click here for video.
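The two conversions used in this section are easy to script: partial eta squared maps to Cohen's f via f = sqrt(η² / (1 - η²)) (what the “Direct” option computes), and the df and group counts follow the rules in the list above. A minimal sketch:

```python
import math

def f_from_partial_eta_sq(eta_sq):
    # Cohen's f from partial eta squared (what the "Direct" option computes).
    return math.sqrt(eta_sq / (1.0 - eta_sq))

levels_seating, levels_gender = 3, 2
df_seating = levels_seating - 1            # 2
df_gender = levels_gender - 1              # 1
df_interaction = df_seating * df_gender    # 2
n_groups = levels_seating * levels_gender  # 6

print(round(f_from_partial_eta_sq(0.06), 3))  # medium effect -> f ~= 0.25
print(df_seating, df_gender, df_interaction, n_groups)
```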
36. ANOVA: Repeated measures, between factors
(Comparing levels of a between groups factor in a repeated measures ANOVA)
Example: 20 patients in a drug trial are going to have their blood tested at 1, 2, and 3 weeks. 10 patients in the treatment group will receive the drug. 10 patients in the control group will receive a placebo. What is the statistical power for detecting a significant difference in blood test results between the treatment and control groups?
(Statistical Note. The question of what is being tested between groups when there are repeated measures sometimes arises. When testing a between subjects variable, programs usually sum the outcome variable across all repeated measures for each participant then divide the sum by the square root of the number of repeated measures. For the current example each patient’s score would be calculated by the following: (test1 + test2 + test3)/√3. The program then compares these scores using the grouping variable (i.e., treatment vs. control). This means that when there is a significant effect for the grouping variable, it is basically significant across all repeated measures.)
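A tiny illustration of that composite score, using three made-up test values for one hypothetical patient:

```python
import math

# Composite used when testing the between groups effect in RM ANOVA:
# sum each participant's repeated measures, divide by sqrt(number of measures).
tests = [35.0, 33.0, 34.0]  # hypothetical test1, test2, test3 for one patient
composite = sum(tests) / math.sqrt(len(tests))
print(composite)            # the groups are then compared on these composites
```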
Determine Effect Size = Select procedure -> Effect size from means.
Number of Groups = number of groups created by the between subjects variable. In this example there are 2 groups (treatment and control).
SD within each group = enter the expected SD for each group created by the between subjects variable. ANOVA assumes homogeneity of variance, which means that the SD should be similar for all groups, so there is just one SD to enter. For the current example we’ll say that SD = 3.5.
Mean = Enter expected mean of test scores across all repeated measures for each group. In the current example we’ll say that the treatment group has a mean of 35 across all 3 time periods, and the control group has a mean of 31 across all 3 time periods.
Size = Enter expected sample sizes for each group. For the current example the treatment and control groups are each expected to have a sample size of 10.
Equal n = A shortcut for entering the same number of participants per group. If you entered group sample sizes above you do not need to do this step. If you want to use this shortcut, enter the sample size for each group and click “Equal n” (10 for this example).
Click “calculate effect size” and transfer to main window.
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Number of Groups = Number of groups created by the between groups variable. In this example there are 2 groups (treatment and control)
Number of Measurements = Number of repeated measures. In this example there are 3 repeated measures (time1, time 2, time3)
Correlation among repeated measures = enter approximate correlation among repeated measures. Note that in many cases this value should be moderate to high, so you may want to use the default value 0.50. Scores across repeated measures should be fairly consistent relative to other participants’ scores. In other words, participants who score low on the first measure are likely to score low on the second measure relative to participants who scored higher on the first and second measures. For the current example we’ll use the default value 0.50.
Click here for video.
37. ANOVA: Repeated measures, within factors
(Compare levels of a within groups variable in a repeated measures ANOVA)
Example: 20 patients in a drug trial are going to have their blood tested at 1, 2, and 3 weeks. 10 patients in the treatment group will receive the drug. 10 patients in the control group will receive a placebo. Time of testing (1, 2, and 3 weeks) is the within variable. What is the statistical power for detecting a significant difference in blood test results across time (1, 2, and 3 weeks)?
Determine Effect Size = Select Procedure -> Direct method. Enter partial eta squared (η²), the effect size measure indicating the proportion of variance in the outcome explained by the within subjects variable (here, time of testing). Approximate η² conventions are small = 0.02, medium = 0.06, large = 0.14. (In this example treatment [drug vs. placebo] is the between subjects factor; the effect being powered is time of testing.)
Click “Calculate and transfer to main window”
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Number of groups = In this example there are 2 treatment levels (drug vs. placebo). If there is no between subjects factor, enter 1 group.
Number of measurements = number of repeated measures. In this example there are 3 time periods.
Correlation among repeated measures = enter approximate correlation among columns of repeated measures. Note that this value should be somewhat high because patients’ blood test results should be fairly consistent across time relative to other patients’ test results. In this example we’ll use the default value of 0.50.
Nonsphericity correction ε = 1.0 if the sphericity assumption is met, less than 1.0 if it is not. (Highest value is 1.0, and lowest value = 1/[number of repeated measures – 1].)
What is the Sphericity Assumption? When running repeated measures the variances of the differences between all possible pairs of the within subjects variable should be equivalent. For example, if an outcome variable is measured at time1, time2, and time3, the variances of the differences between time1 - time2, time1 - time3, and time2 - time3 should be roughly the same.
What is the "Nonsphericity correction E" value GPower asks for? This refers to how well the sphericity assumption is met in your data. Of course this can only be determined with any degree of accuracy if you have preliminary data. If preliminary data are not available and you want to assume that the sphericity assumption will be met, enter 1.0, the default value. If you suspect that your data will deviate from perfect sphericity, enter a value less than 1.0. The more the "nonsphericity correction" drops below 1.0, the more you expect your data to deviate from perfect sphericity. The lowest value GPower allows is 1 / (number of repetitions - 1). One convention holds that a "nonsphericity correction" value should be at least 0.75 or higher in repeated measures ANOVA. If you expect your data to deviate substantially from the sphericity assumption, you may want to use a multivariate method which does not require the sphericity assumption.
What does the “Nonsphericity correction ε” value do? When a value < 1.0 is entered it shrinks the degrees of freedom in the repeated measures ANOVA, resulting in a larger critical value for the test. The larger critical value compensates for the inflation of the Type I error rate that occurs when the sphericity assumption is not met. Note that a nonsphericity correction < 1.0 will increase the sample size requirement because it raises the critical cutoff value.
Click here for video.
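The bounds and the df adjustment described above are simple to express in code. A minimal sketch, assuming the standard univariate repeated measures df (the printed numbers match a 2-group, 3-measurement, 20-subject design):

```python
def epsilon_lower_bound(n_measures):
    # Lowest nonsphericity correction GPower allows.
    return 1.0 / (n_measures - 1)

def corrected_dfs(epsilon, n_measures, n_total, n_groups=1):
    # epsilon < 1 shrinks both df, which raises the critical F value.
    df_effect = epsilon * (n_measures - 1)
    df_error = epsilon * (n_measures - 1) * (n_total - n_groups)
    return df_effect, df_error

print(epsilon_lower_bound(3))         # 0.5 with 3 repeated measures
print(corrected_dfs(0.75, 3, 20, 2))  # (1.5, 27.0)
```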
38. ANOVA: Repeated measures, within-between interaction
(Repeated measures ANOVA [RMANOVA] for testing the interaction between a within subjects variable and a between subjects variable)
Example: We plan to enroll 16 people (8 women and 8 men) in a study testing the effectiveness of a weight loss supplement. Participants’ weights will be measured at 1 month, 2 months, and 3 months. Time is the within subjects variable and gender is the between subjects variable. How many participants are needed to detect a significant interaction between the time variable and the gender variable?
Determine Effect Size = Select Procedure -> Direct method. Partial eta squared (η²) is the effect size measure for the interaction between the within and between subjects variables. For this example, enter the amount of variability in the outcome that is accounted for by the interaction between gender and time. Approximate partial η² conventions are small = 0.02; medium = 0.06; large = 0.14.
Click “calculate effect size” and transfer to main window.
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Number of groups = levels of the between subjects factor. In this example there are 2 genders.
Number of measurements = number of repeated measures. In this example there are 3 time periods.
Correlation among repeated measures = enter the approximate correlation among the columns of repeated measures. Note that this value should be somewhat high because participants’ weights should be fairly consistent across time relative to other participants’ weights. In this example we’ll use the default value of 0.50.
Nonsphericity correction ε = 1.0 if the sphericity assumption is met, less than 1.0 if it is not. (Highest value is 1.0, and lowest value = 1/[repetitions – 1].) In the current example we’ll assume this assumption is met (use the default value of 1).
What is the Sphericity Assumption? When running repeated measures the variances of the differences between all possible pairs of the within subjects variable should be equivalent. For example, if an outcome variable is measured at time1, time2, and time3, the variances of the differences between time1 - time2, time1 - time3, and time2 - time3 should be roughly the same.
What is the "Nonsphericity correction E" value GPower asks for? This refers to how well the sphericity assumption is met in your data. Of course this can only be determined with any degree of accuracy if you have preliminary data. If preliminary data are not available and you want to assume that the sphericity assumption will be met, enter 1.0, the default value. If you suspect that your data will deviate from perfect sphericity, enter a value less than 1.0. The more the "nonsphericity correction" drops below 1.0, the more you expect your data to deviate from perfect sphericity. The lowest value GPower allows is 1 / (number of repetitions - 1). One convention holds that a "nonsphericity correction" value should be at least 0.75 or higher in repeated measures ANOVA. If you expect your data to deviate substantially from the sphericity assumption, you may want to use a multivariate method which does not require the sphericity assumption.
What does the “Nonsphericity correction E" value do? When a “nonsphericity correction” value < 1.0 is entered it adjusts the degrees of freedom in the repeated measures ANOVA, resulting in a larger critical value for the test. Larger critical values correct for the lowering of critical values that occurs when the sphericity assumption is not met. In other words, a “nonsphericity correction” value < 1.0 balances out the likelihood of a Type I error that goes up when sphericity is not met. Note that a “nonsphericity correction” value < 1.0 will increase the sample size requirement because it raises the critical cutoff value.
Click here for video.
39. Hotelling’s T2: One group mean vector (aka, within group Hotelling's T-squared)
(Multivariate analysis for comparing 2 or more outcome variables in the same group of participants, with just 2 measurement time periods)
Example: In this example we’ll evaluate the effect of an intervention using a pre and post measures design. There are 2 outcome variables for measuring the effect of the intervention, outcome variable one (Y1) and outcome variable two (Y2). We will measure Y1 and Y2 before the intervention and measure Y1 and Y2 after the intervention in the same group of participants. Note that we could have more than 2 outcome variables.
Determine (Effect Size)
Number of response variables = 2 for this example
Input method = variance-covariance matrix
Specify/edit input values
Responses Y = 2 (there are two outcome variables in this example)
groups/means matrix = enter the expected average of the differences between pre and post scores for Y1 and Y2. If we expect the pre scores to be higher than the post scores, the average of the differences should be positive. If we expect the post scores to be higher, the average of the differences should be negative. The greater the distance from zero, the greater the effect.
Note that in this example we expect the pre scores to be higher than the post scores, so the mean of the differences should be positive. The following data are an example of what we expect 4 cases from the real dataset to look like.
Pre-intervention        Post-intervention       Differences
preY1   preY2           postY1   postY2         Diff.Y1     Diff.Y2
5       8               4        8              5-4 = 1     8-8 = 0
4       9               4        7              4-4 = 0     9-7 = 2
5       7               6        6              5-6 = -1    7-6 = 1
6       8               5        7              6-5 = 1     8-7 = 1
Average of the Differences:                     0.25        1.0     *Put these 2 values into the groups/means matrix.
Click the Cov Sigma tab to open the Cov Sigma matrix.
Make sure Covariances is selected in the drop down menu.
Calculate the variances and covariance for Diff.Y1 and Diff.Y2.
For this example,
Diff.Y1 variance = 0.92
Diff.Y2 variance = 0.67
Diff.Y1 and Diff.Y2 covariance = -0.33
Enter these values into the covariance matrix so it looks like this:
       Y1      Y2
Y1     .92     -.33
Y2     -.33    .67
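These variances and the covariance can be reproduced with numpy from the four hypothetical cases above (numpy's cov uses the n - 1 denominator, matching the sample values here):

```python
import numpy as np

# Pre - post differences for the 4 hypothetical cases above.
diff_y1 = np.array([1, 0, -1, 1])
diff_y2 = np.array([0, 2, 1, 1])

print(diff_y1.mean(), diff_y2.mean())  # 0.25 and 1.0 -> groups/means matrix

# Sample covariance matrix, matching the Cov Sigma values above.
print(np.cov(diff_y1, diff_y2))
# [[ 0.9167 -0.3333]
#  [-0.3333  0.6667]]
```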
(Note: If you chose the SD and correlation input option in the determine effect size window, you will be prompted to enter the standard deviations (SD) for Diff.Y1 and Diff.Y2, and the correlation between Diff.Y1 and Diff.Y2 in the Cov Sigma matrix.)
Click Okay to close the window.
In most cases the “multiply all means by” field should be left at 1.
Click Calculate and transfer effect size to main window, then close window.
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Response Variables = 2 for number of outcome variables in this example.
Click here for video.
40. Hotelling’s T2: Two group mean vectors (aka, between groups Hotelling's T-squared)
(Multivariate analysis for comparing 2 independent groups where there are 2 or more outcome variables.)
Example: Let’s say we want to compare 2 groups of participants (treatment group vs. control group) based on 2 outcome variables Y1 and Y2. You may have more than 2 outcome variables.
Click Determine Effect Size.
Number of response variables = 2 outcome variables in this example
Input method = variance-covariance matrix
Click Specify/edit input values
groups/means matrix = This matrix contains the expected means of each data column (see below). Let’s say we expect the data to be similar to the following 3 participants per group.
e.g.,          Treatment Group        Control Group
               Y1     Y2              Y1     Y2
               3      6               6      7
               2      4               4      8
               4      2               5      6
Means:         3      4               5      7
Variances:     1      4               1      1
Covariance:    -1                     -0.5
The groups/means matrix will be . . .
                   Y1    Y2
Group1 (tmt)       3     4
Group2 (control)   5     7
Click Cov Sigma tab. Make sure Covariances is selected in the drop down menu.
Calculate pooled variances and covariance and put these values in the matrix.
First, find the pooled variance for Y1 using the following formula. (The pooled variance for Y2 is found with the same formula; just substitute the Y2 values for the Y1 values.)
(((n1 - 1)*tmt.varY1) + ((n2 -1)*control.varY1)) / (n1 + n2 - 2)
Where: n1 = number in tmt group1 Y1 (3)
n2 = number in control group2 Y1 (3)
tmt.varY1 = variance of tmt group Y1 (1)
control.varY1 = variance of control group Y1 (1)
So pooled variance for Y1 = (((3 - 1)*1) + ((3 - 1)*1)) / (3 + 3 - 2) = 1
And pooled variance for Y2 = (((3 - 1)*4) + ((3 - 1)*1)) / (3 + 3 - 2) = 2.5
Second, find pooled covariance for Y1 and Y2 using the following formula.
(((n1 - 1)*tmt.covY1Y2) + ((n2 - 1)*control.covY1Y2)) / (n1 + n2 - 2)
Where: n1 = number in tmt group1 (3)
n2 = number in control group2 (3)
tmt.covY1Y2 = covariance for tmt group Y1 and Y2 scores (-1)
control.covY1Y2 = covariance for control group Y1 and Y2 scores (-0.5)
So pooled covariance for Y1 & Y2 = (((3 - 1)*(-1)) + ((3 - 1)*(-0.5))) / (3 + 3 - 2) = -0.75
The Covariances matrix under the Cov Sigma tab will be (using the pooled values calculated above) . . .
       Y1       Y2
Y1     1.00     -.750
Y2     -.750    2.50
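numpy can verify the pooled values from the three-cases-per-group data above:

```python
import numpy as np

tmt = np.array([[3, 6], [2, 4], [4, 2]])  # treatment rows: Y1, Y2
ctl = np.array([[6, 7], [4, 8], [5, 6]])  # control rows: Y1, Y2
n1, n2 = len(tmt), len(ctl)

s1 = np.cov(tmt, rowvar=False)            # treatment covariance matrix
s2 = np.cov(ctl, rowvar=False)            # control covariance matrix
pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

print(tmt.mean(axis=0), ctl.mean(axis=0))  # [3. 4.] and [5. 7.]
print(pooled)                              # [[ 1.   -0.75]
                                           #  [-0.75  2.5 ]]
```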
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Allocation ratio N2/N1 = Expected sample size ratio. Sample size for Group 2 (control) divided by sample size for Group 1 (treatment). Enter 1 if sample sizes are expected to be the same
Response variables = enter number of outcome variables, in this example 2
Click here for video.
41. MANOVA – Global Effects
(Multivariate analysis for determining the effect of a single, between subjects [grouping] variable when there are 2 or more outcome variables.)
Example: We’ll assume that there are 4 different therapy groups (therapy A, B, C, D), and there are 5 outcome variables (Y1, Y2, Y3, Y4, Y5). What sample size is required to detect a significant effect of patient therapy?
Click Determine Effect Size (f² effect size conventions: small = [0.10]^2 = 0.01; medium = [0.25]^2 = 0.06; large = [0.40]^2 = 0.16)
Pillai’s V = estimate this value by running a MANOVA on preliminary data. If preliminary data are not available, generate a preliminary dataset using random sampling based on estimated parameter values (e.g., mean and SD) and then run a MANOVA (see the sketch at the end of this section).
Number of groups = 4 therapy groups for this example
Response variables = 5 outcome variables for this example
Click Calculate and transfer to main window
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Number of groups = number of levels of the between subjects variable. In this example there are 4 therapy groups
Response variables = number of outcome variables. In this example there are 5
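If you have no preliminary data, the simulation suggestion above can be sketched in Python with statsmodels. Every numeric value below (group means, SD, group sizes) is a made-up assumption for illustration; substitute your own estimates:

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
groups, n_per, n_dv = ["A", "B", "C", "D"], 25, 5  # 4 therapies, 5 outcomes

rows = []
for i, g in enumerate(groups):
    mu = np.full(n_dv, 10.0) + i * 0.5            # assumed group means
    ys = rng.normal(mu, 3.0, size=(n_per, n_dv))  # assumed common SD = 3
    for y in ys:
        rows.append({"therapy": g, **{f"Y{j + 1}": y[j] for j in range(n_dv)}})
df = pd.DataFrame(rows)

mv = MANOVA.from_formula("Y1 + Y2 + Y3 + Y4 + Y5 ~ therapy", data=df)
print(mv.mv_test())  # read Pillai's trace for the therapy effect from this table
```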
42. MANOVA – Special Effects and Interactions
(Multivariate analysis for determining the significance of the interaction between 2 or more between subjects [grouping] variables when there are 2 or more outcome variables.)
Example: We’ll assume that patients of both genders are receiving one of four types of therapy (therapy A, B, C, D) and that there are 3 outcome variables (Y1, Y2, Y3). What sample size is required to detect a significant interaction between therapy and gender?
Click Determine Effect Size (f² effect size conventions: small = [0.10]^2 = 0.01; medium = [0.25]^2 = 0.06; large = [0.40]^2 = 0.16)
Pillai’s V = estimate this value by running a MANOVA on preliminary data. If preliminary data are not available, generate a preliminary dataset using random sampling based on estimated parameter values (e.g., mean and SD) and then run a MANOVA (the sketch at the end of section 41 shows this approach).
Number of predictors = number of between subjects, predictor variables. In the example there are 2 predictors (therapy and gender).
Response variables = 3 outcome variables for this example
Click Calculate and transfer to main window
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Number of Groups = number of groups formed by crossing the 2 between subjects variables. In this example, 4 therapies x 2 Genders = 8 groups
Number of predictors = 2 predictor variables in this example (therapy & gender)
Response Variables = number of outcome variables. 3 for this example (Y1, Y2, Y3)
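As with #41, if no pilot data exist, the Pillai value for the interaction can be estimated by simulating a rough preliminary dataset and reading the therapy:gender row of the MANOVA table. A minimal sketch; the cell size, means, and SD are hypothetical placeholders (pure noise as written, so substitute the cell-mean pattern you actually expect).
set.seed(7)
n <- 10 # hypothetical pilot cases per therapy-by-gender cell
therapy <- factor(rep(c("A", "B", "C", "D"), each = 2 * n))
gender <- factor(rep(rep(c("F", "M"), each = n), times = 4))
# placeholder outcomes: replace mean = 50 with the cell means you expect
Y <- sapply(1:3, function(j) rnorm(8 * n, mean = 50, sd = 10))
fit <- manova(Y ~ therapy * gender)
summary(fit, test = "Pillai") # use the Pillai value on the therapy:gender row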
43. MANOVA: Repeated Measures, Between Factors
Testing between-subjects factor effects in univariate RMANOVA using the multivariate approach when the sphericity assumption is not met.
44. MANOVA: Repeated Measures, Within Factors
Testing within-subjects factor effects in univariate RMANOVA using the multivariate approach when the sphericity assumption is not met.
45. MANOVA: Repeated Measures, Within-Between Interaction
Testing the interaction of within- and between-subjects factors in univariate RMANOVA using the multivariate approach when the sphericity assumption is not met.
46. Linear Multiple Regression: Fixed model, R2 deviation from zero
(Evaluate whether a group of predictors significantly predicts a continuous outcome variable. The null hypothesis is that the proportion of variance in the outcome explained by a set of predictors (R-squared) equals zero.)
Example: Do predictor variables P1, P2, and P3 significantly predict the outcome?
Click Determine effect size f2->
There are two methods for estimating f^2.
Method 1
Select From correlation coefficient
Squared multiple correlation ρ^2 = the multiple coefficient of determination (i.e., the proportion of variability in the outcome variable explained by the predictor variables). Begin by estimating the strength of association between the predictor variables and the outcome variable; this strength of association is the multiple correlation coefficient, R. Let’s say that in this example we expect R = 0.30. Next square R to get the squared multiple correlation coefficient: R^2 = 0.30^2 = 0.09. Enter 0.09.
Click Calculate Effect size f^2 and transfer to main window. (Note: G*Power computes f^2 = R^2 / (1 - R^2).)
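The same conversion as two lines of R, using this example’s R = 0.30 (plain arithmetic, no packages needed):
R2 <- 0.30^2 # squared multiple correlation = 0.09
(f2 <- R2 / (1 - R2)) # effect size f^2, approximately 0.099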
Method 2 (probably a more rigorous approach)
Select From predictor correlations
Number of predictors = 3 for this example
Click Specify matrices.
Under the Corr between predictors and outcome tab enter the expected correlations between the predictor variables (e.g., P1, P2, P3) and the outcome variable Y. Let’s say we expect the following correlations between the outcome variable and the 3 predictor variables in this example.
P1 P2 P3
outcome Y .23 .16 .24
Click the Corr between predictors tab. Enter the expected correlations between the predictor variables. In this example there are 3 predictor variables so the matrix is 3x3. Let’s say that for this example the expected correlations between the 3 predictors variables are as follows.
P1 P2 P3
P1 1.0 .20 .45
P2 .20 1.0 .31
P3 .45 .31 1.0
Click Calc ρ^2
Click Accept values
Click Calculate Effect size f^2 and transfer to main window
α err prob = choose a Type I error rate (usually 0.05 or 0.01)
Power = select a level of statistical power (usually 80% or higher)
Number of Predictors = 3 predictor variables for this example
Click here for video.
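As a cross-check on Method 2, the value the Calc ρ^2 step produces can be reproduced in R with the standard identity ρ^2 = r' R^-1 r, where r holds the predictor-outcome correlations and R the correlations among the predictors (numbers taken from the example above; the results shown in the comments are approximate):
r_xy <- c(0.23, 0.16, 0.24) # correlations of P1-P3 with outcome Y
R_xx <- matrix(c(1.00, 0.20, 0.45,
                 0.20, 1.00, 0.31,
                 0.45, 0.31, 1.00), nrow = 3, byrow = TRUE) # correlations among P1-P3
rho2 <- as.numeric(t(r_xy) %*% solve(R_xx) %*% r_xy) # squared multiple correlation, ~0.083
f2 <- rho2 / (1 - rho2) # effect size f^2, ~0.09
c(rho2 = rho2, f2 = f2)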
47. Linear Multiple Regression: Fixed model, R2 increase
48. Variance: Test of equality (2 sample case)
49. Generic F Test
RPower (Power calculations that can be done in the R statistical platform. These are not available in GPower.)
50. Stepped Wedge
Background: In stepped wedge studies the treatment is introduced to different groups or participants (called 'clusters') at different time periods. Typically (though not always) a baseline measure is taken during time period 1. During the baseline period none of the clusters receives the treatment. As each time period passes, a new cluster receives the treatment until all clusters are receiving the treatment. The figure below shows a stepped wedge design for a study with 5 clusters and 6 time periods. Control time periods are shown in white; treatment time periods are shown in purple. In this study no cluster receives treatment during time period 1. Cluster 1 starts treatment during time 2, cluster 2 starts treatment during time 3, and so on until cluster 5 starts treatment during time 6.
Which power calculation we use depends on the outcome variable.
- If your outcome is continuous normal, use method #50A or method #50B below.
- If your outcome is a proportion, use method #50C below.
50A. Power calculation using SWSamp package for a continuous normal outcome (see also #50B below)
This power calculation assumes that the outcome variable is continuous normal. Using the R 'lme4' package, the actual statistical analysis (not the power calculation) will be linear mixed modeling and look something like this:
> mymodel = lmer(outcome ~ treatment + factor(time) + (1 | cluster))
Having trouble finding SWSamp? Try getting it at https://sites.google.com/a/statistica.it/gianluca/swsamp
Example: We are planning a stepped wedge study with 5 clusters (groups) across 6 time periods. The first time period is a baseline phase where no group receives the treatment (see above figure). Our outcome variable is continuous normal. The treatment is expected to reduce outcome values (i.e., lower scores are better). We expect a mean outcome of 73 during baseline (1st time period), and expect a mean outcome of 65.7 during treatment phases. What is the power for this study?
Values for this power calculation
mu = 73 the expected mean during baseline phase
b.trt = -7.3 the average treatment effect (ATE). If the baseline mean is 73 and the treatment mean is 65.7, the ATE is the treatment mean minus the baseline mean (65.7 - 73 = -7.3)
sigma = 14 the expected residual standard deviation
I = 5 the number of clusters/groups
J = 5 the number of time points when at least one cluster is in a treatment phase (this excludes the baseline time period, when all clusters are untreated: 6 - 1 = 5 time points)
K = 10 the number of cases (observations) per cluster per time point
rho = 0.50 the estimated intraclass correlation coefficient (ICC)
R code
library(SWSamp)
mypower = HH.normal(mu = 73, b.trt = -7.3, sigma = 14, I = 5, J = 5, K = 10, rho = 0.5)
mypower
Result: There is approximately a 72% (0.716) chance of correctly rejecting the null hypothesis of no difference between control and treatment conditions. This matches the manual method in #50B below.
50B. Manual power calculation in R for a continuous normal outcome. (Thanks to Eric Green for this code.)
(Same scenario as #50A)
R code (copy and paste into R)
# Step 1: specify study design
rounds = 6 # number of measurement time units
units = 5 # number of clusters/groups
XatT1 = 0 # will all groups receive intervention during first measurement unit (1=yes, 0=no)
units.XatTn = ifelse(XatT1==0, units/(rounds-1), units/rounds)
# Step 2: specify design matrix
d.mat <- data.frame(matrix(NA, nrow = units, ncol = rounds))
d.mat[1:rounds] <- 0
start.c <- ifelse(XatT1==0,2,1)
start.r <- 1
end.r <- start.r + (units.XatTn-1)
round.loop <- ifelse(XatT1==0,rounds-1,rounds)
l <- 1
# begin loop
while(l <= round.loop) {
d.mat[start.r:end.r, (start.c):rounds] <- 1
start.c <- start.c + 1
start.r <- end.r + 1
end.r <- end.r + units.XatTn
l <- l + 1
}
# Step 3: specify parameter estimates
cluster = 1 # do clusters represent groups (1) or individuals (0)?
m1 = 73 # expected mean outcome during the baseline phase
m2 = 65.7 # expected mean outcome under treatment
s = 14 # estimated SD of the outcome
N = units # number of units (5 units)
T = rounds # number of time periods (6 time periods)
n = 10 # expected number of observations/cases per group, per unit of time
n = ifelse(cluster==1, n, 1)
k = 0.20 # expected coefficient of variation for each cluster (0.20 is a reasonable default)
k = ifelse(cluster==1, k, 0)
Za = 1.96 # 1.96 for alpha 0.05
# Step 4: Power Calculation using the Hussey and Hughes (2007) approach
round.sum <- rowSums(d.mat)
round.sum2 <- round.sum*round.sum
unit.sum <- colSums(d.mat)
unit.sum2 <- unit.sum*unit.sum
U <- sum(round.sum)
V <- sum(round.sum2)
W <- sum(unit.sum2)
t <- m1-m2
d2 <- (s*s)/n
z2 <- (m1*m1)*(k*k)
var.t1 <- (N*d2)*(d2+(T*z2))
var.t2 <- ((N*U)-W)*d2
var.t3 <- ((U*U)+(N*T*U)-(T*W)-(N*V))*z2
var.t <- var.t1/(var.t2+var.t3)
power1 <- sqrt((t*t)/var.t)
power2 <- power1-Za
mypower <- pnorm(power2, mean = 0, sd = 1)
mypower
Result: There is approximately a 72% (0.716) chance of correctly rejecting the null hypothesis of no difference between control and treatment conditions. Same result as #50A 'SWSamp' package method above.
50C. Manual power calculation in R for a proportion outcome, using the same study design as above. (Thanks to Eric Green for this code.)
# Step 1: Study Design
rounds = 6 # number of measurement time units
units = 5 # number of clusters/groups
XatT1 = 0 # will all groups receive intervention during first measurement unit (1=yes, 0=no)
units.XatTn = ifelse(XatT1==0, units/(rounds-1), units/rounds)
# Step 2: Design Matrix
d.mat <- data.frame(matrix(NA, nrow = units, ncol = rounds))
d.mat[1:rounds] <- 0
start.c <- ifelse(XatT1==0,2,1)
start.r <- 1
end.r <- start.r + (units.XatTn-1)
round.loop <- ifelse(XatT1==0,rounds-1,rounds)
l <- 1
# begin loop
while(l <= round.loop) {
d.mat[start.r:end.r, (start.c):rounds] <- 1
start.c <- start.c + 1
start.r <- end.r + 1
end.r <- end.r + units.XatTn
l <- l + 1
}
# Step 3: Parameter Estimates
cluster = 1 # cluster-randomized=1; individual-randomized=0
p1 = 0.10 # proportion (outcome) in control group
p2 = 0.025 # proportion (outcome) in intervention group (a fourfold reduction)
N = units # do not modify: number of units
T = rounds # do not modify: number of time periods
n = 10 # observations per cluster (ignore if not CRT)
n = ifelse(cluster==1, n, 1) # do not modify
k = 0.20 # coefficient of variation (usually between 0.15 and 0.40)
k = ifelse(cluster==1, k, 0) # do not modify
Za = 1.96 # 1.96 for alpha = 0.05
# Step 4: Power Calculations
round.sum = rowSums(d.mat)
round.sum2 = round.sum*round.sum
unit.sum = colSums(d.mat)
unit.sum2 = unit.sum*unit.sum
U = sum(round.sum)
V = sum(round.sum2)
W = sum(unit.sum2)
t = p1-p2
d2 = (((p1+p2)/2)*(1-(p1+p2)/2))/n
z2 = (p1*p1)*(k*k)
var.t1 = (N*d2)*(d2+(T*z2))
var.t2 = ((N*U)-W)*d2
var.t3 = ((U*U)+(N*T*U)-(T*W)-(N*V))*z2
var.t = var.t1/(var.t2+var.t3)
power1 = sqrt((t*t)/var.t)
power2 = power1-Za
final.power = pnorm(power2, mean = 0, sd = 1)
final.power # display the power estimate
51. Equivalence Study (proportions)
(Evaluate whether a treatment is at least as effective as another treatment. Commonly used in healthcare to determine if a new therapy is as effective as a standard care therapy already in use.)
Example 1: There is a 3.8% rate of infection in patients receiving a standard of care treatment. Is a new, much less costly treatment just as effective as the expensive standard of care treatment? The rate of infection in patients receiving the new treatment is estimated to be around 4.0% (higher than 3.8%, but this may be worth the reduction in cost). How many patients are needed to show that the new treatment is at least as effective as the standard treatment at preventing infection?
User-specified values for this example:
pA = 0.038 is the rate of infection in patients receiving the standard of care treatment.
pB = 0.04 is the estimated rate of infection in patients receiving the new, less expensive treatment.
delta = 0.019 is the threshold/margin for equivalence for this example: rates within 1.9% of pA are considered equivalent to pA, where 3.8 - 1.9 = 1.9 and 3.8 + 1.9 = 5.7, so new-treatment infection rates between 1.9% and 5.7% will be considered clinically equivalent to 3.8%. (Note: 1.9 is half of 3.8.)
k = 1 is the sample size ratio (nB/nA). In this example we expect a balanced design so k = 1.
alpha = 0.05 is the probability of a Type I error.
beta = 0.20 defines 80% power where 1 – power = beta (i.e., 1 – 0.80 = 0.20)
Enter user specified values into R:
pA = 0.038
pB = 0.04
delta = 0.019
k = 1
alpha = 0.05
beta = 0.20
Formula for finding the required sample size (Source: Chow S, Shao J, Wang H. 2008. Sample Size Calculations in Clinical Research. 2nd Ed. Chapman & Hall/CRC Biostatistics Series):
(samplesize=(pA*(1-pA)/k+pB*(1-pB))*((qnorm(1-alpha)+qnorm(1-beta))/(pA-pB-delta))^2)
These lines give exact power (n is the per-group sample size computed above, rounded up):
n=ceiling(samplesize)
z=(pA-pB-delta)/sqrt(pA*(1-pA)/n/k+pB*(1-pB)/n)
(power=pnorm(z-qnorm(1-alpha))+pnorm(-z-qnorm(1-alpha)))
Results: With 1050 patients in the standard treatment group and 1050 patients in the new treatment group (total patients = 2100), we have an 80% chance of establishing equivalence in infection rates between the standard and new treatments. (Equivalence studies typically require large sample sizes.)
Example 2: The 10-year survival rate in patients getting the standard treatment A is 80%. The 10-year survival rate in patients getting the new treatment B is also 80%. How many patients are needed to show that the new treatment is as effective as the standard treatment?
User specified values for this example:
pA=0.80 is the 10-year survival rate for patients getting treatment A
pB=0.80 is the estimated 10-year survival rate for those getting new treatment B
delta=0.10 is the threshold/margin for equivalence. 10% on either side of 80% (70% to 90%) is equivalent to 80%
k=1 is the sample size ratio (nB/nA). In this example we expect a balanced design, so k = 1
alpha=0.05 is the probability of a Type I error
beta=0.20 defines 80% power where 1 – power = beta (i.e., 1 – 0.80 = 0.20)
Enter user-specified values into R:
pA = 0.80
pB = 0.80
delta = 0.10
k = 1
alpha = 0.05
beta = 0.20
Formula for finding required sample size (Source: Chow S, Shao J, Wang H. 2008. Sample Size Calculations in Clinical Research. 2nd Ed. Chapman & Hall/CRC Biostatistics Series):
(samplesize=(pA*(1-pA)/k+pB*(1-pB))*((qnorm(1-alpha)+qnorm(1-beta))/(pA-pB-delta))^2)
These lines give exact power (n is the per-group sample size computed above, rounded up):
n=ceiling(samplesize)
z=(pA-pB-delta)/sqrt(pA*(1-pA)/n/k+pB*(1-pB)/n)
(power=pnorm(z-qnorm(1-alpha))+pnorm(-z-qnorm(1-alpha)))
Results: With 197 subjects in each group (total patients = 394), there is an 80% chance of establishing equivalence of the two treatments’ 10-year survival rates.
52A. Survival Analysis (2 group comparison with estimated probabilities)
(Calculate the probability of finding a significant difference between 2 survival curves. Requires an estimate of the probability of an event in both group 1 and group 2 for the duration of the study; if these probabilities are unknown, preliminary data are required (see #52B below). Also requires an estimate of relative risk (RR), the estimated ratio of expected incidence rates or proportions over the entire study.)
Example: Let’s say we plan on tracking survival in group 1 (n1 = 250) and group 2 (n2 = 250) over a six year period. The estimated probability of death in group 1 over 6 years is 0.35. The estimated probability of death in group 2 over 6 years is 0.52. The expected RR is 0.67 (0.35/0.52). Thus we expect that people in group 1 are 0.67 times as likely to die as people in group 2; equivalently, there is a 33% (1 - 0.67) relative risk reduction of death for group 1.
n1 = 250
n2 = 250
p1 = 0.35
p2 = 0.52
k = n1/n2
m = (n1*p1) + (n2*p2)
RR = p1/p2
alpha = 0.05
(power = pnorm((((sqrt(k*m))*abs(RR-1))/(k*RR+1))-(qnorm(1-alpha/2))))
# Power formula taken from Rosner’s 6th edition of Fundamentals of Biostatistics, p. 807-809
Results: There is an 83% chance of correctly rejecting the null hypothesis of no difference between the 2 survival curves with 250 cases per group (total N = 500).
52B. Survival Analysis (2 group comparison with preliminary data)
(Calculate the probability of finding a significant difference between 2 survival curves. If the probability of an event in both groups for the duration of the study is unknown, preliminary data are required to compute these probabilities. Also requires an estimate of relative risk (RR), the estimated ratio of expected incidence rates or proportions over the entire study.)
Example: Let’s say we plan on tracking blindness in patients with retinitis pigmentosa. We have a treatment (nE) and control (nC) group. The preliminary data are available in the “Oph” R dataset. This dataset contains a time, status, and grouping variable for 354 cases. What is the probability of finding a significant difference between the 2 curves if there are 250 cases per group and the estimated RR is 0.67, representing a relative risk reduction in blindness for the experimental group?
nC = 250
nE = 250
RR = 0.67
alpha = 0.05
library(survival)
library(powerSurvEpi) # provides powerCT() and the Oph example data
data(Oph)
(power = powerCT(formula=Surv(times,status)~group, dat=Oph, nE=250, nC=250, RR=0.67, alpha=0.05))
Results: There is an 82% chance of correctly rejecting the null hypothesis of no difference between the 2 survival curves with 250 cases per group (total N = 500).
53. Cross-Over Design
(Calculate power for a 2 group, 2 time period cross-over study).
Example: In a pharmaceutical study 2 groups switch between drug and placebo. There are 2 time periods. Each time period lasts 4 weeks. There is a 2-week washout between each time period. The outcome variable is a blood test. The blood test is given at the end of time 1 and again at the end of time 2.
Time 1 (4 week duration)
Group 1: treatment drug
Group 2: placebo
Washout period (2 week duration)
Time 2 (4 week duration)
Group 1: placebo
Group 2: treatment drug
Assume the following:
Delta = Mean difference between the 2 groups. Let's say the drug should increase mean blood levels by 2.55 points above the placebo.
VarDiff = expected variance of the differences between time 1 and time 2 scores. Let's say 25 for this example.
Z95 = Z score for 95% interval => 1.96
Zcrit = z-score for desired power level. If power is 80% a z-score of 0.84 is the point at which 80% lies below on the standard normal table. If power is 90% a z-score of 1.28 is the point at which 90% lies below on the standard normal table.
# Power formula taken from Rosner’s 6th edition of Fundamentals of Biostatistics, p. 706
n = (VarDiff * (Z95 + Zcrit)^2) / (2 * Delta^2)
n = (25 * (1.96 + 1.28)^2) / (2 * 2.55^2)
n = 20.18
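For convenience, the same arithmetic as copy-and-pasteable R (values from this example; qnorm() supplies the exact z quantiles):
Delta <- 2.55 # expected mean difference (drug minus placebo)
VarDiff <- 25 # expected variance of the time 1 vs. time 2 differences
Z95 <- qnorm(0.975) # 1.96 for two-sided alpha = 0.05
Zcrit <- qnorm(0.90) # 1.28 for 90% power
(n <- VarDiff * (Z95 + Zcrit)^2 / (2 * Delta^2)) # about 20.2 per group; round up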
Results: Since n = 20.18, we round up and require 21 participants per group (total N = 42) to have a 90% chance of correctly rejecting the null hypothesis.
54A. Linear Mixed Effects Model: manual approach (1 fixed effect predictor; simulate data)
(Simulate dataset and find power for a fixed effect variable in a mixed effects model where there is 1 fixed effect variable and 1 random effect variable. This is the same thing as #54B below but using our own code to run the 100 simulations and estimate power.)
Example: A repeated measures study where 3 females and 3 males are measured 5 times each (3*5 = 15 plus 3*5 = 15, for 30 measures total). Sex is the between subjects, fixed effect variable. Subject is the random effect variable. We want to know whether sex influences the outcome variable "Y". Find statistical power for sex. (Note: sex can be replaced with any fixed effect variable, such as treatment level, i.e., treated vs. control.) Using the following code we create a simulated dataset where 3 females and 3 males are repeatedly measured 5 times each (subject is the random effect variable and sex is the fixed effect variable), fit the model, record its p-value, and repeat this process 100 times; the proportion of significant p-values estimates power.
library(lme4)
library(car)
N = 100
alpha = numeric(N)
for (i in 1:N) {
sex = c(rep("female",15), rep("male",15)) # 3*5=15 measures from females; 3*5=15 measures from 3 males (30 observations)
y = c(ifelse(sex=="female", (rnorm(n=15, mean=65, sd=10)), (rnorm(n=15, mean=50, sd=12))))
# Or values of "y" can be created using the following code:
# yfem = rnorm(15, 65, 10) # N, mean, SD for generating means
# ymale = rnorm(15, 50, 12) # N, mean, SD for generating means
# y = c(yfem,ymale)
subject = c(rep("1",5), rep("2",5), rep("3",5), rep("4",5), rep("5",5), rep("6",5)) # subjects 1-3 are female; 4-6 are male
mydata = data.frame(y, sex, subject)
lme.model = lmer(y ~ sex + (1|subject), data=mydata) # run the model
alpha[i] = Anova(lme.model)$Pr # store the p-value for sex ($Pr partial-matches the "Pr(>Chisq)" column)
}
mean(alpha<0.05) # How often the p-value is less than 0.05 over 100 trials (power)
# or find power using these 3 steps:
alpha # prints p-values
alpha<0.05 # p-values <0.05 = TRUE
summary(alpha<0.05) # number of TRUEs divided by the total number of tests gives power
Results: There is approximately a 98% chance of correctly rejecting the null hypothesis of no difference between female and male outcomes on "y". (The power estimate will vary from run to run because the data are simulated.)
54B. Linear Mixed Effects Model: using SIMR package (1 fixed effect predictor; simulate data)
(Simulate dataset and find power for a fixed effect variable in a mixed effects model where there is 1 fixed effect variable and 1 random effect variable. This is the same thing as #54A above but using the SIMR package to run 100 simulations and estimate power.)
Example: A repeated measures study where 3 females and 3 males are measured 5 times each (3*5 + 3*5 = 30 measures total). Sex is the between subjects, fixed effect variable. Subject is the random effect variable. We want to know whether sex influences the outcome variable "Y". Find statistical power for sex. (Note: sex can be replaced with any fixed effect variable, such as treatment level, i.e., treated vs. control.) Using the following code we create a simulated dataset where 3 females and 3 males are repeatedly measured 5 times each (subject is the random effect variable and sex is the fixed effect variable), fit the model, and have SIMR re-fit it to 100 simulated datasets to estimate power.
library(simr)
sex = c(rep("female",15), rep("male",15)) # 3*5=15 measures from females; 3*5=15 measures from 3 males (30 observations)
y = c(ifelse(sex=="female", (rnorm(n=15, mean=65, sd=10)), (rnorm(n=15, mean=50, sd=12))))
# Or values of "y" can be created using the following 3 lines of code:
# yfem = rnorm(15, 65, 10) # N, mean, SD for generating means
# ymale = rnorm(15, 50, 12) # N, mean, SD for generating means
# y = c(yfem,ymale)
subject = c(rep("1",5), rep("2",5), rep("3",5), rep("4",5), rep("5",5), rep("6",5)) # 1,2,3 are females; 4,5,6 are males
mydata = data.frame(y, sex, subject)
mymodel = lmer(y ~ sex + (1|subject), data=mydata) # run the model
set.seed(123)
powerSim(mymodel, nsim=100) # Find power for "sex" based on N=100 simulations
Result: There is approximately a 98% chance of correctly rejecting the null hypothesis of no difference between female and male outcomes on "y". (The power estimate will vary from run to run because the data are simulated.)
54C. Linear Mixed Effects Model: using SIMR package (2 or more fixed effect predictors; pilot data)
(Find power and effect size for 2 fixed effect variables in a linear mixed effects model, using pilot data. If you have to create a simulated dataset, follow instructions in 54B above.)
Example: Let's say that the "simdata" dataset is pilot data (see below). "y" is a continuous outcome, "x" is a continuous predictor, "z" is a count predictor, and "g" is a categorical grouping variable. Estimate power for a mixed effects analysis where "y" is the outcome, "x" and "z" are fixed effect predictors, and "g" is a random effect variable.
library(simr)
data(simdata) # example dataset in SIMR package
summary(simdata) # inspect the example data
sapply(simdata, class) # check variable types
mymodel = lmer(y ~ x + z + (1|g), data=simdata) # predict "y" from "x" and "z" with "g" as a random variable
summary(mymodel)
doTest(mymodel, fixed("x")) # find p-value and effect size for fixed effect "x"
powerSim(mymodel, fixed("x"), nsim=50) # find power for fixed effect "x" using 50 simulations
doTest(mymodel, fixed("z")) # find p-value and effect size for fixed effect "z"
powerSim(mymodel, fixed("z"), nsim=50) # find power for fixed effect "z" using 50 simulations
Results: In a linear mixed effects model where "x" and "z" are fixed effect predictors and "g" is a random effect, there is essentially a 100% chance of rejecting the null hypothesis of 'no effect' for variable "x", and about a 60% chance of rejecting the null hypothesis of 'no effect' for variable "z". (Estimates are based on 50 simulations, so expect some run-to-run variation.)