A Guide to Calculating Power

"Hey McFly! Those boards don’t work on water! Unless you’ve got power!"

This line from Griff's buddies in Back to the Future Part II applies to research as well. It goes something like this: “Hey McFly! That research won’t work on wishful thinking, unless you’ve got power!”

Statistical power refers to the probability of correctly rejecting the null hypothesis of no effect when an effect really exists. It is essential that researchers know their statistical power before launching into a research project. Low statistical power may lead one to conclude that there is no effect from a treatment when there is (called a Type II error). An “overpowered” study, on the other hand, uses more participants than necessary and may flag effects that are statistically significant but too small to matter in practice. (The chance of concluding that there is an effect when in fact there is not, called a Type I error, is set by the alpha level rather than by power.)
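
To make the definition concrete, here is a minimal sketch of a power calculation for a one-sample z-test using the normal approximation. It is not what GPower runs internally (GPower uses exact and noncentral distributions); the function name and the example values d = 0.5 and n = 32 are assumptions chosen for illustration.

```python
from statistics import NormalDist

def z_test_power(d, n, alpha=0.05, two_sided=True):
    """Approximate power of a one-sample z-test.

    d is the standardized effect size (mean difference / SD), n the sample size.
    """
    z = NormalDist()
    shift = abs(d) * n ** 0.5  # how far the test statistic is shifted under H1
    if two_sided:
        crit = z.inv_cdf(1 - alpha / 2)
        # Probability of landing in either rejection region under H1
        return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)
    crit = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(crit - shift)

# Hypothetical example: medium effect, 32 participants, two-tailed alpha = .05
print(round(z_test_power(0.5, 32), 3))
```

With no effect at all (d = 0) the function returns alpha, which is the Type I error rate, not power in any useful sense.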

Concern over statistical power is a relatively recent phenomenon. Reviews of the literature have shown that many past studies in the social and health sciences were underpowered. Many of these had only a 20-30% chance of correctly rejecting the null hypothesis, possibly leading researchers to incorrectly conclude that treatment effects were not real. An awareness of this problem has led most institutional review boards (IRBs) and granting agencies to require power and sample size calculations before approving studies. But many people do not know where to go to calculate power.

There are a number of commercial power and sample size programs available. PASS, SPSS Power, and NQuery are a few examples. However, these programs cost a lot of money (each in the $1000 range). There are also several freeware power and sample size calculators available, but most of these are limited in the number of available power calculations. 

I have used several commercial and free power and sample size programs. One of my favorites is GPower.  GPower was created by faculty at the Institute for Experimental Psychology in Dusseldorf, Germany. It offers a wide variety of calculations along with graphics and protocol statement outputs. Best of all, it is free! The developers released version 3.1 in June 2009. Terms of use and a downloadable zip file are available here.

After downloading the program you may ask yourself, “How do I use it?” There are limited resources. The developers have a tutorial on using GPower, but it is sparse and does not cover more advanced techniques. (Nevertheless, the tutorial is a good place to start to familiarize oneself with the program’s capabilities.) There are also a few books that mention GPower, like Determining Sample Size by Patrick Dattalo. Unfortunately, this book only covers analyses with an older version of GPower, 2.0.

 I created an easy-to-follow guide for using GPower 3.x. The guide is included below. It is a work in progress and I will update it and add more analyses in the near future. Most examples have been checked against power calculations in SPSS 17.0. I cannot guarantee the completeness and correctness of the material. Users assume all risks associated with using this guide. If you have any comments or suggestions on improving the guide, please let me know.

                                                    A Guide to Using GPower
 
Exact Tests. The main characteristic of these exact methods is that statistical tests and confidence intervals are based on exact probability statements that are valid for any sample size. They are most often used when the sample size is small and homoscedasticity does not hold.

1.  Correlation: Bivariate Normal Model (one sample case)
Test whether a single r is statistically different from zero (2 continuous variables).
Example: Is a correlation of 0.75 significantly different from zero?
Tails = 1 or 2
Correlation ρ H1 (corr. value assuming H1) = 0.75 (the r value is the effect size)
Alpha = .05 (or .01)
Power = desired level
Correlation ρ H0 (corr. value assuming H0) = usually 0
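
As a rough cross-check on this calculation, the sample size can be sketched with the Fisher z approximation. This is an assumption on my part, not GPower's exact bivariate normal method, so the result may differ from GPower's by a participant or two. The function name is made up for illustration.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80, tails=2):
    """Approximate N needed to detect correlation r against H0: rho = 0.

    Uses the Fisher z transformation: atanh(r) is roughly normal
    with standard error 1 / sqrt(N - 3).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)

# The example above: r = 0.75, two-tailed alpha = .05, power = .80
print(n_for_correlation(0.75))
```

A large correlation like 0.75 needs very few participants; a correlation of 0.30 under the same settings needs far more.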

2.  Linear Multiple Regression: Random Model
To test whether a group of predictors significantly predicts an outcome variable.
Example: Do IV1, IV2, IV3, and IV4 significantly predict a DV?
Tails = 1 or 2
H1 ρ² = click “Determine” to estimate the population squared multiple correlation coefficient. Choose “from predictor correlations.” Enter number of predictors. Click on “specify matrices” and enter the IVs’ correlations with the DV. Calculate ρ² and then accept values.
H0 ρ² = null hypothesis squared multiple correlation coefficient (usually 0)
Power = enter desired power level
Number of predictors = enter number of IVs, in this case 4 IVs.

3.  Proportion: Difference from Constant (binomial test, one sample case)
To test whether a sample proportion differs from a population proportion.  Use the exact test when n*p0*q0 < 5, or when either n*p0 or n*(1-p0) is < 5 (and so the normal approximation cannot be used). Here q0 = 1 - p0.
Example: The prevalence of breast cancer among middle aged women in the general population is .02.  The breast cancer rate among a sample of women who have a sister with breast cancer is .05.  What sample size is needed to detect a significant difference between the population and sample proportions? (To claim that the rate of cancer among women with sister history is 2.5 times [.05/.02] higher than those without sister history?)
Tails = 2 or 1
Effect size g = P2 - P1, where P1 is the constant (H0) proportion (.02) and P2 is the alternative proportion (.05). g = .05 - .02 = 0.03
Alpha = .05
Power = .90
Constant proportion = H0 prop (.02) which is the same as P1
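
The exact binomial calculation for this example can be sketched in a few lines. This is a minimal one-tailed (upper) version written for illustration, not GPower's code; all function names are hypothetical. The log-gamma form of the binomial probability is used only to avoid overflow at larger n.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed on the log scale."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def upper_tail(c, n, p):
    """P(X >= c)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c such that P(X >= c | p0) <= alpha."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += binom_pmf(k, n, p0)
        if tail > alpha:
            return k + 1  # including k would exceed alpha
    return 0

def exact_power(n, p0, p1, alpha=0.05):
    """Power of the one-tailed exact test of H0: p = p0 when the truth is p1 > p0."""
    return upper_tail(critical_value(n, p0, alpha), n, p1)

def smallest_n(p0, p1, alpha=0.05, power=0.90, n_max=2000):
    """First n whose exact power reaches the target, or None if none up to n_max."""
    for n in range(1, n_max + 1):
        if exact_power(n, p0, p1, alpha) >= power:
            return n
    return None

# The breast cancer example: H0 proportion .02, alternative .05, power .90
n = smallest_n(0.02, 0.05)
print(n, round(exact_power(n, 0.02, 0.05), 3))
```

Because the binomial distribution is discrete, exact power does not rise smoothly with n, so a slightly larger n can occasionally have slightly lower power than the n reported here.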

4.  Proportions: Inequality, 2 Dependent Groups (McNemar’s)
Compare 2 proportions when people in both groups have been paired/matched.
Example: Test whether the proportion of people who quit smoking in a hypnotism smoking cessation program (Ph=.76) is different (2 tail) or greater than (1-tail) the proportion of people who quit smoking by chewing Big Red gum (Pg=.54) where both groups are matched with people of the same age, sex, and smoking history.
Tails = 1 or 2
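OR = [Ph/(1-Ph)]/[Pg/(1-Pg)] = [.76/(1-.76)]/[.54/(1-.54)] = 3.17/1.17 = 2.70 (this odds ratio of the two quit rates equals the ratio of the two kinds of discordant pairs only if the paired responses are independent of each other)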
Alpha = .05
Power = desired level
Prop. Discordant Pairs = the proportion of pairs whose two members have different outcomes (one quit and the other did not).
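
If you have no pilot data on the pairs, one way to get starting values for both inputs is to assume the paired responses are independent. This is a strong assumption (matching usually makes pairs correlated), so treat the sketch below as a rough starting point only. The function name is hypothetical.

```python
def mcnemar_inputs_if_independent(p_a, p_b):
    """Odds ratio and proportion of discordant pairs, assuming independent responses.

    p_a, p_b are the success proportions under conditions A and B.
    """
    p12 = p_a * (1 - p_b)  # pairs where only the A member succeeds
    p21 = (1 - p_a) * p_b  # pairs where only the B member succeeds
    return p12 / p21, p12 + p21

# The smoking example: hypnotism .76 vs. gum .54
odds_ratio, prop_discordant = mcnemar_inputs_if_independent(0.76, 0.54)
print(round(odds_ratio, 2), round(prop_discordant, 4))
```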

5.  Proportions: Inequality, 2 Independent Groups (Fisher’s Exact)
Compare 2 independent proportions.
Example: Based on previous data, the expected proportion of students passing a stats course taught by psychology teachers is 0.85.  The expected proportion of students passing the same stats class taught by mathematics teachers is 0.95.  How many participants are needed to detect a significant difference between the 2 proportions in a prospective study?
Tail = 1 or 2
Prop 1 = 0.85 (do not use “determine” unless you have an OR to enter instead of props)
Prop 2 = 0.95
Alpha = .05
Power = choose your level.

6.  Proportions: Inequality, 2 independent groups (unconditional)
Not sure about this one.  It uses one proportion and an OR.

7.  Proportions: Inequality, (offset) 2 independent groups (unconditional)
Not sure about this one.  It uses conditional probabilities.

8.  Proportion: sign test (binomial test)
Not sure about this one.

9.  Generic Binomial test
Compare proportions of responses for a binary variable (one variable, 2 levels).
Example: Are there more females (0) than males (1) listed in a binary variable called gender?
Proportion p2 = if this represented females, you would enter proportion of females in the “gender” variable, say 0.60
Alpha = select your level (.01 or .05)
Total Sample Size = total number of females and males listed in the variable “gender”.
Proportion p1 = if this represents males, you would enter proportion of males in the “gender” variable, say 0.40

T-Tests
Cohen’s Effect Size Conventions for “d”
d = 0.20 (small)
d = 0.50 (medium)
d = 0.80 (large)

1.  Correlation: Point Biserial Model
Tests whether a correlation coefficient is significantly different from zero, when one variable is continuous and the other is dichotomous.
Example: How many participants are needed to determine whether an expected r = 0.30 is significantly different from zero when correlating test scores (continuous) and gender (dichotomous)?
Tails = 1 or 2
Effect Size |r| = enter the correlation coefficient 0.30 (no need to click on “determine”)
Alpha = .05                              
Power = choose your power

2.  Linear Bivariate Regression: one group, size of slope
Determine whether the slope for a predictor variable is significantly different from 0.

3.  Linear Bivariate Regression: 2 groups, difference between intercepts

4.  Linear Bivariate Regression: 2 groups, difference between slopes

5.  Linear Multiple Regression: Fixed model, single regression coefficient

6.  Means: Difference between 2 dependent groups (matched pairs)
Within, dependent, correlated, paired samples t-test.
Example: What sample size is needed for comparing before and after scores on depression to test whether an antidepressant works?  Before treatment mean score = 45 (SDbefore = 2.1); After treatment mean score = 32 (SDafter = 1.6).
Tails = 1 or 2
Effect Size dz = Click “determine” and enter “before” data for group 1, and “after” data for group 2, and enter correlation between the 2 sets of data.
Alpha = .05
Power = select desired level
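
The effect size dz that the “determine” pane produces can be sketched by hand: it is the mean difference divided by the standard deviation of the difference scores. The correlation of 0.5 below is an assumed value for illustration, since the example does not give one.

```python
import math

def effect_size_dz(mean1, mean2, sd1, sd2, r):
    """dz for matched pairs: |mean difference| / SD of the differences."""
    sd_diff = math.sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)
    return abs(mean1 - mean2) / sd_diff

# Before/after depression scores from the example; r = 0.5 is hypothetical
dz = effect_size_dz(45, 32, 2.1, 1.6, r=0.5)
print(round(dz, 2))
```

The higher the correlation between the before and after scores, the smaller the SD of the differences and the larger dz becomes.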

7.  Means: Difference between 2 independent groups
Between, independent groups t-test.
Example: What sample size is needed to compare control and treatment groups?
Tails = 1 or 2
Effect size d = use top pane if sample sizes are not equal (using SDpooled).  Use bottom pane if sample sizes are equal (balanced).
Alpha = .05
Power = select desired level
Allocation ratio = ratio of sample sizes (enter 1 if sample sizes are expected to be equal).
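
Here is a minimal sketch of the effect size d with a pooled SD, followed by a normal-approximation sample size for equal groups. The approximation is my assumption for illustration and tends to come out a participant or so below the t-based figure GPower reports. All means, SDs, and group sizes are made up.

```python
import math
from statistics import NormalDist

def pooled_sd(sd1, sd2, n1, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled SD."""
    return abs(mean1 - mean2) / pooled_sd(sd1, sd2, n1, n2)

def n_per_group(d, alpha=0.05, power=0.80, tails=2):
    """Normal-approximation sample size per group (equal allocation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / tails) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum / d) ** 2)

# Hypothetical pilot data: control mean 50, treatment mean 55, both SDs 10
d = cohens_d(50, 55, 10, 10, n1=20, n2=30)
print(d, n_per_group(d))
```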

8.  Means: Difference from constant (one sample case)

One sample t-test.
Example: Compare a sample mean against a null hypothesis population mean.
Tails = 1 or 2
Effect size d = enter H1 and H0 means.  Enter estimated sigma σ using sample SD.
Alpha = .05
Power = select desired power level

9.  Wilcoxon Signed-Ranks Tests (matched pairs)
Non-parametric test for comparing 2 matched groups.        

10.  Wilcoxon Signed-Ranks Test (one/within sample case)
Non-parametric test for comparing within group data.

11.  Wilcoxon Rank-Sum or MWU (2 independent groups)
Non-parametric test for comparing 2 independent groups.

12.  Generic t-test
No a priori calculations

Chi-Square Tests – (no chi-square test for independence in here)
Cohen’s Effect Size Conventions for “w”
w = 0.10 (small)
w = 0.30 (medium)
w = 0.50 (large)

 1.  Goodness of Fit: Contingency Tables
Chi-Square test for Goodness of Fit
Example: Observed number of people belonging to groups A, B, C, and D are compared against expected values.
Tails = 1 or 2
Effect Size w = select “determine”. Number of cells refers to # of categories, in this case there are 4 (A, B, C, D). P(H0) is the column for expected proportions. P(H1) is the column for observed proportions. The proportions in each column must add up to 1. The 2 cells above the P(H0) and P(H1) columns are for entering one proportion to be used in every cell of the respective column; enter it and then click the ‘equal’ button.  Don’t know about the normalize buttons. “Auto calc. last cell” button computes final proportion for the last cell in a column so that the total is 1.0.
Alpha = .05
Power = select desired level
DF = (# categories – 1).  In this example 4-1 = 3.
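
The effect size w that “determine” computes from the two columns follows Cohen's formula, the square root of the sum of (P(H1) - P(H0))² / P(H0) over the cells. A minimal sketch, with made-up proportions for groups A to D:

```python
import math

def effect_size_w(p_h0, p_h1):
    """Cohen's w for a goodness-of-fit test from expected and observed proportions."""
    if abs(sum(p_h0) - 1) > 1e-9 or abs(sum(p_h1) - 1) > 1e-9:
        raise ValueError("each column of proportions must add up to 1")
    return math.sqrt(sum((p1 - p0) ** 2 / p0 for p0, p1 in zip(p_h0, p_h1)))

# Hypothetical: equal expected proportions vs. unequal observed proportions
expected = [0.25, 0.25, 0.25, 0.25]
observed = [0.40, 0.30, 0.20, 0.10]
w = effect_size_w(expected, observed)
df = len(expected) - 1  # categories minus 1
print(round(w, 3), df)
```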

2.  Variance: difference from constant (one case)
Not sure about this one

3.  Generic χ² Test
No a priori calculations

Z Tests
Size conventions for correlation:
r = small 0.10
r = medium 0.30
r = large 0.50

1.  Correlation: Tetrachoric Model
Correlate 2 artificially dichotomized variables

2.  Correlation: 2 Dependent Pearson r’s (common index)
Correlate Pearson correlation coefficients from 2 dependent samples.

3.  Correlation: 2 Dependent Pearson r’s (no common index)
(not sure how this differs from #2)

4.  Correlation: 2 independent Pearson r’s
Compare 2 Pearson correlation coefficients from 2 independent samples.
Example:  Test whether the correlation between hours studied and test score for group A is statistically different than the correlation between hours studied and test score for group B.
Tails = 1 or 2
Effect size q = click ‘determine’ and then enter both r’s
Alpha = .05
Power = select desired level
Allocation ratio n2/n1 = enter the sample size of group 2 (group B) divided by the sample size of group 1 (group A).
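
The effect size q is the difference between the two correlations after each has been Fisher z transformed. A minimal sketch, with an approximate per-group sample size for equal groups; the correlations .50 and .30 are made-up values and the sample size formula is the usual normal approximation, not necessarily GPower's exact routine.

```python
import math
from statistics import NormalDist

def effect_size_q(r1, r2):
    """Cohen's q: difference between Fisher z transformed correlations."""
    return abs(math.atanh(r1) - math.atanh(r2))

def n_per_group_for_q(q, alpha=0.05, power=0.80, tails=2):
    """Approximate n per group (equal groups); each z has variance 1 / (n - 3)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / tails) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum / q) ** 2 + 3)

# Hypothetical: r = .50 in group A and r = .30 in group B
q = effect_size_q(0.50, 0.30)
print(round(q, 3), n_per_group_for_q(q))
```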

5.  Logistic Regression

6.  Poisson Regression

7.  Proportions: Difference between 2 independent proportions
Compare 2 proportions from 2 independent groups
Example: The proportion of divorced Mormons in a sample of 100 is 0.22, and the proportion of divorced Catholics in another sample of 100 is 0.31. Is there a significant difference?
Tails = 1 or 2
Proportion 2 = enter Catholic proportion 0.31
Proportion 1 = enter Mormon proportion 0.22
Alpha = .05
Power = select desired level
Allocation ratio n2/n1 = enter ratio of participants in both groups (i.e., 100/100 = 1.0)
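
For this example, both the power at 100 per group and the sample size needed for 80% power can be sketched with the standard normal approximation for two proportions. This assumes equal group sizes and a two-tailed test, and ignores the negligible chance of rejecting in the wrong direction; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_props(p1, p2, n_per_group, alpha=0.05):
    """Approximate two-tailed power for comparing two independent proportions."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_h0 = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)        # pooled SE under H0
    se_h1 = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)  # SE under H1
    return z.cdf((abs(p1 - p2) - crit * se_h0) / se_h1)

def n_two_props(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group needed to reach the target power."""
    z = NormalDist()
    p_bar = (p1 + p2) / 2
    a = z.inv_cdf(1 - alpha / 2) * math.sqrt(2 * p_bar * (1 - p_bar))
    b = z.inv_cdf(power) * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(((a + b) / (p1 - p2)) ** 2)

# The divorce example: .22 vs. .31 with 100 people per group
print(round(power_two_props(0.22, 0.31, 100), 2))
print(n_two_props(0.22, 0.31))
```

Under these assumptions, two samples of 100 have well under a 50% chance of detecting a difference of this size.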

8.  Generic Z Test

Univariate F-Tests
Cohen’s effect size conventions for “f”
f = 0.10 (small)
f = 0.25 (medium)
f = 0.40 (large)

1.  ANCOVA: Fixed effects, main effects, and interactions

2.  ANOVA: Fixed Effects, omnibus, one-way
One-Way between (fixed effects) groups ANOVA
Example: We want to compare mean scores on an algebra test for students who took the test listening to Rock, Country, and Rap.
Determine Effect Size = Select Procedure > effect size from mean. Enter number of levels (groups) of the fixed variable being compared (in this case 3 music groups). Enter expected SD for all groups (assuming homogeneity of variance). Enter expected mean test scores in the table along with expected sample sizes for each. If sample sizes are equal, then enter the amount in “equal n” and then click (in this case we expect 12 participants per group). Click “calculate effect size” and transfer to main window.
Alpha = .05
Power = select desired power level (use “post hoc” to enter sample size)
Number of Groups = already inserted from effect size calculations, so 3.
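
The “effect size from means” procedure can be sketched as follows: f is the standard deviation of the group means (weighted by group size) divided by the common within-group SD. The means and SD below are made-up values for the three music groups; only the 12 per group comes from the example.

```python
import math

def effect_size_f(means, sizes, sd_within):
    """Cohen's f for a one-way ANOVA from expected group means and sizes."""
    total_n = sum(sizes)
    grand_mean = sum(m * n for m, n in zip(means, sizes)) / total_n
    var_between = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, sizes)) / total_n
    return math.sqrt(var_between) / sd_within

# Hypothetical test means for Rock, Country, and Rap; 12 students per group
f = effect_size_f(means=[70, 75, 80], sizes=[12, 12, 12], sd_within=10)
print(round(f, 3))
```

With these made-up numbers f comes out near Cohen's “large” convention of 0.40.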

3.  ANOVA: Fixed effects, special, main effects and interactions
Two-way (or higher) between (fixed effects) groups ANOVA (single analysis good for both main effects and interactions)
Example: We want to see if there is a difference in test scores based on gender (female vs. male) and race (Caucasian, Hispanic, Black, Native) thus making a 2x4 analysis.
Determine Effect Size = Select Procedure > direct method. Enter partial eta squared (η²), the effect size measure giving the proportion of variance explained by the effect being tested (a main effect or an interaction) once the other effects are set aside. Click “calculate effect size” and transfer to main window.
Alpha = .05
Power = desired level (select “post hoc” to enter sample size)
Numerator df = this specifies which main effect or interaction you are testing for.  It is found by taking the number of levels and subtracting one. In this case, enter 4-1 = 3 df if testing for race, 2-1=1 df if testing for gender, and (2-1)*(4-1) = 3 df if testing for the interaction.
Number of groups = found by multiplying the levels in both factors (in this case 2x4=8) 
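
The direct method converts partial eta squared to f with f = sqrt(η² / (1 - η²)). A minimal sketch of that conversion, along with the degrees of freedom and number of groups for the 2x4 example; the η² of .06 is an assumed value.

```python
import math

def f_from_partial_eta_squared(eta_sq):
    """Convert partial eta squared to Cohen's f."""
    return math.sqrt(eta_sq / (1 - eta_sq))

gender_levels, race_levels = 2, 4

df_gender = gender_levels - 1                              # main effect of gender
df_race = race_levels - 1                                  # main effect of race
df_interaction = (gender_levels - 1) * (race_levels - 1)   # gender x race
number_of_groups = gender_levels * race_levels

# Hypothetical medium effect: partial eta squared = .06
f = f_from_partial_eta_squared(0.06)
print(round(f, 3), df_gender, df_race, df_interaction, number_of_groups)
```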

4.  ANOVA: Repeated measures, between factors
RMANOVA (just for comparing levels of a between factor like gender)
Example: The same students took 3 tests under 3 different music conditions (rock, country, and rap). We want to know if there is a significant effect for gender (males vs. females), the between factors variable.
Determine Effect Size = Select Procedure > effect size from mean. Enter number of levels of the fixed variable being compared (in this case 2 genders). Enter expected SD for all groups (assuming homogeneity of variance). Enter expected mean test scores in the table along with expected sample sizes for each. If sample sizes are equal, then enter the amount in “equal n” and then click “calculate effect size” and transfer to main window.
Alpha = .05
Power = desired level (select “post hoc” to enter sample size)
Number of Groups = 2
Repetitions = 3 music conditions
Correlation among repeated measures = enter approximate correlation

5.  ANOVA: Repeated measures, within factors
RMANOVA (just for comparing levels of a within factor variable like days)
Example: The same students took 3 tests under 3 different music conditions (rock, country, and rap). We want to know if there is a significant effect for music condition, the within factor variable. Two between factors groups for gender (male vs. female).
Determine Effect Size = Select Procedure > direct method. Enter partial eta squared (η²), the effect size measure giving the proportion of variance explained by the effect being tested once the other effects are set aside. Click “calculate effect size” and transfer to main window. Eta squared size conventions: small = .01; medium = .06; large = .14.
Alpha = .05 (for one tail)
Power = desired level (select “post hoc” to enter sample size)
Number of groups = of the between subjects factor, in this case 2 for gender
Repetitions = number of repeated measures, in this case 3 for music condition
Correlation among repeated measures = whatever you think this might be
Nonsphericity correction e = 1.0 if sphericity assumption is met, something else if not met. (Highest value is 1.0, and lowest value = 1/[repetitions – 1].)

(Sphericity assumption in univariate RMANOVA - When the repeated measures are transformed by a set of orthogonal weights, they should be uncorrelated with each other but have equal variances. This is the sphericity assumption. If the design includes a between-subjects factor, then sphericity must be met. How well this assumption is met is determined by the Epsilon statistic, which ranges from 1/(k - 1) to 1, where k is the number of repeated measures, with 1 being perfect sphericity and the lower bound being complete violation. About .75 or higher is usually acceptable in most RMANOVA designs. Failure to meet the sphericity assumption increases the Type I error rate. When the sphericity assumption is met, the univariate test is more powerful than the multivariate test. If the assumption is not met, you may use epsilon multipliers (see SPSS printout), although these may be too conservative, or you may use the multivariate methods below, which do not require the sphericity assumption.)
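
If you have pilot data, epsilon can be estimated from the covariance matrix of the repeated measures. Below is a minimal sketch of the Greenhouse-Geisser estimate, assuming a symmetric k x k covariance matrix given as a list of lists. Both example matrices are made up.

```python
def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix."""
    k = len(cov)
    diag_mean = sum(cov[i][i] for i in range(k)) / k
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    row_means = [sum(row) / k for row in cov]
    sum_sq = sum(v * v for row in cov for v in row)
    numerator = (k * (diag_mean - grand_mean)) ** 2
    denominator = (k - 1) * (sum_sq
                             - 2 * k * sum(m * m for m in row_means)
                             + (k * grand_mean) ** 2)
    return numerator / denominator

# Equal variances and equal covariances: sphericity holds
spherical = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
# One condition much more variable than the others: sphericity is violated
unequal = [[1, 0, 0], [0, 1, 0], [0, 0, 4]]

print(round(greenhouse_geisser_epsilon(spherical), 3))
print(round(greenhouse_geisser_epsilon(unequal), 3))
```

With 3 repeated measures the estimate can never fall below 1/(3 - 1) = 0.5.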

6.  ANOVA: Repeated measures, within-between interaction
RMANOVA (just for testing the interaction of within and between variables)
Example: The same students took 3 tests under 3 different music conditions (rock, country, and rap). The within factors is music and the between factors is gender. We want to know if there is a significant effect for the interaction between music and gender.
Determine Effect Size = Select Procedure > direct method. Enter partial eta squared (η²), the effect size measure giving the proportion of variance explained by the effect being tested once the other effects are set aside. Click “calculate effect size” and transfer to main window. Eta squared size conventions: small = .01; medium = .06; large = .14.
Alpha = .05 (for one tail)
Power = desired level (select ‘post hoc’ to enter sample size)
Number of groups = of the between subjects factor, in this case 2 for gender
Repetitions = number of repeated measures, in this case 3 for music condition
Correlation among repeated measures = whatever you think this might be

Multivariate F Tests
7.  Hotelling’s T²: One group mean vector
Multivariate analysis for comparing within group data on 2 or more DVs.
Example: We want to compare patients’ pre-treatment measures with post-treatment measures based on 2 outcome variables Y1 and Y2 (you may have more than 2 DVs).
Determine Effect Size = (Note: do not use SPSS’s partial η².) Click and select number of outcome variables, in this case 2 (Y1 and Y2).  Three input techniques:
1. Variance - Covariance matrix: Enter vector/column means for the differences between pre and post outcome data vectors, and fill in the variance-covariance matrix for the differences vectors (var. in diagonal and cov. in off-diagonals).
E.g. (Diff = pre - post):
    pre1    pre2    post1   post2   Diff1   Diff2
     5       8       4       8        1       0
     4       9       4       7        0       2
     5       7       6       6       -1       1
     6       8       5       7        1       1
    Vector means:   Diff1 = 0.25    Diff2 = 1.0
    Var:            Diff1 = 0.92    Diff2 = 0.67
    Cov (Diff1, Diff2):  -0.33
    Corr (Diff1, Diff2): -0.43

2 & 3. SD and correlation: Enter vector means for the differences between pre and post outcome data vectors, and fill in the SD-correlation matrix for the differences vectors (SDs in diagonal and corr. in off-diagonals). (Don’t know about “autocorr” and “multiply all means by” windows right now, although the latter should probably be set to 1).
Alpha = .05 (for one tail)
Power = set desired level (use post hoc to enter sample size)
Allocation ratio N2/N1 = sample size for group 2 divide by sample size for group 1
Response variables: enter number of outcome variables, in this case 2.
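
The means, variances, covariance, and correlation in the example table for input technique 1 can be reproduced from the raw pre and post scores. A minimal sketch using only the four patients listed in the table:

```python
from statistics import mean, variance

# (Y1, Y2) scores for the four patients in the example table
pre = [(5, 8), (4, 9), (5, 7), (6, 8)]
post = [(4, 8), (4, 7), (6, 6), (5, 7)]

# Difference vectors: pre minus post for each outcome variable
diff1 = [a[0] - b[0] for a, b in zip(pre, post)]
diff2 = [a[1] - b[1] for a, b in zip(pre, post)]

def sample_cov(x, y):
    """Sample covariance with n - 1 in the denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

means = (mean(diff1), mean(diff2))
variances = (variance(diff1), variance(diff2))
cov = sample_cov(diff1, diff2)
corr = cov / (variances[0] * variances[1]) ** 0.5

print(means, [round(v, 2) for v in variances], round(cov, 2), round(corr, 2))
```

The variances go on the diagonal and the covariance on the off-diagonals of the matrix; for the SD and correlation techniques, take the square roots of the variances and use the correlation instead.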

8.  Hotelling’s T²: Two group mean vectors
Multivariate analysis for comparing 2 independent groups on 2 or more DVs.
Example: We want to compare patients who get therapy A with patients getting therapy B based on 2 outcome variables Y1 and Y2 (you may have more than 2 DVs).
Determine Effect Size = (Note: do not use SPSS’s partial η².) Click and select number of outcome variables, in this case 2 (Y1 and Y2).  Three input techniques:
1. Variance – Covariance matrix: Enter vector/column means & fill in variance-covariance matrix for Y1 and Y2 data vectors (var. in diagonal and cov. in off-diagonals).
2 & 3. SD and corr. matrix: Enter vector/column means and fill in the SDs & correlation matrix (SDs in diagonal, corr. in off-diagonals). (Don’t know about “autocorr” and “multiply all means by” windows right now, although the latter should probably be set to 1).
Alpha = .05 (for one tail)
Power = set desired level (use post hoc to enter sample size)
Allocation ratio N2/N1 = sample size for group 2 divide by sample size for group 1
Response variables: enter number of outcome variables, in this case 2

9.  MANOVA – Global Effects
Multivariate analysis for comparing 2 or more independent groups (fixed factors) when we have 2 or more DVs.
Example: We want to compare 4 groups of patients getting therapies A, B, C, and D based on 5 outcome variables Y1, Y2, Y3, Y4, and Y5.
Options = Select Muller & Peterson (1984) method (used by SPSS)
Determine Effect Size = Enter Pillai’s trace based on analysis with preliminary data set. Enter number of groups, in this case 4. Enter number of response (DV) variables, in this case 5. Enter total sample size, in this case 4 groups x 5 patients/group = 20. Calculate effect size and return to main window. Calculations for f² = V / (s – V), where V is Pillai’s trace and “s” equals the smaller of either number of DVs or number of groups minus 1. f² Effect Size Conventions: Small = (.10)² = .01; Medium = (.25)² = .06; Large = (.40)² = .16.
Alpha = .05
Power = set desired level (choose post hoc to enter sample size)
Number of Groups = in this case 4
Response Variables = number of DVs, in this case 5
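
The f² formula given above for Pillai's trace is easy to check by hand. A minimal sketch for the 4-group, 5-DV example; the Pillai's trace of 0.30 is a made-up value standing in for the result of a preliminary analysis.

```python
def f_squared_from_pillai(v, n_groups, n_dvs):
    """f squared = V / (s - V), with s the smaller of the number of DVs and groups minus 1."""
    s = min(n_dvs, n_groups - 1)
    if not 0 <= v < s:
        raise ValueError("Pillai's trace must be at least 0 and less than s")
    return v / (s - v)

# Hypothetical Pillai's trace from pilot data; 4 therapy groups, 5 outcome variables
f2 = f_squared_from_pillai(0.30, n_groups=4, n_dvs=5)
print(round(f2, 3))
```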

10.  MANOVA – Special Effects and Interactions
Multivariate analysis for comparing the interaction of within and fixed factors (factorial design) when we have 2 or more DVs.
Example: We want to compare 4 groups of patients getting therapies A, B, C, and D based on 5 outcome variables Y1, Y2, Y3, Y4, and Y5, with sex (M vs. F) as a between subjects factor.
Options = Select Muller & Peterson (1984) method (used by SPSS)
Determine Effect Size = Enter Pillai’s trace based on analysis with preliminary data set. Enter number of groups, in this case 4. Enter number of response (DV) variables, in this case 5. Enter total sample size, in this case 4 groups x 5 patients/group = 20. Calculate effect size and return to main window. Calculations for f² = V / (s – V), where V is Pillai’s trace and “s” equals the smaller of either number of DVs or number of groups minus 1. f² Effect Size Conventions: Small = (.10)² = .01; Medium = (.25)² = .06; Large = (.40)² = .16.
Alpha = .05
Power = set desired level (choose post hoc to enter sample size)
Number of Groups = in this case 4
Response Variables = number of DVs, in this case 5 

11.  MANOVA: Repeated Measures, Between Factors
Testing between factor effects in univariate RMANOVA using the multivariate approach, when sphericity assumption is not met

12.  MANOVA: Repeated Measures, Within Factors
Testing within factor effects in univariate RMANOVA using the multivariate approach, when sphericity assumption is not met

13.  MANOVA: Repeated Measures, Within-Between Interaction
Testing interaction of within and between factors in univariate RMANOVA using the multivariate approach, when sphericity assumption is not met

14.  Linear Multiple Regression: Fixed model, R2 deviation from zero

15.  Linear Multiple Regression: Fixed model, R2 increase

16.  Variance: Test of equality (2 sample case)

17.  Generic F Test


