R is a versatile and reliable statistics package. What makes it particularly attractive to students, statisticians, and researchers is that it's free. That’s right – you can get a leading statistical package for zero dinero. Still not smiling? Most fully licensed statistical packages cost $2000 or more; R costs $0.
Now that you’re smiling, let’s look at how to set up R.
R SET UP
Step 1. Go to the R website at http://cran.r-project.org/
Step 2. Select your operating system (Linux, Mac, Windows)
(Note: the rest of the instructions are for Windows users)
Step 3. Click on “base”.
Step 4. Select the latest version of R for download (top of page)
Step 5. Save the R .exe file to a folder on your computer.
Step 6. After the file downloads, open the folder and double click the .exe file to start the installation. It is probably best to accept the default installation settings for now. The installation will create an R folder in the list of program files. If you install newer versions of R, this is where the program files will be stored.
After completing steps 1-6, look for the R shortcut icon on your desktop. Double click on it to launch R. You will see a window with the RGui (R graphical user interface). You are set to go and can enter all sorts of data and programming commands for running statistical functions (several of these are shown below). If you are not all that comfortable working in a programming language, you might be interested in a couple of options that make R easier and more intuitive to use. I highly recommend R Deducer and RJGR (R Jaguar).
First, let’s take a look at Deducer. Here is a description of Deducer from its designers.
“Deducer is designed to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an excel-like spreadsheet in which to view and edit data frames. The goal of the project is two fold: (1) Provide an intuitive graphical user interface (GUI) for R, encouraging non-technical users to learn and perform analyses without programming getting in their way; (2) Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.”
DEDUCER SET UP
If you want Deducer, follow these steps.
Step 1. In R, click on the “Packages” drop down menu and select “Set CRAN mirror”. Choose a location in your country to download the files from.
Step 2. Select “install package(s)” in the Packages drop down list.
Step 3. A window showing all the available packages will open up. Scroll down until you come to the Deducer packages. Select the packages by holding down the control key while clicking on Deducer, DeducerExtras, DeducerMMR, DeducerPlugInExample, and DeducerPlugInScaling. Click “ok” and these packages will start downloading.
Step 4. After the program has finished installing the Deducer packages, click on “load packages” in the Packages drop down list. You will see a list of packages currently installed. Select the Deducer packages and click “ok” to load them into your current R session.
If you don’t want to load the Deducer packages every time you launch R, and you want an easy-to-use alternative to the RGui, then I recommend RJGR. RJGR (R Jaguar) is a user-friendly interface that incorporates a spreadsheet-like data editor, syntax highlighting that makes incomplete commands easier to spot, and an autocompletion function to help with writing syntax. Best of all, RJGR can be incorporated with Deducer.
RJGR (Jaguar) SET UP
If you want RJGR, follow these steps to set it up. The package will automatically integrate with Deducer.
Step 1. In R, click on the “Packages” drop down menu and select “Set CRAN mirror”. Choose a location in your country to download the file from.
Step 2. Select “install package(s)” in the Packages drop down list.
Step 3. A window showing all the available packages will open. Scroll down until you come to the JGR package. Select JGR and click “ok”. The package will install.
Step 4. After JGR installs, click on “load packages” in the Packages drop down list. You will see a list of packages currently installed. Select the JGR package and click “ok”.
Step 5. Go to http://www.rforge.net/JGR/. Scroll down to the Download section and select the appropriate executable file (in blue). Save the file to the desktop.
Step 6. When the file shows up on the desktop, double click on it. When a window opens, select the box next to “Do not show this notice in the future”. R Jaguar will load with Deducer. You may now click on the JGR desktop icon every time you want to launch R.
With R, Deducer, and JGR you are ready to enter data, load a data set, and explore the many statistical options available in the drop down menus. Note that you can also use the traditional programming language mode with nice features like autocomplete.
Programming syntax for statistical operations
Matrix Math Functions
a=c(1,4,7)
b=c(2,5,8)
c=c(3,6,9)
matrixa=cbind(a,b,c) # these 4 lines create a 3x3 matrix
matrixa=matrix(c(1,2,3,4,5,6,7,8,9),byrow=TRUE,ncol=3) # creates the same 3x3 matrix as above
t(matrixa) # transposes matrix a
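matrixab=matrixa%*%matrixb # multiplies matrices a and b (matrixb must be defined first, with as many rows as matrixa has columns)
solve(matrixa) # takes the inverse of a square, non-singular matrix (the 1-9 example matrix above is singular, so solve() will return an error for it)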
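Here is a small sketch of the multiplication and inverse commands on matrices that actually work with them. The two 2x2 matrices are made-up values chosen for illustration (they are not from the list above), and the name inversea is just a label I picked.

```r
# Hypothetical 2x2 example matrices (illustrative values only)
matrixa <- matrix(c(2, 1,
                    1, 3), byrow = TRUE, ncol = 2)
matrixb <- matrix(c(1, 0,
                    2, 1), byrow = TRUE, ncol = 2)

matrixab <- matrixa %*% matrixb         # matrix product of a and b
inversea <- solve(matrixa)              # inverse of a; works because det(matrixa) is not 0
identity_check <- matrixa %*% inversea  # a matrix times its inverse gives the identity matrix

det(matrixa)                 # determinant of matrixa
round(identity_check, 10)    # rounded to hide tiny floating point error
```

If det() returns 0 (or something extremely close to it), the matrix has no inverse and solve() will stop with an error.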
Data Entry Techniques
x = c( ) # enter variable x values in parentheses separated by commas.
dataset = data.frame (x,y,z) # combines variables x, y, and z into one data set.
cbind(x, y, z) # binds x, y, and z together as the columns of a matrix.
dataset = edit (data.frame( )) # opens spreadsheet for data entry.
attach(dataset) # program will recognize dataset.
x=rnorm(100) # randomly selects 100 values from normal distribution
x=rnorm(10,mean=100,sd=16) # randomly selects 10 values from a dist. with specified parameters
x=1:50 # creates a sequence of numbers from 1 to 50
attach(dataset) # Tells the program to recognize the dataset for analyses
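As a quick illustration of the data entry commands, here is a sketch that builds a small data set from three made-up variables. It also shows two ways of reaching a variable without attach(): the $ operator and with(). That is my own suggestion rather than part of the list above, and the values are invented.

```r
# Made-up values for three variables of equal length
x <- c(12, 15, 9, 20)
y <- c(3.1, 4.0, 2.2, 5.6)
z <- c("a", "b", "a", "b")

dataset <- data.frame(x, y, z)   # one row per case, one column per variable

mean_x <- mean(dataset$x)        # pull a column out with the $ operator
mean_y <- with(dataset, mean(y)) # or evaluate an expression inside the data set
```

Both approaches name the data set explicitly, which helps when several data sets contain a variable called x.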
Importing Data Files (remember to "attach" data files after importing)
*Best Approach* = Use Deducer. It will automatically import Excel, SPSS, and SAS files.
From Excel
dataset = read.table("clipboard") # copy data (include just numbers, do not include variable names). Go to R and run the command. Quick and dirty way to get data from Excel into R.
From Excel (csv file)
dataset=read.table("C:/Documents and Settings/username/Desktop/dataset.csv", header=TRUE, sep=",")
1. In Excel, save dataset to desktop as a comma delimited (csv) file (found in the ‘save as type’ drop down menu)
2. Right click on the desktop file and select ‘properties’ to get the file location. Copy and paste the file location into the R syntax, inside the quotation marks.
3. Change the backslashes to forward slashes and add the file name at the end, before the closing quotation mark.
4. Include header=TRUE to retain variables’ names, and include sep="," to recognize comma delimited format.
From SPSS
dataset=read.table("C:/Documents and Settings/username/Desktop/dataset.csv", header=TRUE, sep=",")
1. Save SPSS or PASW data file in the comma delimited (csv) format.
2. Right click on the desktop file and select ‘properties’ to get the file location. Copy and paste the file location into the R syntax, inside the quotation marks.
3. Change the backslashes to forward slashes and add the file name at the end, before the closing quotation mark.
4. Include header=TRUE to retain variables’ names, and include sep="," to recognize comma delimited format.
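To see the csv import in action without depending on a particular folder, the sketch below writes a tiny made-up data set to a temporary file and reads it back. The column names and values are invented for illustration; read.csv() is a shortcut for read.table() with header=TRUE and sep="," already set.

```r
# Write a small, made-up data set to a temporary csv file
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(score = c(90, 85, 77),
                     group = c("a", "b", "a")),
          csv_path, row.names = FALSE)

# Read it back; the first row of the file supplies the variable names
dataset <- read.csv(csv_path)
str(dataset)   # shows the variables and their types
```

In real use you would replace csv_path with the location of your own file, written with forward slashes.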
Viewing and Editing Data
x = edit(x) # opens window to edit variable “x”
data.entry (x) # opens spreadsheet to edit variable “x”
dataset=edit(dataset) # opens spreadsheet to edit dataset. Run attach(dataset) after edits
x[1]=3 # changes first value in variable ‘x’ to the number 3.
data ( ) # lists available data sets
x # just type variable name to see a list of values in the variable “x”
x [] # put a number in the square brackets to bring up the datum at that position
ls ( ) # lists active variables
rm ( ) # removes/deletes variable and its data
rank ( ) # gives ranks for the data points
sort ( ) # sorts the values of the specified variable from smallest to largest
round (x, n) # round the elements of “x” to “n” decimal places
Basic Stats & Descriptives
mean (x)
median (x)
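names(which.max(table(x))) # most frequent value (statistical mode); note that mode (x) returns the storage type (e.g. "numeric"), not the statistical mode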
max (x)
min (x)
quantile (x)
IQR (x)
range (x) # gives lowest and highest score
sum (x) # gives sum of variable
sd (x) # sample standard deviation (n - 1 in the denominator)
var(x) # sample variance (unbiased, n - 1 in the denominator)
summary (x) # summary stats
length (x) # sample size
cov (x, y) # covariance for x and y
scale(x) # to find z-scores
pnorm(scale(x)) # gives (Percentile Rank) areas to left of z-scores
pnorm(x, mean, sd) # gives PR for data points with a specified mean and sd
tscore = (scale(x))*10+50 # convert into T-scores (mean 50, sd 10)
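A short worked example may help with the mode and the standardized scores. The data are made up, and stat_mode is a small helper I wrote for this sketch (it is not a built-in R function); unlike which.max() it returns every value tied for most frequent.

```r
x <- c(2, 4, 4, 5, 7, 4, 9)   # made-up scores

# Hypothetical helper: returns the most frequent value(s) of a numeric vector
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

most_common <- stat_mode(x)    # 4 occurs three times, more than any other value
z <- as.vector(scale(x))       # z-scores: (x - mean(x)) / sd(x)
tscore <- z * 10 + 50          # T-scores: rescaled to mean 50 and sd 10
percentile <- pnorm(z)         # area to the left of each z-score under the normal curve
```

Keep in mind that pnorm() only gives sensible percentile ranks when the scores are roughly normally distributed.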
Graphics
plot (x) # plots the values of x against their position (index) in the variable
plot(x,y) # creates scatter plot (outcome variable [y] listed second)
abline(lm(y~x)) # draws regression line (you must do plot(x,y) first and minimize its window)
plot(scale(x),scale(y)) # puts both variables on the same scale
barplot(x) # creates bar graph
boxplot(x) # creates boxplot
boxplot (x,y) # view both plots side by side
boxplot (y~x) # y is continuous and x is grouping factor
xyplot(y~x|z) # y and x are continuous, z is grouping factor (requires lattice package)
stem (x)
hist (x)
lines(density(variable.name)) # superimpose density line on a histogram [do hist (x, freq=FALSE) first so both are on the density scale]
hist (x,10) # a histogram with 10 breaks
table (x)
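Here is a sketch of the histogram with a density line drawn over it. The data are simulated, and the pdf(NULL) and dev.off() lines are only there so the example runs without opening a window; delete those two lines to see the plot on screen.

```r
set.seed(1)        # makes the simulated data reproducible
x <- rnorm(200)    # 200 simulated values from a standard normal distribution

pdf(NULL)          # null graphics device: nothing is displayed or saved
h <- hist(x, freq = FALSE)   # freq = FALSE puts the bars on the density scale
lines(density(x))            # kernel density estimate drawn over the bars
dev.off()          # close the graphics device
```

With freq = FALSE the bar areas sum to 1, so the bars and the density line share the same vertical scale.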
Regressions
Linear Regression
lm(y~x) # assumes that a data set with a "y" DV and "x" is already active
summary(lm(y~x)) # gives coefficients, standard errors, significance tests, and R-squared for the regression
abline(lm(y~x)) # draws regression line (you must do plot(x,y) first and minimize its window)
or try...
model=lm(y~x, data=datafile.name)
summary(model)
and then...
coefficients(model) #model coefficients
confint(model, level=0.95) #95% CIs for coefficients
fitted(model) # predicted values
residuals(model) # residuals
anova(model) # anova table
vcov(model) # covariance matrix for model parameters
influence(model) # regression diagnostics
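To show how these pieces fit together, here is a sketch on simulated data. The data frame name study, the true intercept (3) and the true slope (2) are all made up for illustration; predict() is a standard R function that I have added to show how to get a predicted value for a new case.

```r
set.seed(42)                       # makes the simulated data reproducible
study <- data.frame(x = 1:20)
study$y <- 3 + 2 * study$x + rnorm(20, sd = 1)   # y = 3 + 2x plus random noise

model <- lm(y ~ x, data = study)   # fit the regression
coefficients(model)                # estimated intercept and slope
confint(model, level = 0.95)       # 95% CIs for both coefficients
predicted <- predict(model, newdata = data.frame(x = 25))   # prediction for a new x value
```

Because the data set is passed in through data=, there is no need to attach it first.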
Multiple Regression
summary(lm(y ~ x1 + x2 + x3)) #assumes data set is already active,
or try...
model=lm(y~x1+x2+x3, data=datafile.name)
summary(model)
and then...
coefficients(model) # model coefficients
confint(model, level=0.95) #95% CIs for coefficients
fitted(model) # predicted values
residuals(model) # residuals
anova(model) # anova table
vcov(model) # covariance matrix for model parameters
influence(model) # regression diagnostics
Logistic Regression (where Y is a binary factor and predictors are continuous variables)
summary(glm(y ~ x1 + x2 + x3, family=binomial)) # assumes data set is already active
or try...
model <- glm(y~x1+x2+x3,data=datasetname, family=binomial)
summary(model) # display results
and then...
confint(model) # 95% CI for the coefficients
exp(coef(model)) # exponentiated coefficients to get Odds Ratios
exp(confint(model)) # 95% CI for exponentiated coefficients (Odds Ratios)
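Below is a sketch of a logistic regression on simulated data with one predictor. The data frame name trial and the true coefficients (-0.5 and 1.2) are invented for illustration. For the intervals I use confint.default(), which gives Wald intervals based on the standard errors; these can differ slightly from the intervals confint() reports for a glm.

```r
set.seed(7)                          # makes the simulated data reproducible
n <- 200
x1 <- rnorm(n)                       # one continuous predictor
p <- plogis(-0.5 + 1.2 * x1)         # true probability that y = 1
y <- rbinom(n, size = 1, prob = p)   # simulated binary outcome
trial <- data.frame(y, x1)

model <- glm(y ~ x1, data = trial, family = binomial)
odds_ratios <- exp(coef(model))           # exponentiated coefficients = odds ratios
wald_ci <- exp(confint.default(model))    # Wald 95% CIs on the odds ratio scale
```

An odds ratio above 1 for x1 means the odds of y = 1 rise as x1 increases.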
Correlation
cor (x,y) # gives Pearson r
cor(x,y)^2 # coefficient of determination
cor(rank(x),rank(y)) # gives Spearman ranks correlation coefficient
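A small check, using made-up numbers, that correlating the ranks gives the same answer as asking cor() for the Spearman coefficient directly through its method argument:

```r
x <- c(10, 20, 30, 40, 50)   # made-up values with no ties
y <- c(1, 3, 2, 5, 4)

pearson <- cor(x, y)                              # Pearson r on the raw values
spearman_by_rank <- cor(rank(x), rank(y))         # Pearson r on the ranks
spearman_direct <- cor(x, y, method = "spearman") # built-in Spearman option
```

Here both Spearman values come out to 0.8.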
Testing Proportions
Single Sample Proportion test
prop.test(n, N, p = null.prop, conf.level=.95) # n = number of successes, N = sample size, null.prop = proportion under the null hypothesis
Two Samples Proportion Test
prop.test(c(n1, n2), c(N1, N2))
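Here is a sketch of both proportion tests with made-up counts: 45 successes out of 100 tested against a null proportion of .50, and then 45 out of 100 compared with 60 out of 110.

```r
# Single sample: 45 successes in 100 trials, null proportion 0.5 (made-up counts)
one_sample <- prop.test(45, 100, p = 0.5, conf.level = 0.95)
one_sample$estimate    # observed proportion
one_sample$p.value     # p-value for the test against 0.5
one_sample$conf.int    # 95% CI for the true proportion

# Two samples: 45/100 versus 60/110 (made-up counts)
two_sample <- prop.test(c(45, 60), c(100, 110))
two_sample$estimate    # the two observed proportions
```

By default prop.test() applies a continuity correction; add correct=FALSE if you want the uncorrected test.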
Parametric Tests
Z-test
z = (mean(x) - mu)/(sigma/sqrt(length(x))) # one sample z-test (mu = population mean, sigma = known population standard deviation)
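The z formula divides the mean difference by the standard error, sigma/sqrt(n), so the parentheses around the standard error matter. Here is a sketch with made-up scores and an assumed population mean of 100 and standard deviation of 15; the p-value line is my addition and uses pnorm().

```r
x <- c(102, 98, 110, 105, 95, 108, 101, 99, 104)   # made-up sample of 9 scores
mu <- 100      # assumed population mean under the null hypothesis
sigma <- 15    # assumed known population standard deviation

standard_error <- sigma / sqrt(length(x))   # 15 / sqrt(9) = 5
z <- (mean(x) - mu) / standard_error        # one sample z statistic
p_two_sided <- 2 * pnorm(-abs(z))           # two-tailed p-value
```

For a one-tailed test, use pnorm(z) for "less" or 1 - pnorm(z) for "greater".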
Between groups t-test equal variances
t.test(x,y, var.equal=TRUE) # Add alt="two.sided", "less", or "greater" to specify alternative hypothesis
Between Groups, t-test, unequal variances (Welch’s Test)
t.test(x,y, var.equal=FALSE) # Add alt="two.sided", "less", or "greater" to specify alternative hypothesis
Within Groups t-test
t.test(x, y, paired = TRUE) # Add alt="two.sided", "less", or "greater" to specify alternative hypothesis
Independent Groups, One-Way ANOVA
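df = stack(dataset) # use this if the groups are already the columns of a data frame, or...
df = stack(data.frame(x,y,z)) # ...use this to stack separate variables x, y, and z (creates the columns "values" and "ind")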
anova(lm(values ~ ind,data = df))
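Here is a sketch of the one-way ANOVA on three made-up groups of four scores each, so you can see what stack() produces before the model is fit.

```r
# Three made-up groups with four scores each
x <- c(5, 6, 7, 6)
y <- c(8, 9, 7, 9)
z <- c(12, 11, 13, 12)

df <- stack(data.frame(x, y, z))   # long format: "values" holds scores, "ind" holds group labels
head(df)                           # look at the first few stacked rows

result <- anova(lm(values ~ ind, data = df))   # one-way ANOVA table
result
```

With 3 groups and 12 scores, the table shows 2 degrees of freedom for ind and 9 for the residuals.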
Factorial, Two-Way ANOVA (coming soon)
Non-Parametric Tests # Add alt="two.sided", "less", or "greater" to specify alternative hypothesis
wilcox.test(x, mu = 5) # Wilcoxon for comparing x data to a population value mu.
wilcox.test(x, y, paired = TRUE) # Wilcoxon-Signed-Ranks
wilcox.test(x,y) #Wilcoxon-Rank-Sum a.k.a Mann-Whitney-U
obs = c(n1, n2, n3, n4) # Chi-square test for goodness of fit (all 3 rows)
exp = c(prop1, prop2, prop3, prop4)
chisq.test (obs, p = exp)
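chisq.test(data.frame(vbl.1, vbl.2)) # Chi-square test for independence (vbl.1 and vbl.2 hold cell counts; for raw categorical variables use chisq.test(table(vbl.1, vbl.2)))
df = stack(data.frame(x,y,z)) # or df = stack(dataframe.name); Kruskal-Wallis test (both lines)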
kruskal.test(values ~ ind, data = df)
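To round things off, here is a sketch of the goodness of fit test and the Kruskal-Wallis test on made-up data. I named the proportions expected_props rather than exp so the name does not clash with R's exp() function; that naming is my choice, not a requirement.

```r
# Chi-square goodness of fit: made-up counts in four categories
obs <- c(18, 22, 30, 30)
expected_props <- c(0.25, 0.25, 0.25, 0.25)   # null proportions; must sum to 1
gof <- chisq.test(obs, p = expected_props)
gof$statistic   # chi-square value
gof$p.value     # p-value on 3 degrees of freedom

# Kruskal-Wallis: three made-up groups stacked into long format
x <- c(5, 6, 7, 6)
y <- c(8, 9, 7, 9)
z <- c(12, 11, 13, 12)
df <- stack(data.frame(x, y, z))
kw <- kruskal.test(values ~ ind, data = df)
kw$p.value
```

With 100 observations and equal proportions, each expected count is 25, which gives a chi-square of 4.32 for these counts.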