How can I get up to speed with R for processing statistical data?
What is it?
R is a computer language and environment for statistical computing and graphics. It is widely used in biostatistics, including microarray and proteomics data analysis.
R provides a wide variety of statistical techniques:
- linear and nonlinear modelling
- classical statistical tests
- time-series analysis
- classification
- clustering
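To give a feel for what the techniques listed above look like in practice, here is a minimal sketch using only functions shipped with a standard R installation. The choice of the built-in `mtcars` data set, the variables modelled, and the number of clusters are assumptions made purely for illustration.

```r
# Built-in example data set: fuel economy and design specs for 32 cars
data(mtcars)

# Linear modelling: miles per gallon as a function of car weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Classical test: Welch two-sample t-test of mpg by transmission type (am)
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Clustering: k-means on the standardized columns, asking for two clusters
set.seed(1)  # k-means starts from random centers, so fix the seed
km <- kmeans(scale(mtcars), centers = 2)
print(table(km$cluster))
```

Each call returns an ordinary R object (`fit`, `tt`, `km`) that can be inspected further, for example with `summary(fit)`.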
R also provides graphical techniques ([example 1], [example 2]), although native R is not strong on graphics. It is also highly extensible, meaning you can modify existing code or write completely new functions.
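As a rough illustration of that extensibility, the sketch below defines a new function and then uses it like any built-in one. The function name `std_error` and the sample values are hypothetical, chosen only for this example; base R does not ship a function by that name.

```r
# Hypothetical helper: standard error of the mean, ignoring missing values
std_error <- function(x) {
  x <- x[!is.na(x)]          # drop missing observations
  sd(x) / sqrt(length(x))    # sample standard deviation over sqrt(n)
}

# Use it like any other R function
measurements <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
print(std_error(measurements))
```

Once defined, such a function can be saved in a script file and loaded into later sessions with `source()`.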
However, R is not for everyone. If you have only basic statistical needs, or are not at least somewhat familiar with programming, R is not for you. Perhaps a program such as GraphPad's InStat or Excel's statistical functions might be preferable.
S+, a commercial equivalent of R, is by far the better solution if you require significant graphing capabilities; this language is essentially identical to R and requires purchase of a moderately priced license.
What is it for?
Here are selected examples applicable to life sciences; details are available here:
- Microarray data analysis, particularly using BioConductor
- Bayesian Inference
- Cluster Analysis & Finite Mixture Models
- Analysis of ecological and environmental data
- Statistical Genetics
- Machine Learning & Statistical Learning
- Multivariate Statistics
- Analysis of Spatial Data
- Graphical models in R
Obtaining R
R is free and can be obtained here. It compiles and runs on a wide variety of UNIX platforms, Windows, and MacOS.
IMPORTANT: If you don't know what "compiling" means, R may not be suited to you:
- If you know some programming, you may want to consult Appendix E of Using R for Introductory Statistics (Verzani 2004), available from Lane. It will provide you with the essentials of the language in very concise form.
- If you don't know at least some programming, the learning curve for R will likely be very significant, and it might be preferable to use another tool until you are comfortable with at least one programming language.
You can also use R on computers in the M202 Computer Laboratory: all PCs in the M202 Teaching Lab have R already installed, usable by anyone with a SUNetID.
R training
- Hung Chen's succinct overview of R programming
- David Metz and Brad Hunting's excellent R tutorial:
Other selected training documents:
Key references
Source
Lane Librarian
Record created 9/21/2006; updated 1/12/2007.
ypouliot, September 16, 2009