How can I get up to speed with R for processing statistical data?
What is it?
R is a computer language and environment for statistical computing and graphics. It is widely used in biostatistics, including microarray and proteomics data analysis. R provides a wide variety of statistical techniques:
- linear and nonlinear modelling
- classical statistical tests
- time-series analysis
- classification
- clustering
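To give a feel for these facilities, here is a minimal sketch that touches three of them (linear modelling, a classical test, and clustering). It assumes only base R and its bundled example datasets `mtcars` and `iris`; the choice of variables is purely illustrative and not drawn from any of the references on this page.

```r
# Minimal sketch using only base R and its bundled datasets (mtcars, iris).
# The variables chosen here are illustrative assumptions.

# Linear modelling: fuel economy as a function of car weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Classical test: Welch two-sample t-test of mpg by transmission type (am)
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Clustering: k-means on the four numeric iris measurements
set.seed(1)  # k-means starts from random centres, so fix the seed
km <- kmeans(iris[, 1:4], centers = 3)
print(table(km$cluster, iris$Species))
```

Each call returns an ordinary R object (`fit`, `tt`, `km`) that can be inspected further, for example with `summary(fit)`.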


However, R is not for everyone. If you have only basic statistical needs, or are not at least somewhat familiar with programming, R is probably not for you. A program such as GraphPad's InStat or Excel's statistical functions might be preferable.
S+ (S-PLUS), a commercial equivalent of R, is by far the better solution if you require significant graphing capabilities. Its language is essentially identical to R, but it requires purchase of a moderately priced license.
What is it for?
Here are selected examples applicable to life sciences; details are available here:
- Microarray data analysis, particularly using BioConductor
- Bayesian Inference
- Cluster Analysis & Finite Mixture Models
- Analysis of ecological and environmental data
- Statistical Genetics
- Machine Learning & Statistical Learning
- Multivariate Statistics
- Analysis of Spatial Data
- Graphical models in R
Obtaining R
R is free and can be obtained here. It compiles and runs on:
- UNIX
- FreeBSD
- Linux
- Windows
- MacOS

- If you know some programming, you may want to consult Appendix E of Using R for Introductory Statistics (Verzani 2004), available from Lane. It will provide you with the essentials of the language in very concise form.
- If you don't know at least some programming, the learning curve for R will likely be very significant, and it might be preferable to use another tool until you are comfortable with at least one programming language.
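For readers with some programming background, a few lines convey the flavor of the language. The following is a minimal sketch, not taken from Verzani's appendix; the data values and the names `x`, `std_error`, and `df` are made up for illustration.

```r
# Vectors are the basic data type; arithmetic is element-wise
x <- c(2.1, 3.4, 1.9, 5.0)
x_centered <- x - mean(x)

# Functions are ordinary objects created with function()
std_error <- function(v) sd(v) / sqrt(length(v))
se <- std_error(x)

# Data frames hold tabular data, one column per variable
df <- data.frame(group = c("a", "a", "b", "b"), value = x)

# Apply a function to each group of rows
group_means <- tapply(df$value, df$group, mean)
print(group_means)
```

If constructs like these look unfamiliar, that is a reasonable sign that another tool may be the more comfortable starting point.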
R training
- Hung Chen's succinct overview of R programming
- David Mertz and Brad Huntting's excellent R tutorial:
  - Part 1: Dabbling with a wealth of statistical facilities
  - Part 2: Functional programming and data exploration
Other selected training documents:
- Introduction to R
- R Data Import and Export
- Installing R
- Complete list of manuals
Key references
- BOOK: Verzani, J. (2004) Using R for Introductory Statistics.
- BOOK: Dalgaard, P. (2002) Introductory Statistics with R, Springer; good, brief overview of R's features applied to actual examples.
- eBook/BOOK: Gentleman, R. (2005) Bioinformatics and Computational Biology Solutions Using R and Bioconductor; available at the Lane Library.
- BOOK: Spector, P. (1994) An Introduction to S and S-Plus; excellent overview of the S language, essentially identical to R. Available from the Mathematical & Computer Sciences Library.
- Comprehensive R Archive Network: source code and executables.
- R home site.
- R FAQs.
- Search R documentation.
- All of Dr. Balise's videorecorded statistics classes
- Dr. Balise's top statistics books.
Source
Lane Librarian
Record created 9/21/2006; updated 1/12/2007.