Lane Medical Library

How can I get up to speed with R for processing statistical data?

 

What is it?

R is a computer language and environment for statistical computing and graphics. It is widely used in biostatistics, including microarray and proteomics data analysis.

R provides a wide variety of statistical techniques:
  • linear and nonlinear modelling
  • classical statistical tests
  • time-series analysis
  • classification
  • clustering
R also provides graphical techniques, although native R is not strong on graphics. It is also highly extensible, meaning you can modify existing code or write and call completely new functions.
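To give a feel for what working in R looks like, here is a minimal sketch that touches a few of the techniques listed above. It assumes only a standard R installation and uses the built-in mtcars example data set. The choice of variables is illustrative, and the helper function coef_var is a hypothetical name made up for this example.

```r
# Built-in example data: fuel economy, weight, etc. for 32 cars
data(mtcars)

# Linear modelling: regress miles per gallon on car weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Classical statistical test: Welch two-sample t-test of mpg
# between automatic (am = 0) and manual (am = 1) transmissions
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Clustering: k-means with two clusters on standardized columns
set.seed(1)  # k-means starts from random centers
km <- kmeans(scale(mtcars[, c("mpg", "wt")]), centers = 2)
print(table(km$cluster))

# Extensibility: define and call a new function
# (coef_var is a hypothetical helper, not part of R)
coef_var <- function(x) sd(x) / mean(x)
print(coef_var(mtcars$mpg))
```

If these few lines look approachable, R is probably within reach. If they look opaque, the caveats about programming experience elsewhere on this page apply.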

However, R is not for everyone. If you have only basic statistical needs, or are not at least somewhat familiar with programming, R is not for you. A program such as GraphPad's InStat or Excel's statistical functions might be preferable.

S+, a commercial equivalent of R, is by far the better solution if you require significant graphing capabilities; this language is essentially identical to R and requires purchase of a moderately priced license.

What is it for?

Here are selected examples applicable to life sciences; details are available here:
  • Microarray data analysis, particularly using Bioconductor
  • Bayesian Inference
  • Cluster Analysis & Finite Mixture Models
  • Analysis of ecological and environmental data
  • Statistical Genetics
  • Machine Learning & Statistical Learning
  • Multivariate Statistics
  • Analysis of Spatial Data
  • Graphical models in R

Obtaining R

R is free and can be obtained here. It compiles and runs on:
  • UNIX
  • FreeBSD
  • Linux
  • Windows
  • MacOS
IMPORTANT: If you don't know what "compiling" means, R may not be suited to you:
  • If you know some programming, you may want to consult Appendix E of Using R for Introductory Statistics (Verzani 2004), available from Lane. It will provide you with the essentials of the language in very concise form.
  • If you don't know at least some programming, the learning curve for R will likely be very significant, and it might be preferable to use another tool until you are comfortable with at least one programming language.

R training

  1. Hung Chen's succinct overview of R programming
  2. David Metz and Brad Hunting's excellent R tutorial:
    Other selected training documents:

Key references

Source

Lane Librarian

Record created 9/21/2006; updated 1/12/2007.