Archived Class

July 2015

R Programming: Part 2

No experience with R or programming will be assumed, although some exposure to programming concepts may be helpful.

Part 2 is divided into three sections:

  • First, we will look at how R can be used as a programming language
  • Second, we will look at basic string manipulation and text processing, which is needed for such varied tasks as data scraping and manipulating genome data
  • Finally, we will tie everything together by using what we've learned to answer realistic, useful questions
  • Programming concepts
    • Flow control -- branching and iteration
    • User-defined functions
    • Vectorized operations
  • Text processing
    • Basic string manipulation -- nchar, paste, print, cat, strsplit
    • More complicated tools -- grep, sub/gsub, substr
  • Applications: perform specific tasks involving the following tools: 1) getting data into R, 2) manipulating data structures, 3) vectorized operations (or loops), 4) text processing, 5) plotting
    • Investigate frequency of a given name over time
    • Investigate frequency of given names ending in "a" for boys vs. girls
    • Investigate relationship between word length and word count
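
For a rough preview of the tools named in the outline, here is a minimal sketch in R. It is an illustration only, not the class material: the `words` vector is made-up sample data, and the function name `ends_in_a` is hypothetical.

```r
# Made-up sample data, for illustration only
words <- c("Anna", "Noah", "Olivia", "Liam")

# Flow control: branching (if/else) inside iteration (for)
for (w in words) {
  if (nchar(w) > 4) {
    cat(w, "is a long name\n")
  } else {
    cat(w, "is a short name\n")
  }
}

# User-defined function (hypothetical name): TRUE where a string ends in "a"
ends_in_a <- function(x) {
  last_char <- substr(x, nchar(x), nchar(x))  # last character of each element
  last_char == "a"
}

# Vectorized operations: whole vectors are processed without an explicit loop
name_lengths <- nchar(words)
a_names <- words[ends_in_a(words)]

# Basic string manipulation
greeting <- paste("Hello", words[1], sep = ", ")
date_parts <- strsplit("2015-07-01", "-")[[1]]

# Pattern-based tools
a_index <- grep("a$", words)            # positions of elements ending in "a"
first_only <- sub("a", "A", "banana")   # replaces the first match only
all_matches <- gsub("a", "A", "banana") # replaces every match
```

Note that `ends_in_a` and `grep("a$", words)` answer the same question in two ways: the first uses `substr` with a vectorized comparison, the second uses a regular expression.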

If you would like to follow along on your own computer, please have R installed.

Download R installation instructions for Apple, PC, or Linux

References

  • Quick-R: a good mix of concise tutorial pages and reference material
  • "Modern Applied Statistics with S-Plus" by Venables and Ripley: an authoritative but introductory textbook. It gives some background on statistical methods and shows how to implement them in R
  • The official R mailing list, which has searchable archives covering many years of questions. If you have an R question, someone has almost certainly asked it on the list. Nowadays I would simply use Google, which ranks Stack Exchange sites and the R mailing list highly. An added benefit of Stack Exchange is that questions about R are often closely related to questions about data analysis or statistics, and these are all appropriate topics there.

Stanford Medicine