
As I start learning R, it seems natural to start from the very beginning, i.e. to take a look at the history of the R programming language. This post is based on the corresponding lecture from a course I’m currently taking. So what is R? R is a dialect of S.
What is S?
S is a language that was developed by John Chambers (Wikipedia entry, CV at Stanford University web site) and others at Bell Labs. There is a great book about the history of Bell Labs, “The Idea Factory: Bell Labs and the Great Age of American Innovation” by Jon Gertner, which tells the story of one of the first innovation factories.
S was initiated in 1976 as an internal statistical analysis environment (for use within Bell Labs). It was originally implemented as a set of Fortran libraries that packaged commonly repeated statistical routines. Fortran is a programming language that dates back to the mid-1950s (its first compiler was delivered in 1957).
Early versions of the language did not contain functions for statistical modeling (those didn’t arrive until roughly version 3 of the language).
In 1988 the system was rewritten in C (to make it more portable across systems) and began to resemble the system that we have today.
There is a seminal book “Statistical Models in S” by Chambers and Hastie (aka the white book) which documents the statistical analysis functionality.
Version 4 of the S language was released in 1998 and is the version used today. The book “Programming with Data” by John Chambers (aka the green book) documents this version of the language.
So R is an implementation of the S language, which was originally developed at Bell Labs.
Some more historical notes:
In 1993 Bell Labs gave StatSci (now Insightful Corp.) an exclusive license to develop and sell the S language.
In 2004 Insightful purchased the S language from Lucent (the company that Bell Labs had become part of) for $2 million, becoming its owner.
In 2006, Alcatel purchased Lucent Technologies; the combined company is now called Alcatel-Lucent.
Insightful sells its implementation of the S language under the product name S-PLUS and has built a number of fancy features (GUIs, mostly) on top of it – hence the “PLUS” in its name.
In 2008 Insightful was acquired by TIBCO for $25 million. TIBCO still develops S-PLUS.
The fundamentals of the S language itself have not changed dramatically since 1998.
In 1998, S won the Association for Computing Machinery’s Software System Award.
S Philosophy
“Promote transition from user to programmer”
In “Stages in the Evolution of S”, John Chambers writes:
“We wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important.”
http://www.stat.bell-labs.com/S/history.html
Back to R
1991: Created in New Zealand by Ross Ihaka & Robert Gentleman. Their experience developing R is documented in a 1996 JCGS (Journal of Computational and Graphical Statistics) paper.
1993: First announcement of R to the public
1995: Martin Mächler convinces Ross and Robert to use the GNU General Public License to make R free software
1996: Public mailing lists are created (R-help and R-devel)
1997: The R Core Group is formed (contained some people associated with S-PLUS). The Core Group controls the source code for R (they make changes in primary R source code).
2000: R version 1.0.0 is released
2013: R version 3.0.2 is released in September 2013
Features of R
– Syntax is similar to S, making it easy for S-PLUS users to switch over
– Semantics are superficially similar to S, but in reality are quite different
– Runs on almost any standard computing platform/OS (even on the PlayStation 3)
– Frequent releases (annual + bugfix releases); active development
– Quite lean, as far as software goes; functionality is divided into modular packages
– Graphics capabilities are very sophisticated and better than those of most general purpose statistical packages
– Useful for interactive work, but contains a powerful programming language for developing new tools (user -> programmer)
– Very active and vibrant user community; R-help and R-devel mailing lists and Stack Overflow
– It’s free! (Both in the sense of beer and in the sense of speech)
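The “user -> programmer” point above can be illustrated with a small sketch. The sample data and the function name `summarise_vector` are made up for this post; they are not part of R itself.

```r
# Interactive use: type expressions at the prompt and look at the results.
heights <- c(1.62, 1.75, 1.80, 1.68)  # made-up sample data
mean(heights)
sd(heights)

# Programming: wrap the same steps in a reusable function.
# `summarise_vector` is a hypothetical name chosen for illustration.
summarise_vector <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

summarise_vector(heights)
```

The same two calls work both ways; the only change is that the function can now be reused on any numeric vector.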
On Free Software
With free software, you are granted:
- The freedom to run the program, for any purpose (freedom 0)
- The freedom to study how the program works, and adapt it to your needs (freedom 1). Access to the source code is a precondition for this.
- The freedom to redistribute copies so you can help your neighbor (freedom 2).
- The freedom to improve the program, and release your improvements to the public, so that the whole community benefits (freedom 3). Access to the source code is a precondition for this.
Drawbacks of R
– Essentially based on 40-year-old technology (S; the other drawbacks largely follow from this)
– Little built-in support for dynamic or 3-D graphics (but things have improved greatly since the “old days”)
– Functionality is based on consumer demand and user contributions. If no one feels like implementing your favorite method, then it’s your job! (or you need to pay somebody to do it)
– Objects must generally be stored in physical memory, which can be a limitation in the big data era; but there have been advancements to deal with this too (both in R and on the hardware side, with cheaper memory)
– Not ideal for all possible situations (but this is a drawback of all software packages)
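To get a feeling for the in-memory limitation mentioned above, here is a minimal sketch that measures how much memory a single object takes. The vector `x` and its length are arbitrary choices for illustration.

```r
# A numeric vector with one million elements (each double takes 8 bytes).
x <- numeric(1e6)

# object.size() gives an estimate of the memory used by an R object.
size <- object.size(x)

# Show the estimate in megabytes.
print(format(size, units = "MB"))
```

Scaling the vector length up gives a rough idea of when a dataset would stop fitting into the RAM of a given machine.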
Design of the R system
The R system is divided into 2 conceptual parts:
1. The “base” R system you download from CRAN
2. Everything else
R functionality is divided into a number of packages:
– The “base” R system contains, among other things, the base package which is required to run R and contains the most important functions.
– The other packages contained in the “base” system include utils, stats, datasets, graphics, grDevices, grid, methods, parallel, compiler, splines, tcltk, stats4.
– “Recommended” packages: boot, class, cluster, codetools, foreign, KernSmooth, lattice, mgcv, nlme, rpart, survival, MASS, spatial, nnet, Matrix.
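The split between base and recommended packages can be checked on your own installation. This is a small sketch using `installed.packages()` from the utils package; the exact lists depend on the R version and on how R was installed, so they may differ from the ones above.

```r
# installed.packages() returns a matrix with one row per installed package.
# The `priority` argument filters by the package's Priority field.
base_pkgs <- rownames(installed.packages(priority = "base"))
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))

print(base_pkgs)
print(recommended_pkgs)

# Some packages of the base system (for example splines) are not attached
# at startup and need library() before their functions can be called by name.
library(splines)
```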
And there are many other packages available:
– There are about 4000 packages on CRAN that have been developed by users and programmers around the world.
– There are also many packages associated with the Bioconductor project (http://bioconductor.org), which is a project that provides R software for genomic and biological analysis.
– People often make packages available on their personal websites; there is no reliable way to keep track of how many packages are available in this fashion.
Some R Resources
Available from CRAN (http://cran.r-project.org)
– An Introduction to R
– Writing R Extensions
– R Data Import/Export
– R Installation and Administration (mostly for building R from sources)
– R Internals (not for the faint of heart)
Some Useful Books on S/R
Standard texts
– Chambers (2008). Software for Data Analysis, Springer.
– Chambers (1998). Programming with Data, Springer.
– Venables & Ripley (2002). Modern Applied Statistics with S, Springer.
– Venables & Ripley (2000). S Programming, Springer.
– Pinheiro & Bates (2000). Mixed-Effects Models in S and S-PLUS, Springer.
– Murrell (2005). R Graphics, Chapman & Hall/CRC Press.
Other resources
– Springer has a series of books called Use R!
– A longer list of books is at http://www.r-project.org/doc/bib/R-books.html