On the acceptance of R
Some history and a prediction.
A discussion broke out on the R-help mailing list in January 2006 about a technical report put out by the statistical computing group at UCLA. The report in question talked mainly about SAS, SPSS and Stata. It talked briefly — and not especially positively — about R. Someone accused it of damning R with faint praise. It might not be a surprise that the R community had a somewhat higher opinion of R.
You can find that thread with a web search like:
"A comment about R" 2006
There was a mechanism for submitting official comments on the technical report. I edited material from the thread, got some additional views privately, and added a few flourishes myself to produce: R Relative to Statistical Packages.
That was 7 years ago. There were on the order of 600 packages on CRAN.
Things are different now. As of St. Paddy’s Day there were 4399 packages on CRAN. If other major repositories are included, then the number of packages exceeds 6000.
It is unimaginable for something similar to happen in academia now. In particular, the UCLA site no longer hosts the technical report; instead it offers substantial resources about R. In academia R is widely accepted, and in some fields it is dominant.
In commercial companies, R is now in much the same position it held in academia in 2006.
This leads to my prediction: in 2020 R will be a dominant force in commerce, much as it currently dominates academia.
I don't think the transition is automatic. In academia R's competition was SAS, SPSS, Stata, Minitab and some others. In commerce R's competition is primarily Excel.
That’s a whole different sort of competition. Those addicted to spreadsheets aren’t likely to give them up easily. But there are signs of cracks in the spreadsheet wall.
Nice post, as usual. I agree that the industry competition for interactive analysis is mostly Excel. But do you think R will gain adoption in production processes? I see it heavily used in finance, but not in production. There the competition seems to be with Python, which also has a truly innovative interface (if inspired by Mathematica) in the IPython notebook.
Great post, and I tend to agree with your forecast, but I don't think that R competes with Excel in the commercial environment. My impression is that over the last decade most corporates focused on getting a 'true' picture of their past and current performance, and hence invested in more systematic data capture processes, data warehouses and reporting infrastructure. Excel can indeed be a nice front end for that purpose. However, as corporates take the next step of looking into the future and employing predictive analytics, they will need a product that goes beyond the capabilities of Excel, and R is a good choice for that.
Acceptance in the business world may take longer, not so much because of spreadsheet addicts as because of the pervasive "OMG Open Source bad. Run Run!" attitude of most IT departments and managers. I have seen companies spend huge amounts to license Matlab or IDL rather than allow (let alone encourage) the use of R, or scipy for that matter.
Nice post Patrick. There are some commercial domains where SAS, not Excel, is still the dominant competition.
Clinical trials, my own domain, is still heavily biased towards SAS today. That is changing for a variety of reasons, but it is an evolutionary, not revolutionary, process and has not quite "Crossed the Chasm" yet. We are aggressively moving in that direction, however, and activities by people both inside and outside the FDA are helping.
Also, in case readers are not aware, there are now two certification- and validation-oriented documents available from the main R website via the Documentation -> Certification links.
The first is the recently updated (December 2012) “R-FDA” document (http://www.r-project.org/doc/R-FDA.pdf), which provides guidance for the use of R in regulated clinical trials.
The second is a new document (http://www.r-project.org/doc/R-SDLC.pdf) that contains a subset of the R-FDA content, intended for more general settings where R's Software Development Life Cycle (SDLC) needs to be documented.
Interesting post Pat.
I wonder where the weak spot for breaking “the spreadsheet barrier” might be…
Thanks all for the comments — sorry for the delay in approving them.