26 Dec

Why use the R Language?

This article used to be called “An Introduction to the S Language”. R is a dialect of the S language, and has come to be, by far, the dominant dialect.

What are R and S?

S started as a research project at Bell Labs a few decades ago. It is a language that was developed for data analysis, statistical modeling, simulation and graphics. However, it is a general purpose language with some powerful features, so it could (and does) have uses far removed from data analysis.

R should be used for many of the tasks that spreadsheets are currently used for. If a task is non-trivial to do in a spreadsheet, then it would almost always be done more productively (and more safely) with R. Spreadsheet Addiction talks about problems with spreadsheets and how R is often a better tool.

Why the S Language?

  • S is not just a statistics package, it’s a language.
  • S is designed to operate the way that problems are thought about.
  • S is both flexible and powerful.

The Importance of Being a Language

Though the distinction between a package and a language is subtle, that subtle difference has a massive impact. With a package you can perform some set number of tasks — often with some options that can be varied. A language allows you to specify the performance of new tasks.

Your retort may be, “But I won’t want to create a new form of regression.” Yes, S does allow you to create new forms of regression (and many people have), but S also allows you to easily perform the same sort of standard regression on your 5 datasets (or maybe it is 500 datasets).

The key is abstraction. You easily see that your 5 regressions are really the same — there is merely different data involved with each. In your mind you have abstracted the specific tasks so that they all look similar. Once you’ve seen the abstraction, it is simple to teach R the abstraction. Languages are all about abstraction.
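As a minimal sketch of teaching R that abstraction: the five datasets below are simulated purely for illustration, and the names datasets, fit.one, fits and slopes are hypothetical. The same function is applied to every dataset, so going from 5 to 500 only changes the length of the list.

```r
# Hypothetical data: five small datasets that share the same columns.
set.seed(1)
datasets <- lapply(1:5, function(i) {
  x <- 1:10
  data.frame(x = x, y = i * x + rnorm(10))
})

# The abstraction: one function that fits the same regression to any dataset.
fit.one <- function(dat) lm(y ~ x, data = dat)

# Apply it to every dataset at once.
fits <- lapply(datasets, fit.one)

# Pull the estimated slope out of each fit.
slopes <- sapply(fits, function(fit) coef(fit)[["x"]])
slopes
```

Each element of fits is an ordinary regression object, so the usual tools (summary, predict, plot) work on any one of them.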

You can get an idea of the power of language by having a look at the R graphics gallery.

The Way We Think

One of the goals of S, and one that I think has largely been achieved, is that the language should mirror the way that people think. A simple example: suppose we think that weight is a function of (dependent on) height and girth. The S formula to express this is:

weight ~ height + girth

The + is not + as in addition, but + as in “and”.
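One way to see the difference is to fit both versions. The measurements below are invented for this illustration, and the names people, fit.and and fit.sum are hypothetical. Wrapping an expression in I() asks R to treat + as arithmetic, which gives a different model.

```r
# Hypothetical measurements, invented for this illustration.
people <- data.frame(
  height = c(150, 160, 165, 170, 175, 180, 185, 190),
  girth  = c(70, 85, 72, 90, 80, 95, 88, 100),
  weight = c(52, 66, 60, 75, 72, 88, 84, 98)
)

# "+" as "and": weight depends on height and on girth (two predictors).
fit.and <- lm(weight ~ height + girth, data = people)

# "+" as addition: weight depends on the single number height + girth.
fit.sum <- lm(weight ~ I(height + girth), data = people)

names(coef(fit.and))  # "(Intercept)" "height" "girth"
names(coef(fit.sum))  # "(Intercept)" "I(height + girth)"
```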

Another feature of S is that it is vector-oriented — meaning that objects are generally treated as a whole — as humans tend to think of the situation — rather than as a collection of individual numbers. Suppose that we want to change the heights from inches to centimeters. In S the command could be:

height.cm <- 2.54 * height.inches

Here height.inches is an object that contains some number — one or millions — of heights. S hides from the user that this is a series of multiplications, but acts more like we think — whatever is in inches multiply by 2.54 to get centimeters.
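A small sketch of the same command with concrete values (the three heights are made up for illustration):

```r
# Three heights in inches; the vector could just as well hold millions.
height.inches <- c(60, 65.5, 72)

# One statement converts every element.
height.cm <- 2.54 * height.inches
height.cm  # 152.40 166.37 182.88
```

The command is identical whether height.inches holds one height or many.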

Experience with C or Fortran can ironically make it harder to use S efficiently. The C-before-S gang tend to translate the problem into “programming” rather than thinking about the problem in the “natural” way.
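To illustrate the contrast, here is roughly what the “programming” translation of the height conversion might look like next to the natural one. The name height.cm.loop and the example heights are hypothetical.

```r
height.inches <- c(60, 65.5, 72)

# C-style translation: allocate the result, then fill it element by element.
height.cm.loop <- numeric(length(height.inches))
for (i in seq_along(height.inches)) {
  height.cm.loop[i] <- 2.54 * height.inches[i]
}

# The "natural" version: one statement about the whole object.
height.cm <- 2.54 * height.inches

identical(height.cm.loop, height.cm)  # TRUE
```

Both give the same answer; the loop simply spells out bookkeeping that the language is willing to do for you.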

A Moveable Feast

Flexibility and power abound in S. For instance, it is easy to call C and C++ functionality from R. R does not insist that everything is done in its language, so you can mix tools — picking the best tool for each particular task.

The pieces of code that are written in the S language are always available to the user, so a minor change to the task usually requires only a minor change to the code — a change that can be carried out in a minor amount of time.
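As a rough sketch of what this looks like in practice: typing the name of a function without parentheses prints its definition. The variant below, sd.pop, is a hypothetical example of a small change built on the standard sd function; it is not part of R.

```r
# Typing a function's name prints the code that defines it.
sd

# A hypothetical minor variation: the population standard deviation
# (divide by n rather than n - 1), built on top of sd.
sd.pop <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  sd(x) * sqrt((n - 1) / n)
}

sd.pop(c(2, 4, 4, 4, 5, 5, 7, 9))  # 2
```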

The Preferred Medium

Given its qualities, R has become the preferred computing environment for a large part of the statistical community. When a new statistical method is invented, chances are it will be implemented first in R.

In March 1999 John Chambers, one of the originators of S at Bell Labs, was presented with the ACM Software System Award. The citation stated, “S has forever altered the way people analyze, visualize, and manipulate data.” Previous winners of this award include Unix, TeX and the World-Wide Web. John is now a member of R Core (the group that produces R).

Learn R

“Impatient R” is a minimal set of things to learn about R.

“Some hints for the R beginner” suggests additional resources to learn R.


© Copyright - Burns Statistics