Interview with a forced convert from Matlab to R

2013/04/17

Here is an interview with Ron Hochreiter, Assistant Professor at WU Vienna University of Economics and Business.

In 25 words or less tell us what you do (using German words is cheating).

I consider myself a data scientist (teaching and research) with roots in Mathematical Programming, i.e. Optimization under Uncertainty (Stochastic Programming).

You were an unwilling convert from Matlab to R. What were the circumstances?

I started using MatLab in 1998 during my undergraduate studies in Business Informatics and wrote large parts of both my Master's thesis (2000-2001) and my PhD thesis (2001-2005) with it. I was very comfortable with MatLab. Although I did use R for some statistical analysis (mostly cluster analysis), I never completed any larger projects with it. Over the years I implemented a large number of tools and fairly complete optimization models in MatLab, and even produced some usable software for industry, including a risk management system for an Austrian energy trading company and a large-scale simulation tool for an Austrian pension fund. Then I changed universities, and it turned out that my new boss is a member of the R Core Team – so that was the end of my MatLab career.

How hard was the transition?

Although I was not unwilling to change to R, it was pretty hard for me at the beginning, because I was used to doing everything in MatLab. One main motivator was that the MatLab licensing issues were really bugging me – especially the fact that the student version is a trimmed-down version. Anyway, one thing that really helped speed up the transition was the availability of RStudio, which finally took the nerdy feeling out of R – at least for me. To love a high-level scientific programming language, I need to be able to sell it to other people, especially to non-programmers and non-nerds, which in my case means my business school Bachelor students – and I was not able to do this before the advent of RStudio. Now I am able to, and thus I am really glad that I made the effort to switch to R.

What are the best resources to aid the transition?

For me, one resource I highly recommend is Quick-R at http://www.statmethods.net/. At the beginning of my personal transition from MatLab, I also googled “MatLab R syntax” and used some of the documents I found, e.g. http://mathesaurus.sourceforge.net/octave-r.html, to check syntax equivalents. I probably do not need to point out your excellent material, because if people found this interview, they most likely found your books too!

[Editor's note: if not, then you can have a look at some of the favorite pages.]

What is your general attitude about R versus Matlab now?

I really love R nowadays and personally could not imagine switching back to MatLab – although, out of curiosity, I am following the Julia language project at http://www.julialang.org/, so if I ever needed to switch back to MatLab, I would do so indirectly via Julia.

What are use cases of Matlab over R, and vice versa?

I’d say that certain (if not most) optimization tasks are still not convincingly implemented in R at all. It is sad to see that people who quit MatLab for an open source alternative are choosing Python over R – which I do understand, because most optimization packages in R do not make any sense if you look at them from the Mathematical Programming viewpoint. Statisticians certainly do have a different approach to optimization.

What are the three most useful things in R but not in Matlab?

Well, it is quite difficult to name just three. Report generation using knitr/Sweave, Shiny, and R's general graphing capabilities are probably the ones that come to mind first.

What are the three most useful things in Matlab but not in R?

For me it is the availability of two great optimization modeling frameworks, CVX and YALMIP. And if you are from the field of engineering and need to do simulation tasks for which MatLab/Simulink provides exactly what you need, then it is the right choice… but if you are doing this kind of stuff, you have most likely never considered switching to R anyway.

You have some material on the web. What and where is it?

Unfortunately I have not yet had time to polish and publish my R-related software – so far I have only been able to register some domains for my projects, and a short summary is available at http://www.hochreiter.net/R/

5 replies
  1. Dimitri Shvorob says:

    I switched to R voluntarily, to be able to use data frames – these showed up in Matlab only a few years ago, and even then not in Matlab proper, but in the (not-free) Statistics Toolbox add-on. I recall with horror having to code up a table-join function in Matlab – in contrast to “merge” in R, not to mention the wonderful “sqldf” package. If the focus is on data handling, R wins hands down.

  2. Jeff says:

    Have you tried the package nloptr (http://cran.r-project.org/web/packages/nloptr/index.html) for optimisation? I have used NLopt in my C++ code and was quite happy with it.

  3. Patrick Burns says:

    Here is a response from my friend Louis Scott of http://kiemaadvisors.com

    I googled languages programmers hate (R and Matlab separately).

    For Matlab, there are clearly two types of people in the world: those who hate one-based indexing and those who hate zero-based indexing. Never the twain shall meet. Zero indexers have been burned by C errors such as accessing the nth entry of an array, which is really at index n-1. One indexers, by virtue of their 4GL language, never see this problem because they can see the results interactively, so it is not a bother to them. Both Matlab and Julia are one-indexed languages.

    As for R:

    http://www.talyarkoni.org/blog/2012/06/08/r-the-master-troll-of-statistical-languages/

    R is a domain-specific language for statisticians that just works. For others, it does not feel right – like this guy, who expects that there is (as in Python) one right way to do things within a language.

    Every language is a trade-off: it makes the most common problems in its domain part of the way of thinking within the language, at the expense of choices that leave other concerns out, for instance speed or interactive use.

    If you want the latter, your language is dynamically typed: the interpreter figures out what kind of thing you created and runs it. Static languages require that you specify integer or 32-bit float when you create arrays, and the compiler (not an interpreter) goes off and optimizes the parse tree created by your code. That takes time, but makes for the fastest code.

    Matlab (and Julia) try to give both speed and interactive use – they come to the table as prototyping languages for high-performance computing. They are meant to take big matrices and crunch away. They too are domain-specific rather than general-purpose languages, but the set of interested people is wider – pretty much any engineer has used or still uses Matlab. Matlab relies on some clever just-in-time (JIT) compilation to get as fast as possible once the interpreter knows all the data types you created dynamically.

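    For instance, here is how to compute 1 the hard way:
    % Build a 4000x4000 matrix of uniform random numbers
    A = rand(4000,4000);
    % Time det(A*inv(A)); A*inv(A) is the identity up to rounding, so the result is ~1
    tic, det(A*inv(A)), toc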

    The answer is 1 (up to rounding error), and it takes about 10 seconds in Octave (a Matlab clone). To do this, Octave uses the ATLAS matrix library and splits the work across CPU cores.
    http://math-atlas.sourceforge.net/

    Interestingly, Julia has its antecedents in a parallel version of Matlab called Star-P Matlab, by Viral Shah and Alan Edelman of MIT. Again the emphasis is on number crunching.
    http://www.ll.mit.edu/HPEC/agendas/proc04/abstracts/choy_ron.pdf
    http://julialang.org/

    I guess I should give a nod to Python as well, and to pandas by Wes McKinney, which provides data-frame-like structures and methods on them.
    http://pandas.pydata.org/

    In diplomatic terms, there isn’t a right language; everyone chooses which aspects are most important to them. One concern I do have about R is the desire to have everything in a single environment; I think R is a research language, not a production language. Again, some languages make production work easier than others. Three aspects of a production environment are unit testing, a sensible batch process, and a debugger.

    For me, the lack of unit testing on the packages is a big concern. I know, for example, that the code that makes it into Matlab, or into Octave, the open-source clone, has had unit tests applied to everything. With R, many packages are released by individual statisticians who found them useful in their own work and then shared them with others. Great, but the quality is quite variable, and in particular, although unit-testing packages are available, they are probably not widely used.

    • Patrick Burns says:

      The blog post that Louis found on R is quite interesting, I think. The sort of problems it mentions are not why The R Inferno exists, but they are why it is thicker than I would have hoped.

      I think Louis’ point about testing is very important. There is a definite difference between core R, which I find to be of very good quality, and the typical package from CRAN. It would be great if there were better testing for most packages. It would be wonderful if there were a system that made it more apparent to the typical R user how rigorously each package is tested.

© Copyright - Burns Statistics