
In recent discussion forums such as the Advanced Business Analytics, Data Mining and Predictive Modeling group on LinkedIn, there has been much debate over the pros and cons of SAS and R for statistical computing, particularly with respect to their applications in business.  Below is a summary of some of the points made in that discussion:

1. Data Set Size Limitations

One of the great myths in the field is that R is limited in the size of the data sets it can analyze compared to SAS.  This is FALSE.  In fact, as with SAS, the size of the data sets that R can analyze is limited only by the physical machine.

There is an important difference in how SAS and R natively handle data.  Simply stated, the size of data sets analyzed in SAS is generally bottlenecked by the size of the hard disk, whereas the size of data sets analyzed out-of-the-box in R is bottlenecked by the amount of RAM, because R loads data into memory by default.  Both are physical hardware limitations of the machine, not limitations of the software.  However, hard drive capacities have historically grown faster than the amount of RAM a machine could support, giving R the bad rap of being limited.  Two key developments make this topic moot: 64-bit operating systems that support far more RAM, and R connectivity to databases.  With these, R can be made to support any size data set that SAS can support, including those that contain billions of observations.

Our organization has used R to analyze and model data sets that contain millions of rows and scores of variables and take up gigabytes of hard drive space, all without issue.

2. Open-source versus Closed

R is open-source while SAS is proprietary and closed.  The SAS marketing engine has historically argued that open-source technology should not be trusted and carries additional inherent risks.  In fact, a 2009 New York Times article by Ashlee Vance entitled “Data Analysts Captivated by R’s Power” quoted Anne H. Milley, director of technology product marketing at SAS, as saying, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

Having used both technologies for some time, I believe Ms. Milley’s view of R is misinformed, and much of the field agrees.  As SAS/R expert Steve Miller said of R at the Information Management blog, “I’ve never worked with a more stable, bug-free piece of software.”  Like others who have worked with both SAS and R, I have found R to be extremely stable.

3. Corporate Usage

It has been argued that SAS is for businesses and R is for academics.  However, the evidence tells a different story.  In fact, some of the most pioneering companies of our time use R as part of their daily operations.  Google, Facebook, and Pfizer are just a few of the names that actively publicize their use of R.

Our company is particularly knowledgeable in this area, as we’ve helped companies make the conversion from SAS to R, and those conversions are not isolated cases.  Indeed, major companies across all industries are turning to R as their lead statistical computing technology.

4. Academic Usage

While SAS was once a critical component of many (if not most) graduate-level statistics programs, its position is slowly being usurped by R at many top-level institutions.  I recall the chair of the mathematics department at Cornell once telling me that no student would be in want of a job if he/she knew SAS.  While that was true at the time (and still is to some extent today), R is expanding in this area as more universities structure their programs with R at their core.

5. Lag Time for Implementation of New Methods

SAS proponents have argued that it takes time to vet and test new methods before implementing them into corporate software.  In fact, the marketing engines of several proprietary analytical vendors have stated this as a point of pride and competitive advantage for their technology.

Frankly, I don’t see how lag time can be an advantage in the field of statistical computing.  Particularly in finance, healthcare and other competitive fields, I’d much rather have a vehicle to access both the latest and the tried-and-true techniques than be left in the dust to inhale my competitors’ fumes.  This is probably why so many companies are switching to R or considering a conversion in the near future.

6. Cost

Being open-source, R is completely free.  Unlike with licensed proprietary statistical technologies, you won’t lose the ability to use your work if you don’t pay annual renewal fees.  That’s because R doesn’t have any fees.  Shifting to R is perhaps one of the greatest risk-mitigation strategies a business could take in this arena, as once you have a copy of R, it is yours to keep for all eternity.  R is not going away.

7. Documentation

Both SAS and R are well documented.  SAS arguably has more documentation since it has been around longer.  However, a search of Amazon.com will reveal more R books than one would likely want to read.

There are many additional topics that could be discussed when comparing R and SAS.  However, this summary has highlighted some of the areas where there is conflict between reality and marketing spin.  Further discussion on this and similar topics is to follow.
