
R Usage Exceeds SAS, SPSS in Recent Kaggle Survey

According to an article published this November, a recent Kaggle survey found that R is the tool most frequently listed in user profiles for Kaggle predictive modeling competitions. Of the 1,714 users who participated in the survey, 32% listed R as a tool they use. Matlab finished a distant second at 13%.

Among the Kaggle users surveyed, R appeared in lists of tools over 350% more often than SAS. Kaggle also reported that 50% of competition winners used R in their analysis. Below are the survey results as published by Kaggle:

The above graph and the original story may be found at: http://blog.kaggle.com
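
To put the 350% gap between R and SAS in concrete terms, here is a small worked calculation. It is only a sketch: the 32% share for R and the 350% figure come from the text above, while the implied SAS share is derived arithmetic, not a number reported by Kaggle. The function name `implied_share` is made up for this illustration.

```python
# Worked arithmetic for the survey figures quoted above.
# Only the 32% R share and the 350% gap come from the article;
# the implied SAS share is derived here and was not reported by Kaggle.

def implied_share(r_share_pct: float, pct_greater: float) -> float:
    """Return the share x such that r_share_pct is pct_greater percent greater than x."""
    return r_share_pct / (1 + pct_greater / 100)

r_share = 32.0  # percent of surveyed users who listed R

# Strict reading: "350% greater" means R = SAS * (1 + 3.5) = 4.5 * SAS
sas_if_greater = implied_share(r_share, 350)

# Looser reading sometimes intended by such phrasing: R = 3.5 * SAS
sas_if_multiple = r_share / 3.5

print(f"Implied SAS share (350% greater): {sas_if_greater:.1f}%")
print(f"Implied SAS share (3.5 times):    {sas_if_multiple:.1f}%")
```

Under either reading, SAS would have been listed by fewer than one in ten of the surveyed users.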


SAS and R – Is SAS frightened of R?

In a January 2009 article from the NYTimes, the director for technology product marketing at SAS was quoted as saying of R, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

Unfortunately for this marketing director, I’m afraid that freeware is already being used to build aircraft engines. I’m curious whether this SAS director has ever used Google Maps to arrive safely at a destination. I’m wondering whether she has ever used the internet, where the Apache web server dominates with over 60% market share as of 2011 (per Netcraft.com). The closest proprietary competitor in the web server space is Microsoft, which sports a 20% market share.

The list of industries where freeware has overtaken commercial competitors is lengthy and beyond the scope of this article. However, the above quote should stimulate thought within organizations about to purchase statistical computing software. Why would a leading commercial statistical software vendor bring up price as a weakness of R (it’s free) rather than the technical merits of the technology? Furthermore, since when does being open-source make a product bad? Don’t we as readers need to know more?

Having done extensive work with both SAS and R, we are well placed to say something about commercial statistical software and how it compares to the open-source competition, namely R. Put bluntly, we believe that commercial statistical computing languages may be in trouble… but with some caveats. With worthy, free competitors such as R entering the marketplace, companies have fewer reasons to pay for an analytical programming language. If the open-source technology were less capable or less reliable, this story would be different. However, our company has found that R works as well as (if not better than) SAS or other commercial products in cases where we would historically have used them.

For example, we’ve had the opportunity to develop and re-develop a marketing model for our clients in both SAS and R. Development time and outcome were about the same in each, give or take a bit. However, R was free to use. Additionally, because R carries none of the licensing restrictions of the proprietary solution, we were able to get our R clients exposing their models to the web faster. Furthermore, we were able to get more users working with the R model since we could freely install it on as many machines as we pleased. This would not be possible with commercial offerings without paying a price. Cheaper and faster with the same outcome is a good deal in our industry.

So, is there any hope for commercial statistical software?

We believe the answer is yes, but it will require that these vendors adapt to changing times. Commercial statistical software vendors, SAS in particular, are in a unique position, having helped clients use their software to address business challenges for more than 30 years. People don’t buy statistical software as the finished product. Instead, people buy statistical software to develop solutions that address problems. Who has better knowledge about the field of statistical solutions than the vendors who created the underlying technology? No one – this is where commercial statistical computing vendors should focus and indeed where the market is headed.

We see analytical vendors who focus on solutions as being in a very good position going forward.  However, there is one caveat.  Focus will be key here.  But what do we mean by focus?  Vendors who offer thousands of different products to disparate markets will likely fall behind.  Most customers are not looking for vendors that can be everything to everyone.  Most customers are looking for the vendor that best understands and can solve their specific problem.  Vendors need to show discipline, restraint, and sharp resolution in their product offering.  Apple Inc has mastered this field.  While Apple has many other traits that enabled it to become one of the most successful technology companies ever to exist, Steve Jobs always had the restraint to only focus on a small set of problems that his company could become best at solving.

In summary, we see R and open-source technologies becoming the standard for statistical computing over the next ten years. However, we believe that this shift has created new opportunities for proprietary statistical vendors, and that these vendors are best-positioned to embrace these opportunities – namely in the solutions market. To succeed, vendors will likely need strong discipline and focus to ensure customers are delivered crisp, sharp solutions that attack their problems head-on.


5 Reasons to Worry if You’re Using Proprietary Statistical Computing Software

Lately, the marketing engines of several large statistical software companies have been hard at work trying to spread fear and desperately convince businesses why they should not switch to R – a free, open-source statistical computing technology that is taking the financial, healthcare, insurance, and other industries by storm.

Given our experience with both proprietary technologies like SAS, and the open-source competitor R, we wanted to unveil the other side of the coin that the major software firms don’t want people to see.  Below are 5 reasons to be concerned if your company is using licensed data and statistical computing software.

1. If you miss a renewal bill payment, all your work may… stop working.

Picture spending 2, 5, or 10 years developing in a language that is licensed annually. Now imagine that one day you log into your workstation, only to find that all of the programs you have written have suddenly stopped working! This is a true story, and a risk companies face every day when paying to license a software language. What happens if one day your company runs into troubled times and cannot afford to pay tens of thousands (perhaps hundreds of thousands) of dollars to renew a license for the software? You are pretty much stuck.

2. Unexpected Licensing Cost Jumps

When a company develops using a licensed statistical computing technology, particularly one that requires annual renewal fees for continued use, the company has forced itself into a position where it needs to continue to pay these licensing fees in order to continue using what it built.  I’ve seen many cases where paying these fees becomes a lifeline for a company.  Particularly in cases where there has been vast development on top of a licensed language, the company’s only way of surviving may be to pay these fees.  What would happen if this “survival fee” were to jump 10%, 20% or 60% in a given year (the latter is what Netflix did to its customers according to Bloomberg)?  Can you afford to risk your existence on uncertain, increasing future licensing costs?

On this point, I know some may argue that this happens all the time in business; for example, a company may license a database system. However, there is a big difference here. In cases of database platforms, servers, routers, word processing programs, and other technologies, the technology is not serving as a critical building block. Sure, there is the cost of setup and training. However, it would not be inconceivable for a company to replace one of these resources. This is not the same as a statistical computing platform. Once an organization begins developing in a particular language, it becomes a part of their DNA. Everything that gets built and all the intellectual property that is brought into existence is now contingent on paying licensing fees. Replacing an arm or a leg may be doable, but replacing a company’s DNA would be a substantial challenge.

3. Mergers and Acquisitions

In the field of data technologies, vendors are changing all the time, and the seniority of the vendor is little assurance of its future stability. For example, SPSS, a statistical computing vendor more than 40 years old whose software was initially released in 1968, was acquired by IBM in 2009. While this may not seem like a big deal, such changes can be particularly worrisome in the field of statistical computing. If you own a car and the auto manufacturer were to be acquired, perhaps the greatest question a current owner might face is whether the warranty would still be honored. However, the implications of such a merger in statistical computing are far more troubling. If you develop in a particular computing language that requires licensing, there is no telling what level of support the new owner of your recently acquired vendor might provide you – if any. It’s like building a house on a piece of land that is owned and traded by someone else. The worst part is that you have no say in how the land is traded!

4. Bankruptcies

Major corporations that have shaped the world we live in are going out of business. Boston Scientific and Eastman Kodak are just two examples of technology companies that are at risk of bankruptcy in the near future, according to Business Insider. What would happen if the vendor of the statistical computing software at the root of our company’s development were to find itself in financially troubled times? Is there any guarantee that the engine that drives our company will still be able to run tomorrow?

5. Limits Business Expansion and Scalability

A major challenge I’ve encountered with clients who use proprietary statistical computing software is the difficulty they face when they want to put their developed products or services online and/or connect them with other technologies. While the technology may be available to make this expansion possible, expect to pay an arm and a leg for it (possibly taken from item 2 above). Particularly hard-hit are companies that wish to get paid to crunch other people’s data. Generally, this capability falls under special licensing provisions that may be so pricey as to make any CEO/CTO keel over. I’d estimate that 30% of the clients I’ve worked with have run into this issue, causing them to either abandon their expansion plans or use a different technology (such as R) to make it possible.

In summary, the next time you hear a statistical computing vendor warn about the risks of open-source technologies like R, be sure to consider what the vendor is not telling you.


R versus SAS – A Summary List

In recent discussion forums like the Advanced Business Analytics, Data Mining and Predictive Modeling group on LinkedIn, there has been much discussion on the pros and cons of each technology for statistical computing, particularly with respect to their applications in business.  Below is a summary of some of the points made across this discussion:

1. Data Set Size Limitations

One of the great myths in the field is that R is limited in the size of the data sets it can analyze compared to SAS. This is FALSE. In fact, similar to SAS, the size of the data sets that may be analyzed by R is limited only by the physical machine.

There is an important difference in how SAS and R natively handle data. Simply stated, the size of data sets analyzed in SAS is generally bottlenecked by the size of the hard disk, whereas the size of data sets analyzed out-of-the-box in R is bottlenecked by the size of the RAM – more on this in a minute. Both are physical hardware limitations of the machine and not limitations in the software. However, hard drive capacities have historically grown faster than the amount of RAM machines could support, giving R the bad rap of being limited. Two key developments make this topic moot. With the appearance of 64-bit operating systems that support far more RAM, and with R connectivity to databases, R can be made to support any size data set that SAS can support, including those that contain billions of observations.

Our organization has used R to analyze and model datasets that contain millions of rows and scores of variables, and that take up gigabytes of hard drive space, without issue.
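
The database route mentioned above relies on a general pattern: leave the data in the database and pull it through memory a chunk at a time, so RAM only ever holds one chunk. The sketch below illustrates that pattern in Python with the standard-library `sqlite3` module, chosen only so the example is self-contained. It is not R code and not a description of how any particular R package works. The table name `obs`, the column `value`, and the helper `chunked_mean` are hypothetical.

```python
import sqlite3

def chunked_mean(conn, table, column, chunk_rows=1000):
    """Compute a column mean while holding at most chunk_rows rows in memory.

    Note: table and column are interpolated into the SQL text, so this
    sketch is only suitable for trusted, hard-coded identifiers.
    """
    cur = conn.execute(f"SELECT {column} FROM {table}")
    total, count = 0.0, 0
    while True:
        rows = cur.fetchmany(chunk_rows)  # fetch the next chunk only
        if not rows:
            break
        total += sum(row[0] for row in rows)
        count += len(rows)
    return total / count if count else float("nan")

# Demo data: an in-memory database stands in for a large on-disk or remote one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (value REAL)")
conn.executemany(
    "INSERT INTO obs (value) VALUES (?)",
    ((float(i),) for i in range(10_000)),
)

mean_by_chunks = chunked_mean(conn, "obs", "value", chunk_rows=500)

# Alternative: push the aggregation into the database itself.
mean_in_db = conn.execute("SELECT AVG(value) FROM obs").fetchone()[0]

print(mean_by_chunks, mean_in_db)
```

Both approaches give the same answer. The second is usually preferable when the database can do the computation itself, since only the result crosses into memory.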

2. Open-source versus Closed

R is open-source while SAS is proprietary and closed.  The SAS marketing engine has historically argued that open-source technology should not be trusted and carries additional inherent risks.  In fact, a 2009 New York Times article by Ashlee Vance entitled, “Data Analysts Captivated by R’s Power,” quoted Anne H. Milley, director of technology product marketing at SAS as saying, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

Having used both technologies for some time, I believe Ms. Milley’s view of R is misinformed. Much of the field agrees with me. As SAS/R expert Steve Miller said of R at the Information Management blog, “I’ve never worked with a more stable, bug-free piece of software.” Like others who have worked with both SAS and R, I have generally found R to be extremely stable.

3. Corporate Usage

It has been argued that SAS is for businesses and R is for academics.  However, the evidence tells a different story.  In fact, some of the most pioneering companies of our time use R as part of their daily operations.  Google, Facebook, and Pfizer are just a few of the names who actively publicize their use of R.

Our company is particularly knowledgeable in this area, as we’ve helped companies make the conversion from SAS to R; this movement is not unique.  Indeed, major companies across all industries are turning to R as their lead statistical computing technology.

4. Academic Usage

While SAS was once a critical component of many (if not most) graduate-level statistics programs, SAS’s position is slowly being usurped by R at many top-level institutions.  I recall my days at Cornell and the chair of the mathematics department once telling me that no student would be in want of a job if he/she knew SAS.  While that was true at the time (and still is to some extent today), R is expanding in this area as more universities structure their programs with R at their core.

5. Lag Time for Implementation of New Methods

SAS proponents have argued that it takes time to vet and test new methods before implementing them into corporate software.  In fact, the marketing engines of several proprietary analytical vendors have stated this as a point of pride and competitive advantage for their technology.

Frankly, I don’t see how lag time can be an advantage in the field of statistical computing.  Particularly in finance, healthcare and other competitive fields, I’d much rather have a vehicle to access both the latest and the tried-and-true techniques than be left in the dust to inhale my competitor’s fumes.  This is probably why so many companies are switching or considering a conversion to R in the near future.

6. Cost

Being open-source, R is completely free. Unlike those who license proprietary statistical technologies, you won’t lose the ability to use your work if you don’t pay annual renewal fees. That’s because R doesn’t have any fees. Shifting to R is perhaps one of the greatest risk-mitigation strategies a business could take in this arena, as once you have a copy of R, it is yours to keep for all eternity. R is not going away.

7. Documentation

Both SAS and R are well documented.  SAS arguably has more documentation since it has been around longer.  However, a search of Amazon.com will reveal more R books than one would likely want to read.

There are many additional topics that could be discussed when comparing R and SAS. However, this summary has highlighted some of the areas where there is conflict between reality and marketing spin. Further discussion of this and similar topics is to follow.


Should I Switch to R?

The answer to the question, “Should I switch to R?” will likely differ across users and businesses. While many companies may feel comfortable with their current statistical and data computing software, there is strong motivation for others to change. To determine whether to switch, it can be helpful to list the pros and cons, then calculate a Return on Investment (ROI) and its associated timeframe. Below are some thoughts about what may be considered:

Pros:

  1. No more licensing costs
  2. Large user community
  3. Powerful database integration
  4. Ease of use
  5. Integration with the web and other technologies

Cons:

  1. Size/complexity of current environment.  Large / complex = more expensive
  2. Employee skills – will new staff be needed?
  3. Conversion know-how / time.  Do we have the skills to switch, and how long will it take?
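
The ROI and timeframe calculation suggested above can be sketched in a few lines. This is a deliberately simple model, assuming the only saving is the avoided annual license fee and the only costs are one-time conversion and training. Every dollar figure below is a made-up placeholder, not data from any client, and the function names are hypothetical.

```python
import math

def payback_months(one_time_cost, annual_savings):
    """Months until cumulative savings cover the one-time switching cost."""
    if annual_savings <= 0:
        return None  # the switch never pays for itself under this model
    monthly_savings = annual_savings / 12
    return math.ceil(one_time_cost / monthly_savings)

def simple_roi(one_time_cost, annual_savings, years):
    """(total savings - cost) / cost over the chosen horizon, as a fraction."""
    return (annual_savings * years - one_time_cost) / one_time_cost

# Placeholder inputs: replace with your own estimates.
annual_license_fee = 60_000   # hypothetical yearly fee avoided by switching
conversion_cost = 45_000      # hypothetical cost to convert existing programs
training_cost = 15_000        # hypothetical cost to retrain or hire staff

switching_cost = conversion_cost + training_cost
months = payback_months(switching_cost, annual_license_fee)
roi_3yr = simple_roi(switching_cost, annual_license_fee, years=3)

print(f"Payback period: {months} months")
print(f"3-year ROI: {roi_3yr:.0%}")
```

With these placeholder figures the switch pays for itself in 12 months and returns 200% over three years. A larger or more complex environment raises the one-time cost and lengthens the payback period accordingly.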

Our experts can guide companies through each of the concerns that may impact a decision of whether to switch.  Feel free to contact us if you would like some guidance.



10 Reasons to Switch to R (from SAS, SPSS, and other data software)

In our experience converting businesses from SAS to R (and SPSS to R), we have compiled a list of common reasons why customers make the switch to R.  With R gaining such attention in mainstream media and international business,  more companies are questioning whether converting to R would make sense for them.

Here are some of the reasons why our customers have converted to R:

  1. R integrates with a large number of database systems.
  2. R supports parallel processing.
  3. R can support huge data volumes – gigabytes of data and hundreds of millions of records.
  4. R implements new methods faster than any other data technology.
  5. R is cross-platform.  Windows, Unix/Linux, Mac, etc.
  6. R has a huge and active community of smart people.
  7. R has a powerful and consistent base language – if you dream it, there is likely a way to do it in R.
  8. R is well-documented (look at Amazon.com!)
  9. R integrates well with other languages and web platforms.
  10. Best of all, R is free. Never worry about losing your work for not paying a bill.

For more information on converting to R, visit Rconvert.com.



SAS and R – A love/hate relationship

Once upon a time, I was the manager of the SAS department for a leading SAS partner in New England. As a SAS aficionado whose work with the language dated back to my days in college, I truly valued the power that SAS brought to the table, whether by providing data management, statistics, predictive modeling, or business intelligence capabilities. My expertise was centered around using SAS technology to predict customer behavior for a variety of industries – insurance, marketing, and healthcare in particular. SAS is a fantastic tool for data mining, predicting, and forecasting in these areas.

While the technology and its application were solid, I began realizing that lots of small and mid-size companies were having difficulty footing the steep SAS licensing bills. One of the major disadvantages of using SAS was the annual licensing fees. Anyone in the field knows that these fees are not trivial. While large corporations like Bank of America might be able to pay these costs without issue, small to mid-size businesses were getting left out. Entrepreneurship, in particular, was being stifled by these overbearing fees. Even scarier, if a company begins developing in a proprietary language licensed annually, then hits rough times that make it unable to pay for renewal, all the developed work could become useless, since the company’s license to use the product would turn off. Talk about a major corporate risk! Would you build your dream house on annually-leased land?

While my career was still focused on SAS, something else was happening in the data & analytics field. R, a language that had once been dismissed by all but the advanced academic, was starting to make its way onto the mainstream business stage. The origins of R date back to the work of John Chambers at Bell Labs in 1975.  While at Bell Labs, John formed a statistical computing language that later became known as “S.”  Around 1993, R was written as an implementation of S by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.  R is open-source and freely available.

In the early 2000s, I had done some experimentation in R.  Compared to other tools I was using, including SAS, SPSS, Matlab, and Stata, I was not too impressed with R.  I found that it frequently crashed and seemed inconsistent at times. This led me to pass on using the tool and not take a second look until nearly a decade later.

Frustrated with the challenges I saw with proprietary tools in recent years, I decided to take another look at the available technologies in the field.  What I found shocked me.  R, a tool that I had dismissed years prior, had come a very long way.  In fact, once I began using R, I could not put it down!  After about 4 months, I realized that I could do everything in R that I was doing with SAS.  Best of all – R was completely free!

What I really like about R is how quickly the R community has built the tool up. I’d estimate that R grew more in the last 3 years than many similar proprietary tools grew in 30 years. That is the power of open-source: there are a lot of eyes on R. Comparing R to other similar technologies is like comparing Wikipedia to an old-fashioned encyclopedia. In terms of capabilities, I have come to believe that R will become the de facto standard in business statistical computing over the next decade. From its database integration to its parallel processing abilities, R is a fairly complete package.

After realizing how far R had come, my eyes began to open about who else was using R. Some of the top technology companies of our time – including Google and Facebook – are R users. As fate would have it, around the time of this realization, I received a call from a representative of a large Wall Street financial firm. They needed help converting a SAS environment over to R. At first, I thought this was unusual, but I then realized that this is what other large companies are doing. Hence, Rconvert.com was born – to help companies make this transition.