Thanks to Reddit.com/r/programming for directing me to this excellent compilation of R programming information by Hadley Wickham.
Most of us in analytics use R just to get stuff done. It has a large number of packages that let us produce insights at our own pace. By "at our own pace" I mean that we typically analyse data either individually or within our team, and the results are communicated to our audience separately from the analysis itself.
This has created a situation where programmatic efficiency is not a priority for many R analysts. If a function was sub-optimal, it only cost us a little extra processing time; our end users would never know the difference. In fact, it made more sense to write a solution that was quick to code but slow to run, since the code would only be used a few times. Spending hours optimizing it would have been considered a waste.
However, analysts are now crunching massive amounts of data. What would have caused a few minutes' delay a few years ago now causes hours and hours of unnecessary waiting. Additionally, reducing processing time has become more valuable as we move toward elastic cloud computing, where more processing = more cost.
Most importantly, analytics is moving toward machine learning solutions, which require algorithms that update constantly and drive automated decisions in close to real time. Predicting likely online customer behavior from current site activity is one example. Slow processing would make such a model far less effective.
Given these factors, we are rapidly reaching the point where an analyst's programming abilities will be just as important as their statistical skill set. Over the next few weeks I plan to pick out a few relevant items from Advanced R Programming that will help us analysts make our programs faster and easier to understand.