burglr – stealing code from the web

André Bleier Blog, Data Science

Introduction

All we do at STATWORX all day long is steal code from the web. That is why I thought it would only be fair to write a function that does so conveniently. With burglr, you have all functions and kickass machine learning models at your fingertips.

This would have been a more exciting description of the function I will introduce in this blog post. However, to be completely honest, the motivation for and the application of the function are slightly less exciting and less shady.

Motivation

Like most programmers, I like to automate repetitive work. As many programmers can also relate, you often need these automations in many different projects. This means you need somewhat flexible access to your code, for instance when your colleague asks you to debug a function you coded decades ago. Following the R philosophy, the most obvious action would be to wrap an R package around your automation. While this approach is by no means complicated, it comes with additional development costs you might want to avoid. Furthermore, you still need a platform to share your package, because CRAN is often not an appropriate choice and admission takes way too long for your simple run-of-the-mill automation.

GitHub (or a similar product like GitLab) seems like a reasonable choice for sharing code with your colleagues. The only obstacle to overcome is to clone the repository and afterwards source the function you are interested in. However, in my experience there are still some pitfalls. First, it is by no means clear where to clone the git repository on your hard drive. Everyone has their own preferences, so if you are not using R projects you run into the following problem:

# My own working directory
source("~/Desktop/funny_elephant_pictures/my_secret_git_code/Xy/Xy.R")

# My colleague’s directory 
source("~/Projects/2018/Xy/Xy.R") 

Now imagine you and your colleague are working on the same project in one R script. Both of you want to source the amazing supervised learning simulation function Xy, but from different working directories. And let’s take that one step further by imagining you are both working in the same git repository. Nuclear meltdown, programmer’s edition, since each of you would have to work around the working directory of the other.

Second, creating a package for every tiny function is very time-consuming. This is where burglr comes into play. Once you have the burglr package, you can source R scripts from the web (and hence from GitHub), so you do not need to clone repositories 24/7. The most important upside, though, is that you can deploy burglr on, say, a colleague’s workspace and source every function you need from the web.

The first heist

So far, I have been very abstract, so let me demonstrate the functionality with a little example. Imagine your colleague calls you over to debug a function. What I like to do in such cases is to use my debugging function dive. Of course, none of my colleagues are using my cool functions, so I would have to clone the repository. To avoid cloning the repository, I can reach it with the burglr package. To be fair, I have to fetch one repository once, since I need the burglr function itself. Thus, we break even with the first usage of burglr. The burglr package can be installed conveniently from GitHub using the devtools package.

# Installing devtools 
install.packages("devtools") 

# Install burglr
devtools::install_github("andrebleier/burglr") 

By executing the code chunk above, you have installed burglr just like any other package you would install from CRAN. Finally, we are all set up to steal some code. The burglr function has only one mandatory argument, namely a vector of URLs to steal code from.

Of course, specifying a URL comes with potential pitfalls. However, the only thing you need to keep in mind is that the URL ends with the R script extension “.R”. burglr is not a crawling package or something similar, hence it only steals the text from files that are specified via a web URL. When you want to source R scripts from GitHub, just make sure you navigate into the file itself and copy the URL.

When you are all set up with your URL, you can just run burglr:
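To make the URL pitfall a bit more concrete: on GitHub, the page you see when you navigate into a file has "/blob/" in its URL, while the plain text of the script is served from raw.githubusercontent.com. Whether burglr translates between the two internally is not covered here, so the helper below is only a sketch under that assumption; the name to_raw_url is hypothetical and not part of burglr.

```r
# Hypothetical helper, NOT part of burglr: check the ".R" ending and map a
# GitHub "blob" page URL to the matching raw-content URL.
to_raw_url <- function(url) {
  # The one requirement named in the text: the URL has to end with ".R"
  if (!grepl("\\.R$", url)) {
    stop("URL does not end with '.R': ", url)
  }
  # github.com/<user>/<repo>/blob/<branch>/<path>
  #   -> raw.githubusercontent.com/<user>/<repo>/<branch>/<path>
  pattern <- "^https?://(www\\.)?github\\.com/([^/]+)/([^/]+)/blob/(.+)$"
  # URLs that do not match the pattern are returned unchanged
  sub(pattern, "https://raw.githubusercontent.com/\\2/\\3/\\4", url)
}

to_raw_url("https://www.github.com/andrebleier/dive/blob/master/dive.R")
# "https://raw.githubusercontent.com/andrebleier/dive/master/dive.R"
```

URLs that are not GitHub file pages pass through untouched, so such a helper could be applied to a whole vector of script locations before sourcing them.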

# steal code  
burglr::burglr("https://www.github.com/andrebleier/dive/blob/master/dive.R") 

The function will recognize that you sourced a function and print a message saying that a function with the name dive was sourced. If the content of the script is not a function, it will instead print a message saying that the file was sourced successfully.
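For readers curious how such a check could work, here is a minimal sketch in base R. It is not burglr's actual implementation; the function name steal_sketch and the wording of its messages are assumptions made for illustration. The idea is to evaluate the script in a scratch environment, look at which objects it created, and report any functions among them.

```r
# Hypothetical sketch, NOT burglr's actual code: source a script from a
# URL (or local path) and report whether it defined any functions.
steal_sketch <- function(url, envir = globalenv()) {
  # Evaluate the script in a scratch environment first, so we can see
  # exactly which objects it created.
  scratch <- new.env()
  source(url, local = scratch)

  objs <- ls(scratch)
  is_fun <- vapply(objs, function(nm) is.function(get(nm, envir = scratch)),
                   logical(1))

  # Copy everything the script defined into the target environment.
  for (nm in objs) assign(nm, get(nm, envir = scratch), envir = envir)

  if (any(is_fun)) {
    message("Sourced function(s): ", paste(objs[is_fun], collapse = ", "))
  } else {
    message("File sourced successfully.")
  }
  invisible(objs[is_fun])
}

# Usage, with a local temporary file standing in for a web URL
tmp <- tempfile(fileext = ".R")
writeLines("dive_demo <- function(x) x + 1", tmp)
steal_sketch(tmp)
```

Base R's source() accepts a URL as well as a file path, which is why the same sketch would work for a script hosted on the web.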

If you have any suggestions or you find bugs, feel free to e-mail me or just create an issue on my GitHub.

About the author

André Bleier

The most exciting part of being a data scientist at STATWORX is to find this unique solution to a problem by fusing machine learning, statistics, and business knowledge.
