When I started with R, I soon discovered that, more often than not, a package name has a particular meaning. For example, the first package I ever installed was foreign. The name refers to its ability to read data from other, foreign sources into R and to write it back out. While this and many other names are rather straightforward, others are much less intuitive. The name of a package often tells a story that is inspired by a general property of its functions. And sometimes I just don’t get the deeper meaning, because English is not my native language.
In this blog post, I will shed light on the wonderful world of package names. After this journey, you will not only admire the creativity of R package creators; you’ll also be king or queen at your next class reunion! Or at least at the next R-Meetup.
Before we start, and I know that you are eager to continue, I have two remarks about this article. First: Sometimes I refer to official explanations from the authors or other sources; other times, it’s just my personal explanation of why a package is named the way it is. So if you know better, do not hesitate to contact me. Second: There are currently 15,341 packages on CRAN, and I am sure there are a lot more naming mysteries and ingenuities to discover than any curious blog reader would like to digest in one sitting. Therefore, I focused on the most famous packages and added some of my personal favorites.
But enough of the talking now, let’s start!
dplyr (diːˈplaɪə)
You might have noticed that many package names contain the string plyr, e.g. dbplyr, implyr, dtplyr, and so on. This homophone of pliers refers to the way these packages refine the base R apply functions as part of the “split-apply-combine” strategy. Instead of doing all steps of a data analysis or manipulation at once, you split the problem into manageable pieces, apply your function to each piece, and combine everything afterward. We see this approach at its best when we use the pipe operator. The first part of each package name just refers to the object it is applied to. So the d stands for data frames, db for databases, im for Apache Impala, dt for data tables, and so on… Sources: Hadley Wickham
lubridate (ˈluːbrɪdeɪt)
This wonderful package makes it so easy and smooth to work with dates and times in R. You could say it runs like clockwork. In German, there is a saying with the same meaning (“Das läuft wie geschmiert”), which translates literally to “it runs as if lubricated”.
ggplot2 (ʤiːʤiːplɒt tuː)
Leland Wilkinson wrote a book in which he defined the components that a comprehensive plot is made of. You have to define the data you want to show, what kind of plot it should be (e.g., points or lines), the scales of the axes, the legend, axis titles, etc. These parts, which he called layers, are built on top of each other. The title of this influential work is Grammar of Graphics. Once you get it, it enables you to build complex yet meaningful plots with a consistent style across packages. That’s because its logic has also been adopted by many other packages like plotly, rBokeh, visNetwork, or apexcharter. Sources: ggplot2
data.table (ˈdeɪtə ˈteɪbl) – logo
Okay, full disclosure, I am a tidyverse guy, and one of my sons shall be named Hadley. At least one. However, this does not mean that I don’t appreciate the very powerful package data.table. Occasionally, I take the liberty of exploiting its functions to improve the performance of my code (hello fread() and rbindlist()). Anyway, the name itself is pretty straightforward – but did you notice how cool the logo is?! Well, there is obviously the name “data.table” and the square brackets that are fundamental to data.table syntax. Likewise, there is the assignment by reference operator :=, a.k.a. the walrus operator. “Wait, stop,” your inner marine mammal researcher says, “isn’t this a sea lion on top there?!” Yes indeed! The sea lion is used to highlight that it is an R package since, of course, it shouts R! R!. Source: Rdatatable
tibble (tɪbl)
Regular base R data frames are nice, but did you ever print a data frame in the console, unaware that it is 10 million rows long? Good luck with interrupting R without quitting the whole session. That might be one of the reasons why the tidyverse uses another type of data frame: the tibble. The name tibble could just stem from its similar sound to table, but I suspect there is more to it than meets the eye. Have you ever heard the story of Tibbles and the Stephens Island wren? NO? Then let me take you to New Zealand, AD 1894.
Between the northern and southern main islands of NZ, there is a small and uninhabited island: Stephens Island. Its rocks have been the downfall of many poor souls who tried to pass through the Cook Strait. Therefore, it was decided to build a lighthouse so that ships could henceforth pass safely and undamaged. Due to its isolation, Stephens Island was the only habitat for many rare species. One of these was Lyall’s wren, a small flightless passerine. It did not know any predators and lived its life in joy and harmony, until… the arrival of the first lighthouse keeper.
His name was David Lyall. He was a man interested in natural history and, facing a long and lonely time on his own on Stephens Island, the owner of a cat. This cat was not satisfied with just comforting Mr. Lyall and enjoying beach walks. Shortly after his arrival, Mr. Lyall noticed the carcasses of little birds, seemingly slaughtered and dishonored by a fierce predator. Interested in biology as he was, he found out that these small birds were a distinct species. He preserved some carcasses in alcohol and sent them to a friend. This was in October 1894. A scientific article about the wren was published in an ornithology journal, soon making the specimens sought-after collector’s items.
The New Zealand summer went on, and in February 1895, four bird-watchers arrived at Stephens Island. They were looking for this cute little wren and found… none. Within a few months, Mr. Lyall’s hungry cat had made the whole species go extinct. On March 16, 1895, the Christchurch newspaper The Press wrote: “there is very good reason to believe that the bird is no longer to be found on the island, and, as it is not known to exist anywhere else, it has apparently become quite extinct. This is probably a record performance in the way of extermination.” The name of the cat? Tibbles.
Sources: Wikipedia; All About Birds; Oddity Central. Indicator: the fridge of Hadley Wickham’s parents
purrr (pɜːɜː)
This extension of the base R apply functions has been one of my favorites lately. The concise usage of purrr enables powerful functional programming that, in turn, makes your code faster, more readable, and more stable. Or, as Mr. Wickham states, it makes “your pure R functions purr”. Also, note its parallelized sibling furrr. Sources: Hadley Wickham
Amelia (əˈmiːlɪə)
During my Master’s degree, I had a course about missing data and multiple imputation. One of the packages we used, or rather analyzed, was Amelia. It turned out that this package is named after an impressive woman: Amelia Earhart. Living in the early 20th century, she was an aviation pioneer and feminist. She was the first woman to fly solo across the Atlantic, a remarkable achievement and an inspiration for women to start a technical career. Unfortunately, she disappeared during a flight over the central Pacific at age 39 and is thus… missing. ba dum-tss Source: Gary King – Co-Author
magrittr (maɡʁitə)
The conciseness of coding with dplyr or its siblings is unimaginable without the pipe operator %>%. It allows you to write and read code from top to bottom and from left to right, just like regular text. Pipes are not unique to R, yet I am sure René Magritte had nothing else in mind when he painted The Treachery of Images in 1929 with its slogan: “Ceci n’est pas une pipe”. The logo designers just made a slight adjustment to his painting. Or should I say: unearthed the meaning that has always been behind it?! Sources: Vignette, revolutionanalytics.com
batman (ˈbætmən)
Data science could be quite fun if it weren’t for the data. Especially when working with textual data, typos and inconsistent coding can be very cumbersome. For example, you’ve got questionnaire data consisting of yes/no questions. For R, this corresponds to TRUE/FALSE, but who would write this in a questionnaire? In fact, when we try to convert such data to logical values by calling as.logical(), almost every string becomes NA. Lost and doomed? NO! Because who is better at determining actual NAs than nananananana… batman!
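To see the problem, here is a small base R sketch. The answer strings are invented, and the helper yes_no_to_logical() is a hypothetical stand-in written for this example; it is not the batman implementation, only an illustration of the kind of lookup such a package automates.

```r
# Invented questionnaire answers
answers <- c("yes", "no", "Yes", "TRUE", "false", "n/a")

# Base R only recognizes spellings of TRUE/FALSE; everything else becomes NA
base_result <- as.logical(answers)
print(base_result)

# Hypothetical helper: map common yes/no spellings to logical values
yes_no_to_logical <- function(x) {
  x <- tolower(trimws(x))
  out <- rep(NA, length(x))                    # start with logical NAs
  out[x %in% c("yes", "y", "true")] <- TRUE
  out[x %in% c("no", "n", "false")] <- FALSE
  out
}

clean_result <- yes_no_to_logical(answers)
print(clean_result)
```

Only "n/a" stays NA after the cleanup, which is exactly the value that should be missing.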
Homeric (həʊˈmɛrɪk)
Hey, you made it all the way down here?! You deserve a little treat! What about a soft, sweet, and specially sprinkled donut? And who would be better suited to present it to you than the best-known lover of donuts himself: Homer Simpson! Just help yourself: Homeric::PlotDoughnut(1, col = "magenta")
Source: Homeric Documentation
fcuk (fʌk)
Error in view(my_data): could not find function "view"
Are you sick and tired of this or similar error messages? Do you regularly employ your ample stock of swear words to describe the stupidity of inconsistent camel case and snake case function names across packages? Or do you just type faster than your shadow, causing minor typos in your otherwise excellent code? There is help! Just go and install the amazing fcuk package, and useless error messages are a thing of the past.
hellno (hɛl nəʊ)
Slip into the role of a dedicated R user. Just imagine how much trouble I must have had with one specific default argument value of a base R function to write an entire package that handles nothing but this case. I am talking about the tormentor of many beginRs when working with as.data.frame(): stringsAsFactors = TRUE. But I do not only change it to FALSE! I also create my own FALSE value and name it HELLNO.
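For anyone who has not been bitten by it yet, this base R sketch shows why that default caused so much grief. The argument is set explicitly here so the example behaves the same on any R version (since R 4.0.0 the default is FALSE); the data is invented, and the snippet does not use hellno itself.

```r
# Invented data: numbers that arrived as strings
raw <- c("10", "20", "5")

# The old default behavior: strings silently become factors
with_factors <- data.frame(x = raw, stringsAsFactors = TRUE)
class(with_factors$x)   # "factor"

# The classic trap: converting the factor yields level codes, not values
codes <- as.numeric(with_factors$x)
print(codes)

# Keeping strings as strings avoids the surprise
without_factors <- data.frame(x = raw, stringsAsFactors = FALSE)
values <- as.numeric(without_factors$x)
print(values)
```

The first conversion returns the positions of the factor levels instead of the numbers you typed, which is easy to miss in a long script.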
Honorable mentions
gremlin: package for mixed-effects model REML incorporating Generalized Inverses.
harrietr: named after Charles Darwin’s pet giant tortoise. A package for phylogenetic and evolutionary biology data manipulations.
beginr: it helps where we’ve all been, searching for ages until setting pch = 16.
charlatan: worse than creating dubious medicine, this one makes fake data.
fauxpas: explains what specific HTTP errors mean.
fishualize: give your plots a fishy look.
greybox: why just thinking black or white? This is a package for time series analysis.
vroom: it reads data so fast to R, you almost can hear it making vroom vroom.
helfRlein: some little helper functions, inspired by the German word Helferlein = little helper.