“The function of good software is to make the complex appear to be simple.” —Grady Booch
“Beware of bugs in the above code; I have only proved it correct, not tried it.” —Donald Knuth
“If you can’t informally describe a function in one line, the function is probably too large.” —(paraphrased) Eric S. Raymond
morals
“You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e., you now have three copies of the same code).” —R for Data Science, Hadley Wickham, Garrett Grolemund
anatomy of a function
name
arguments
body
return value(s)
error/warning conditions
its scope/environment
documentation (not always present)
principles for functions
use communicative names
no side effects, please
throw errors and warnings
write tests
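to make the "no side effects" principle concrete, here is a minimal sketch. the function and variable names are hypothetical, and `<<-` (which assigns into an enclosing scope) appears only to show what to avoid:

```r
# a running total stored outside of any function
running_total <- 0

# impure: changes running_total outside the function via `<<-` (a side effect)
add_to_total_impure <- function(x) {
  running_total <<- running_total + x
  invisible(running_total)
}

# pure: everything it needs comes in as arguments, and the result is returned
add_to_total_pure <- function(total, x) {
  total + x
}

add_to_total_impure(5)                  # silently changes running_total
pure_result <- add_to_total_pure(0, 5)  # changes nothing outside the function
```

the pure version is easier to test and reason about because its output depends only on its arguments.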
examples
# Calculate the Length of the Hypotenuse for a Right Triangle with Base of
# Length x and Height of Length y
#
# Uses the Pythagorean theorem to calculate the length of the hypotenuse of a
# right triangle with the side-lengths given.
#
# Reference: https://en.wikipedia.org/wiki/Pythagorean_theorem
#
# arguments:
#   x length of the triangle base
#   y height of the triangle
#
# returns:
#   the length of the hypotenuse of the triangle
calculate_hypotenuse_length <- function(x, y) {
  sqrt(x^2 + y^2)
}
calculate_hypotenuse_length(3, 4)
[1] 5
examples
calculate_hypotenuse_length(5,12)
[1] 13
calculate_hypotenuse_length(8,15)
[1] 17
calculate_hypotenuse_length(7,24)
[1] 25
calculate_hypotenuse_length(20,21)
[1] 29
calculate_hypotenuse_length(12,35)
[1] 37
arguments
arguments are a little bit like playing make-believe: “what if x were this? what if argument2 were this?”
in that sense, you need to suspend your beliefs about whatever x or argument2 are outside of the function, because inside the function they have whatever values were passed to it in the call.
x <- 1
argument2 <- 'spooky value!'
my_fun_function <- function(x, argument2) {
  print(paste0("the value of x inside this function: ", x))
  print(paste0("the value of argument2 inside this function: ", argument2))
}
my_fun_function(1234, "i'm not spooky!")
[1] "the value of x inside this function: 1234"
[1] "the value of argument2 inside this function: i'm not spooky!"
arguments
named arguments
missing arguments
default arguments
detecting missing arguments
argument names
note that you can provide the arguments in the order specified, or you can give named arguments when calling the function (in which case, order doesn’t matter):
calculate_hypotenuse_length(5, 12)
[1] 13
calculate_hypotenuse_length(x = 5, y = 12)
[1] 13
calculate_hypotenuse_length(y = 12, x = 5)
[1] 13
missing arguments
some functions do not require that all arguments be passed.
when an argument has no default and is not given, R raises an error, but only at the point where the function actually uses that argument (arguments are evaluated lazily).
here’s a simple example:
# two_arguments_steve likes to talk a lot
two_arguments_steve <- function(arg1, arg2) {
  for (i in 1:3) {
    print(arg1)
  }
}
two_arguments_steve("hello!")
[1] "hello!"
[1] "hello!"
[1] "hello!"
# two_arguments_steve won't run without arg1
two_arguments_steve(arg2 = "what's up?")
Error in print(arg1): argument "arg1" is missing, with no default
default arguments
one common situation in which arguments don’t need to be passed is when they have default values.
function_with_lovely_defaults <- function(x = "great", y = "day") {
  print(paste0("i'm having a ", x, " ", y))
}
function_with_lovely_defaults()
[1] "i'm having a great day"
function_with_lovely_defaults("kale", "salad")
[1] "i'm having a kale salad"
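defaults and named arguments combine naturally: you can override just one default by name and leave the others alone. a small sketch, repeating the definition above so it runs on its own (the value "week" is only an illustration):

```r
function_with_lovely_defaults <- function(x = "great", y = "day") {
  print(paste0("i'm having a ", x, " ", y))
}

# override only y by name; x keeps its default value "great"
result <- function_with_lovely_defaults(y = "week")
```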
detecting missing arguments
sometimes you may need to detect within a function if an argument is missing.
snarky_function("did you know the earliest crocodilian evolved 95 million years ago?! 🐊")
[1] "here's your argument: did you know the earliest crocodilian evolved 95 million years ago?! 🐊"
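the definition of snarky_function is not shown here. a minimal sketch that would produce the output above, assuming it relies on base R's missing(); the message for the missing case is made up for illustration:

```r
snarky_function <- function(x) {
  if (missing(x)) {
    # hypothetical message; not taken from the original example
    print("you didn't give me an argument!")
  } else {
    print(paste0("here's your argument: ", x))
  }
}

with_argument <- snarky_function("hello")
without_argument <- snarky_function()
```

missing(x) is TRUE only when the caller did not supply x, so the function can choose a different behaviour instead of failing.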
returning early
sometimes you don’t need to run all of the code in a function. a classic example is when the output of a function depends on which branch of an if/else block runs.
in those situations, you should write out a return() function call explicitly.
sum_and_square <- function(x, y) {
  if (all(c(is.numeric(x), is.numeric(y)))) {
    return((x + y)^2)
  } else {
    warning("x and y aren't both numeric")
    return(NA)
  }
}
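a short usage sketch for sum_and_square, repeating its definition so the block runs on its own. suppressWarnings() is used here only to keep the example quiet; the input values are arbitrary:

```r
sum_and_square <- function(x, y) {
  if (all(c(is.numeric(x), is.numeric(y)))) {
    return((x + y)^2)
  } else {
    warning("x and y aren't both numeric")
    return(NA)
  }
}

# numeric input returns early from the first branch: (2 + 3)^2
ok <- sum_and_square(2, 3)

# non-numeric input takes the else branch: a warning is raised and NA is returned
not_ok <- suppressWarnings(sum_and_square("2", 3))
```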
raising errors
we could improve on the function given by adding code to raise errors for problematic input.
# first let's look at what happens without the code to raise errors:
calculate_hypotenuse_length(1.3e154, 1.3e154)
[1] Inf
calculate_hypotenuse_length <- function(x, y) {
  # make sure the arguments aren't too big
  if (x >= 1.3e154 | y >= 1.3e154) {
    stop("calculate_hypotenuse_length only supports arguments less than 1.3e154")
  }
  sqrt(x^2 + y^2)
}
calculate_hypotenuse_length(1.3e154, 1.3e154)
Error in calculate_hypotenuse_length(1.3e+154, 1.3e+154): calculate_hypotenuse_length only supports arguments less than 1.3e154
write your own error messages
it’s often the case that R will automatically raise errors when the arguments given to a function are incompatible with the steps being done inside the function;
however, R’s errors can be incredibly terse, so it may be more user-friendly to write your own errors, especially ones that give a little bit more context.
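for instance, passing a string to the plain sqrt(x^2 + y^2) version fails with R's own "non-numeric argument to binary operator". a sketch of a more informative alternative follows; checked_hypotenuse_length is a hypothetical name and the message wording is only one possibility:

```r
# hypothetical variant that validates its inputs before computing anything
checked_hypotenuse_length <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop(
      "checked_hypotenuse_length needs numeric side lengths, but got x of class '",
      class(x)[1], "' and y of class '", class(y)[1], "'"
    )
  }
  sqrt(x^2 + y^2)
}

# tryCatch captures the error so we can look at its message
message_text <- tryCatch(
  checked_hypotenuse_length(1, "apple"),
  error = function(e) conditionMessage(e)
)
```

the message names the function, says what was expected, and reports what was actually received, which is usually enough to find the offending call.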
scopes and environments
what do you think will happen?
x <- 1
some_function <- function() {
  x <- 2
  print(x)
}
some_function()
[1] 2
print(x)
[1] 1
scopes and environments
how about now?
x <- 1
tricky_function <- function() {
  inner_function <- function() {
    x <- 2
  }
  inner_function()
  print(x)
}
tricky_function()
[1] 1
print(x)
[1] 1
scopes and environments
and how about now?
x <- 1
tricky_function <- function() {
  inner_function <- function() {
    print(x)
  }
  x <- 2
  inner_function()
}
tricky_function()
[1] 2
print(x)
[1] 1
scopes and environments
let’s make those examples a little less mysterious:
when an object is referenced, R first looks for it in the scope where the reference occurs; if it isn’t defined there, R uses the value from the nearest enclosing (parent) scope in which it is defined.
here’s an analogy: let’s say your grandmother and your mom are both named Elphaba, and someone asks you “Who is Elphaba?” If you’re R, you’d say “my mom.”
objects assigned within a function will not be updated in the global scope or that function’s parent scope(s).
here’s an analogy: if a function is like considering a hypothetical written down for you, then if the hypothetical says “let’s say dogs had five legs” or "x is 5", you could work out some conclusions (dog booties would be sold in packs of five, or x^2 == 25), but that hypothetical doesn’t change anything about the rest of the world.
some things worth explicitly discussing
functions can call other functions
we’ve already seen this
functions are first-class citizens
so you can treat them similarly to other variables
you can make functions that make functions
anonymous functions
infix notation
functions as first-class citizens
you can do things with functions in R that you can’t do in all languages, like put them in a list or pass them as arguments.
f1 <- function() {
  cat("hello ")
}
f2 <- function() {
  cat("world")
}
f3 <- function() {
  cat("!")
}
my_functions <- list(f1, f2, f3)
for (i in 1:3) {
  my_functions[[i]]()
}
in programming languages, treating functions as first-class citizens is a core ingredient of functional programming, so we say that such a language supports functional programming.
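the list example covers storing functions; passing one as an argument looks like this. a minimal sketch in which apply_twice and add_three are hypothetical helpers:

```r
# apply_twice takes a function f and applies it to x two times
apply_twice <- function(f, x) {
  f(f(x))
}

add_three <- function(x) {
  x + 3
}

# add_three(add_three(10))
result <- apply_twice(add_three, 10)

# built-in functions can be passed the same way: sqrt(sqrt(16))
root_of_root <- apply_twice(sqrt, 16)
```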
making functions with functions (function factories)
the key principle behind function factories can be described simply:
The enclosing environment of the manufactured function is an execution environment of the function factory.
power2 <- function(exp) {
  force(exp) # ensures exp is not lazily evaluated
  function(x) {
    x ^ exp
  }
}
square <- power2(2)
cube <- power2(3)
square(2)
#> 4
cube(2)
#> 8
this isn’t a practice you will need to use often, if ever, but it is very helpful to know about to understand code you may come across and is quite instructive about how scoping/environments work in R.
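one way to see the key principle in action is to inspect the enclosing environment of each manufactured function with base R's environment(). a sketch that repeats power2 so it runs on its own:

```r
power2 <- function(exp) {
  force(exp)
  function(x) {
    x ^ exp
  }
}

square <- power2(2)
cube <- power2(3)

# each call to power2 created its own execution environment, and the
# manufactured function keeps that environment (and its exp) alive
square_exp <- environment(square)$exp
cube_exp <- environment(cube)$exp
```

square and cube share the same code but not the same environment, which is why they compute different powers.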
anonymous functions
functions can be created without a name (as you may have noticed in the last example). these functions are called anonymous functions.
there are a handful of ways to create anonymous functions in R, and some packages/frameworks add their own shorthand syntax on top.
# the oldest, most readable way:
function(x) {
  x^2
}
# can be abbreviated to:
function(x) { x^2 } # if writing inline
# introduced in R 4.1.0 (May 2021)
\(x) x^2
# many functions in the tidyverse use the
# tilde syntax (also called formula syntax)
~ .^2
anonymous functions
so how would i use anonymous functions?
library(purrr)
vec <- c(1, 2, 3)
# map_dbl applies the function given to each element and returns
# a numeric vector
map_dbl(vec, function(x) { x^2 }) #> c(1, 4, 9)
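the same idea works without any packages. a sketch using base R's vapply(), with both the long form and the `\(x)` shorthand (the shorthand assumes R 4.1.0 or later):

```r
vec <- c(1, 2, 3)

# numeric(1) tells vapply that each call must return exactly one number
squares_long <- vapply(vec, function(x) { x^2 }, numeric(1))

# the same anonymous function written with the backslash shorthand
squares_short <- vapply(vec, \(x) x^2, numeric(1))
```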
we’re not going to fully get into functional programming right now, but this is a teaser for a later lecture on functional programming.
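infix notation
infix notation means writing a function's name between its arguments, as in 1 + 2. in R, operators are ordinary functions, and you can define your own as long as the name is wrapped in percent signs. a minimal sketch, where `%+%` is a hypothetical operator chosen for illustration:

```r
# built-in operators can also be called in prefix form using backticks
prefix_sum <- `+`(1, 2)

# user-defined infix operators must be named %something%
`%+%` <- function(a, b) {
  paste0(a, b)
}

greeting <- "hello " %+% "world"
```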
testing
testing functions with test_that
library(testthat)
test_that("calculate_hypotenuse_length works as intended", {
  # test known values, including pythagorean triples
  expect_equal(calculate_hypotenuse_length(0, 0), 0)
  expect_equal(calculate_hypotenuse_length(1, 1), sqrt(2))
  expect_equal(calculate_hypotenuse_length(3, 4), 5)
  expect_equal(calculate_hypotenuse_length(5, 12), 13)
  expect_equal(calculate_hypotenuse_length(7, 24), 25)
  # test that non-numerics throw an error
  expect_error(calculate_hypotenuse_length(1, 'apple'))
  # test that negative numbers are supported;
  # this uses _random_ testing
  expect_gt(
    calculate_hypotenuse_length(
      -1 * sample.int(n = 10, size = 1),
      -1 * sample.int(n = 10, size = 1)
    ),
    0
  )
})
Test passed 😸
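warnings can be tested in the same way as errors. a sketch for the sum_and_square function from the "returning early" slide, repeated here so the block runs on its own; it assumes the testthat package is installed:

```r
library(testthat)

sum_and_square <- function(x, y) {
  if (all(c(is.numeric(x), is.numeric(y)))) {
    return((x + y)^2)
  } else {
    warning("x and y aren't both numeric")
    return(NA)
  }
}

test_that("sum_and_square handles good and bad input", {
  # numeric input: (2 + 3)^2
  expect_equal(sum_and_square(2, 3), 25)
  # non-numeric input should warn with the message written in the function
  expect_warning(result <- sum_and_square("2", 3), "aren't both numeric")
  # ... and still return NA
  expect_true(is.na(result))
})
```

matching on part of the warning text makes the test fail if the function starts warning for some unrelated reason.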
key takeaways
functions should help you automate the boring/repetitive stuff
arguments can be:
named
missing
given default values
take care around function scoping/environments – this can be the source of a lot of bugs
throw warnings and errors in your functions to make future debugging easier
use testing to ensure your functions work to your specification