Functions

morals

“The function of good software is to make the complex appear to be simple.”
—Grady Booch

“Beware of bugs in the above code; I have only proved it correct, not tried it.”
—Donald Knuth

“If you can’t informally describe a function in one line, the function is probably too large.”
—(paraphrased) Eric S. Raymond

morals

“You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e., you now have three copies of the same code).”
—R for Data Science, Hadley Wickham, Garrett Grolemund

anatomy of a function

  • name
  • arguments
  • body
    • return value(s)
    • error/warning conditions
  • its scope/environment
  • documentation (not always present)

principles for functions

  • use communicative names
  • no side effects, please
  • throw errors and warnings
  • write tests
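
here’s a minimal sketch of what “no side effects” might look like in practice. the names running_total, add_to_total_impure, and add_to_total are made up for this illustration, and the `<<-` operator (which deliberately reaches outside the function) is used only to demonstrate the problem.

```r
# a function WITH a side effect: it changes a variable that lives outside of it
running_total <- 0
add_to_total_impure <- function(x) {
  running_total <<- running_total + x  # modifies state outside the function
  invisible(NULL)
}

# a function WITHOUT side effects: everything it needs comes in as arguments,
# and the only thing it produces is its return value
add_to_total <- function(total, x) {
  total + x
}

add_to_total_impure(5)
running_total
#> [1] 5

add_to_total(0, 5)
#> [1] 5
```

the second version is easier to test and reason about, because calling it twice with the same arguments always gives the same answer.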

examples

# Calculate the Length of the Hypotenuse for a Right Triangle with Base of
# Length x and Height of Length y
#
# Uses the Pythagorean theorem to calculate the length of the hypotenuse of a
# right triangle with the side-lengths given.
#   
# Reference: https://en.wikipedia.org/wiki/Pythagorean_theorem
# 
# arguments: 
#   x   length of the triangle base 
#   y   height of the triangle 
# 
# returns: 
#   the length of the hypotenuse of the triangle
calculate_hypotenuse_length <- function(x,y) {
  sqrt(x^2 + y^2)
}

calculate_hypotenuse_length(3,4)
[1] 5

examples

calculate_hypotenuse_length(5,12)
[1] 13
calculate_hypotenuse_length(8,15)
[1] 17
calculate_hypotenuse_length(7,24)
[1] 25
calculate_hypotenuse_length(20,21)
[1] 29
calculate_hypotenuse_length(12,35)
[1] 37

arguments

arguments are a little bit like playing make-believe: “what if x were this? what if argument2 were this?”

in that sense, you need to suspend whatever you believe about x or argument2 outside of the function, because inside the function they have whatever values were passed to the function when it was called.

x <- 1
argument2 <- 'spooky value!'

my_fun_function <- function(x, argument2) {
  print(paste0("the value of x inside this function: ", x))
  print(paste0("the value of argument2 inside this function: ", 
               argument2))
}

my_fun_function(1234, "i'm not spooky!")
[1] "the value of x inside this function: 1234"
[1] "the value of argument2 inside this function: i'm not spooky!"

arguments

  • named arguments
  • missing arguments
  • default arguments
  • detecting missing arguments

argument names

note that you can provide the arguments in the order specified, or you can give named arguments when calling the function (in which case, order doesn’t matter):

calculate_hypotenuse_length(5, 12)
[1] 13
calculate_hypotenuse_length(x = 5, y = 12)
[1] 13
calculate_hypotenuse_length(y = 12, x = 5)
[1] 13

missing arguments

some functions do not require that all arguments be passed.

when an argument has no default value, isn’t given, and is actually used inside the function, it will result in an error.

here’s a simple example:

# two_arguments_steve likes to talk a lot
two_arguments_steve <- function(arg1, arg2) {
  for (i in 1:3) {
    print(arg1)
  }
}

two_arguments_steve("hello!")
[1] "hello!"
[1] "hello!"
[1] "hello!"
# two_arguments_steve won't run without arg1
two_arguments_steve(arg2 = "what's up?")
Error in print(arg1): argument "arg1" is missing, with no default

default arguments

one of the common situations in which arguments don’t need to be passed is if they have default values.

function_with_lovely_defaults <- function(x = "great", y = "day") {
  print(paste0("i'm having a ", x, " ", y))
}

function_with_lovely_defaults()
[1] "i'm having a great day"
function_with_lovely_defaults("kale", "salad")
[1] "i'm having a kale salad"
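
defaults and named arguments can be combined, so you only override the arguments you care about. a small sketch, reusing function_with_lovely_defaults from above (the argument values are just made up for illustration):

```r
function_with_lovely_defaults <- function(x = "great", y = "day") {
  print(paste0("i'm having a ", x, " ", y))
}

# override only y by name; x keeps its default
function_with_lovely_defaults(y = "week")
#> [1] "i'm having a great week"

# a single unnamed argument fills the first parameter (x); y keeps its default
function_with_lovely_defaults("terrible")
#> [1] "i'm having a terrible day"
```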

detecting missing arguments

sometimes you may need to detect within a function if an argument is missing.

this can be done with the missing() function.

snarky_function <- function(argument) {
  if (missing(argument)) {
    print("i can't believe you've done this")
  } else {
    print(paste0("here's your argument: ", argument))
  }
}

snarky_function()
[1] "i can't believe you've done this"
snarky_function("did you know the earliest crocodilian evolved 95 million years ago?! 🐊")
[1] "here's your argument: did you know the earliest crocodilian evolved 95 million years ago?! 🐊"

returning early

sometimes a function doesn’t need to run all of its code. a classic example is when the output of a function depends on an if/else code block.

in those situations, you should write out a return() function call explicitly.

sum_and_square <- function(x,y) {
  if (all(c(is.numeric(x), is.numeric(y)))) { 
    return((x + y)^2)
  } else {
    warning("x and y aren't both numeric")
    return(NA)
  }
}
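
to see both branches in action, here’s a short usage sketch of sum_and_square (the input values are arbitrary). suppressWarnings() is only used here to keep the output tidy; without it, R would also print the warning.

```r
sum_and_square <- function(x, y) {
  if (all(c(is.numeric(x), is.numeric(y)))) {
    return((x + y)^2)
  } else {
    warning("x and y aren't both numeric")
    return(NA)
  }
}

# both arguments are numeric, so the first return() is reached
sum_and_square(2, 3)
#> [1] 25

# "three" is not numeric, so the function warns and returns NA early
suppressWarnings(sum_and_square(2, "three"))
#> [1] NA
```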

raising errors

we could improve on calculate_hypotenuse_length by adding code to raise errors for problematic input.

# first let's look at what happens without the code to raise errors:
calculate_hypotenuse_length(1.3e154, 1.3e154)
[1] Inf


calculate_hypotenuse_length <- function(x,y) {
  # make sure the arguments aren't too big: x^2 + y^2 would overflow to Inf
  if (x >= 1.3e154 || y >= 1.3e154) {
    stop("calculate_hypotenuse_length only supports arguments less than 1.3e154")
  }
  
  sqrt(x^2 + y^2)
}

calculate_hypotenuse_length(1.3e154, 1.3e154)
Error in calculate_hypotenuse_length(1.3e+154, 1.3e+154): calculate_hypotenuse_length only supports arguments less than 1.3e154

write your own error messages

Hodu Tip!

it’s often the case that R will automatically raise errors when the arguments given to a function are incompatible with the steps being done inside the function.

however, R’s errors can be incredibly terse, so it may be more user-friendly to write your own errors, especially ones that give a little bit more context, such as which argument was the problem and what kind of value was expected.
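
as a rough sketch of the difference, here’s a variant of the hypotenuse function that checks its input types up front. the name calculate_hypotenuse_length_checked and the wording of the message are assumptions made for this example, not the one true way to do it.

```r
calculate_hypotenuse_length_checked <- function(x, y) {
  # without this check, R's own error for text input only mentions a
  # "non-numeric argument to binary operator"
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must both be numeric, but x has class '", class(x)[1],
         "' and y has class '", class(y)[1], "'")
  }

  sqrt(x^2 + y^2)
}

# tryCatch() is used here just to capture and show the error message
tryCatch(
  calculate_hypotenuse_length_checked(1, "apple"),
  error = function(e) conditionMessage(e)
)
#> [1] "x and y must both be numeric, but x has class 'numeric' and y has class 'character'"
```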

scopes and environments

what do you think will happen?

x <- 1 
some_function <- function() {
  x <- 2 
  print(x)
}

some_function()
[1] 2
print(x)
[1] 1

scopes and environments

how about now?

x <- 1
tricky_function <- function() {
  inner_function <- function() {
    x <- 2
  }
  
  inner_function()
  print(x)
}

tricky_function()
[1] 1
print(x)
[1] 1

scopes and environments

and how about now?

x <- 1
tricky_function <- function() {
  inner_function <- function() {
    print(x)
  }
  
  x <- 2
  inner_function()
}

tricky_function()
[1] 2
print(x)
[1] 1

scopes and environments

let’s make those examples a little less mysterious:

  • when an object is referenced, R uses the value from the scope in which it is referenced, if it’s defined there; otherwise, it uses the value from the nearest parent scope in which it has been defined (the parent scope being where the function was written, not where it was called).
  • here’s an analogy: let’s say your grandmother and your mom are both named Elphaba, and someone asks you “Who is Elphaba?” If you’re R, you’d say “my mom.”
  • objects assigned within a function with <- will not be updated in the global scope or that function’s parent scope(s).
  • here’s an analogy: if a function is like considering a hypothetical written down for you, then if the hypothetical says “let’s say dogs had five legs” or “x is 5”, you could work out some conclusions (dog booties would be sold in packs of five, or x^2 == 25), but that hypothetical doesn’t change anything about the rest of the world.
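
one more sketch that may help separate “where a function was written” from “where it was called.” the names show_x and caller are hypothetical; the point is only that show_x looks for x where show_x was defined.

```r
x <- "global"

show_x <- function() {
  x  # x isn't defined in here, so R looks in the scope where show_x was written
}

caller <- function() {
  x <- "local to caller"  # this x is invisible to show_x
  show_x()
}

caller()
#> [1] "global"
```

compare this with the earlier example in which inner_function was written inside tricky_function: there, the nearest parent scope was tricky_function itself, so the x assigned inside tricky_function was the one that got printed.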

some things worth explicitly discussing

  • functions can call other functions
    • we’ve already seen this
  • functions are first-class citizens
    • so you can treat them similarly to other variables
  • you can make functions that make functions
  • anonymous functions
  • infix notation
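
since infix notation doesn’t get its own example elsewhere in these slides, here’s a brief sketch. the operator name %+% is made up for this illustration (some packages define their own operator with that name).

```r
# every operator in R is a function; wrapping it in backticks lets you
# call it with ordinary prefix notation
`+`(1, 2)
#> [1] 3

# user-defined infix operators must start and end with a % sign
`%+%` <- function(lhs, rhs) {
  paste0(lhs, rhs)
}

"hello " %+% "world"
#> [1] "hello world"
```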

functions as first-class citizens

you can do things with functions in R that you can’t do in every language, like put them in a list or pass them as arguments.

f1 <- function() {
  cat("hello ")
}
f2 <- function() {
  cat("world")
}
f3 <- function() {
  cat("!")
}
my_functions <- list(f1, f2, f3)

for (i in 1:3) {
  my_functions[[i]]()
}
hello world!

functions as first-class citizens

my_fancy_function <- function(arg_f) {
  arg_f(10)
}

secondary_function1 <- function(x) {
  x^2
}

secondary_function2 <- function(x) {
  factorial(x)
}

my_fancy_function(arg_f = secondary_function1)
#> 100
my_fancy_function(arg_f = secondary_function2)
#> 3628800
my_fancy_function(arg_f = factorial)
#> 3628800

functions as first-class citizens

Hodu Tip!

when functions are first-class citizens in a programming language, we say that the language supports functional programming; such a language is sometimes loosely described as a functional programming language.

making functions with functions (function factories)

the key principle behind function factories can be described simply:

The enclosing environment of the manufactured function is an execution environment of the function factory.

power2 <- function(exp) {
  force(exp) # ensures exp is not lazily evaluated
  function(x) {
    x ^ exp
  }
}

square <- power2(2)
cube <- power2(3)

square(2)
#> 4
cube(2)
#> 8

this isn’t a practice you will need to use often, if ever, but it is very helpful to know about to understand code you may come across and is quite instructive about how scoping/environments work in R.
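
to make the principle a bit more concrete, you can peek inside the environments of the manufactured functions. this is only a sketch for exploration, reusing power2 from above; environment() is the base R function that returns a function’s enclosing environment.

```r
power2 <- function(exp) {
  force(exp) # ensures exp is not lazily evaluated
  function(x) {
    x ^ exp
  }
}

square <- power2(2)
cube <- power2(3)

# each manufactured function remembers the exp from its own call to power2
environment(square)$exp
#> [1] 2
environment(cube)$exp
#> [1] 3

# the two functions live in two different environments
identical(environment(square), environment(cube))
#> [1] FALSE
```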

anonymous functions

functions can be created without a name (as you may have noticed in the last example). these functions are called anonymous functions.

there are a handful of ways to create anonymous functions in R, and some packages/frameworks even have their own specific syntax for them.

# the oldest, most readable way:
function(x) {
  x^2
}
# can be abbreviated to: 
function(x) { x^2 } # if writing inline

# introduced in R 4.1.0 (May 2021)
\(x) x^2

# many functions in the tidyverse (e.g., in purrr) accept the
# tilde syntax (also called formula syntax), where .x stands
# for the argument
~ .x^2

anonymous functions

so how would i use anonymous functions?

library(purrr)
vec <- c(1,2,3)
# map_dbl applies the function given to each element and returns
# a numeric vector
map_dbl(vec, function(x) { x^2 }) 
#> [1] 1 4 9

we’re not going to fully get into functional programming right now, but this is a teaser for a later lecture on functional programming.
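
if you don’t have purrr installed, the same idea can be sketched with base R’s sapply(). note that the backslash shorthand assumes R 4.1.0 or later.

```r
vec <- c(1, 2, 3)

# the classic anonymous function syntax
sapply(vec, function(x) x^2)
#> [1] 1 4 9

# the backslash shorthand (R >= 4.1.0)
sapply(vec, \(x) x^2)
#> [1] 1 4 9

# an anonymous function can also be called right away, without ever naming it
(function(x) x^2)(4)
#> [1] 16
```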

testing

testing functions with test_that

library(testthat)

test_that("calculate_hypotenuse_length works as intended", {
  # test simple cases and known pythagorean triples 
  expect_equal(calculate_hypotenuse_length(0, 0), 0)
  expect_equal(calculate_hypotenuse_length(1, 1), sqrt(2))
  expect_equal(calculate_hypotenuse_length(3, 4), 5)
  expect_equal(calculate_hypotenuse_length(5, 12), 13)
  expect_equal(calculate_hypotenuse_length(7, 24), 25)
  
  # test that non-numerics throw an error
  expect_error(calculate_hypotenuse_length(1, 'apple'))
  
  # test that negative numbers are supported;
  # this uses _random_ testing
  expect_gt(calculate_hypotenuse_length(
    -1 * sample.int(n = 10, size = 1), 
    -1 * sample.int(n = 10, size = 1)), 0)
})
Test passed 😸
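
tests can also check that the right error is raised. the sketch below assumes the version of calculate_hypotenuse_length with the size check from the “raising errors” slides; the second argument to expect_error() is a regular expression that the error message must match.

```r
library(testthat)

# assumed: the error-raising version of the function from earlier
calculate_hypotenuse_length <- function(x, y) {
  if (x >= 1.3e154 || y >= 1.3e154) {
    stop("calculate_hypotenuse_length only supports arguments less than 1.3e154")
  }

  sqrt(x^2 + y^2)
}

test_that("calculate_hypotenuse_length rejects arguments that are too big", {
  # the error should mention the supported range
  expect_error(
    calculate_hypotenuse_length(1.3e154, 1),
    "only supports arguments less than"
  )

  # ordinary input should still work
  expect_equal(calculate_hypotenuse_length(3, 4), 5)
})
```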

key takeaways

  • functions should help you automate the boring/repetitive stuff
  • arguments can be:
    • named
    • missing
    • given default values
  • take care around function scoping/environments – this can be the source of a lot of bugs
  • throw warnings and errors in your functions to make future debugging easier
  • use testing to ensure your functions work to your specification

challenges for the activity

  • relatively easy: write a function that takes a numeric vector and returns the square of every element

  • medium: write a function that takes a character vector and returns the shortest word (hint: use nchar())

  • quite hard: how many scrabble points does a given word earn? (assume all letters were played with letter tiles, not blank tiles)

here are the scrabble points for each letter:

(1 point)-A, E, I, O, U, L, N, S, T, R
(2 points)-D, G
(3 points)-B, C, M, P
(4 points)-F, H, V, W, Y
(5 points)-K
(8 points)-J, X
(10 points)-Q, Z