Programming with R: Logic, Conditionals, and Loops

some morals

“Don’t worry if it doesn’t work right. If everything did, you’d be out of a job.” (Mosher’s Law of Software Engineering)

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” — Bill Gates

logic

logical operators

the first three logical operators we’ll talk about first are ! (“not”), & (“and”), and | (“or”).

! negates (i.e., flips) the logical value placed on its right-hand-side
& returns TRUE if its left-hand-side and right-hand-side are both TRUE; otherwise FALSE
| returns TRUE if at least one of its left-hand-side or right-hand-side are TRUE; otherwise FALSE

logical operators

let’s do a really simple handful of examples first

! TRUE        # evaluates to: FALSE
TRUE & TRUE   # => TRUE
TRUE | FALSE  # => TRUE

logical operators

in a rare case, you may need to determine if only one of the two arguments given is TRUE, and in those cases you can use xor(x,y).

xor(TRUE, FALSE)  # => TRUE
xor(FALSE, TRUE)  # => TRUE
xor(TRUE, TRUE)   # => FALSE
xor(FALSE, FALSE) # => FALSE

logical operators

R is a vectorized language (meaning: many of its functions and most basic data types aim to work seamlessly with vectors of data), so these logical operators are vectorized too.

! c(TRUE, FALSE, TRUE)         # => c(FALSE, TRUE, FALSE)
c(TRUE, TRUE) & c(TRUE, FALSE) # => c(TRUE, FALSE)
c(TRUE, TRUE) | c(TRUE, FALSE) # => c(TRUE, TRUE)

xor(c(FALSE, TRUE, TRUE), c(TRUE, FALSE, TRUE))
# => c(TRUE, TRUE, FALSE)

logical operators

there are a few more logical operators which you may not need to use often.

&& and || are not vectorized and evaluate from left-to-right only proceeding until the result is determined (meaning: they exit early if a result can be determined without looking at the whole expression)

# for example, the following code will never evaluate
# `complicated_expression_that_yields_something` because the && operator will
# exit early since it "knows" that both sides must be TRUE, otherwise the result
# will be FALSE
complicated_expression_that_yields_FALSE && complicated_expression_that_yields_something

# similarly, the following code will never evaluate
# `complicated_expression_that_yields_something` because the || operator will
# exit early since it "knows" that if either side is TRUE, the result will be
# TRUE
complicated_expression_that_yields_TRUE || complicated_expression_that_yields_something

the point of these operators is fundamentally about computational efficiency. these operators become especially beneficial if the arguments given are computationally demanding to compute, or there’s a real need for the code to be extremely fast.

logical operators

remember, you can always get help for the logical operators by entering ?"&" or help("&") into R. doing this (or replacing & with any of the other logical operators) will bring up the help pages for the logical operators.

those help pages are also online here:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Logic.html

other ways to produce logical values

the logical operators are not the only operators or functions in R that will produce logical values/vectors in R.

some operators that will produce logical values are the relational operators:

> (greater than), >= (greater than or equal to), < (less than), <= (less than or equal to), == (equal to), != (not equal to)

5 > 1               # => TRUE
5 >= 5              # => TRUE
5 == 5              # => TRUE
5 != 4              # => TRUE
"apple" != "banana" # => TRUE

identical

Hodu Tip!

sometimes you don’t want to compare two objects element-wise, but rather to tell if they’re the exact same.

in that situation, you can use identical() instead of ==.

x <- 1:10 * 10
y <- seq(10, 100, by = 10)

x == y

 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

identical(x, y)

[1] TRUE

other ways to produce logical values

some functions that will produce logical values are:

any(x) and all(x)
the any(x) and all(x) functions are designed to work with vectors and respectively test if any of the values in the vector x are TRUE, and if all of the values in the vector x are TRUE.
both of these functions take an additional argument na.rm that defaults to FALSE. na.rm. stands for “remove NAs”.
- typically, any operation with NA is NA (and similarly for NaNs). for example, 1 + NA == NA.
- if set to TRUE, then any(x, na.rm = TRUE) and all(x, na.rm = TRUE) will ignore NA values.

all(c(TRUE, TRUE, TRUE)) # => TRUE
any(c(FALSE, TRUE, FALSE)) # => TRUE

all(c(TRUE, NA, TRUE))  # => NA
any(c(FALSE, TRUE, NA)) # => NA

all(c(TRUE, NA, TRUE), na.rm = TRUE)  # => TRUE
any(c(FALSE, TRUE, NA), na.rm = TRUE) # => TRUE

testing for membership

one function that produces logical values is the %in% operator. if you write x %in% y, R will return TRUE if x is contained in y.

5 %in% c(1,2,3,4,5)
#> TRUE

'a' %in% c('a', 'b', 'c')
#> TRUE

# %in% is vectorized
c(1, 2, 3) %in% c(1, 3, 5)
#>  TRUE FALSE  TRUE

a quick note on T and F

in R code, you may sometimes see TRUE and FALSE abbreviated to T and F.

however, this is generally advised against because the T and F variables may be overwritten with other values.

# an example 
isTRUE(T) # => TRUE
T <- 0
isTRUE(T) # => FALSE

# but you can't overwrite `TRUE` itself
TRUE <- 0
# => Error in TRUE <- 0 : invalid (do_set) left-hand side to assignment

in short, try not to use T and F as TRUE and FALSE.

determining TRUE values

there may be times when you need to detect which items in a vector are TRUE, for example when you want to select only those values that satisfy some condition.

there’s two ways to solve this problem:

use logical indexing with vectors; or
use which()

x <- 1:5 # x is a vector of 1, 2, 3, 4, 5
y <- x * 3

y %% 2 == 0 # the %% operator can be read "remainder after division by ..." 
# in this case, `y %% 2 == 0` will be TRUE if y is even

areEven <- y %% 2 == 0
areEven
# => c(FALSE, TRUE, FALSE, TRUE, FALSE)

y[areEven]
# => c(6, 12)

determining TRUE values

equally valid is to use the which() operation to return an integer vector of indices where the value in the argument is TRUE.

x <- 1:5 # x is a vector of 1, 2, 3, 4, 5
y <- x * 3

y %% 2 == 0 # the %% operator can be read "remainder after division by ..." 
# in this case, `y %% 2 == 0` will be TRUE if y is even

areEven <- y %% 2 == 0
areEven
# => c(FALSE, TRUE, FALSE, TRUE, FALSE)

which(areEven)
# => c(2, 4)
# i.e., the 2nd and 4th values are even

y[which(areEven)]
# => c(6, 12)

conditionals

if

in programming, control statements refer to code that the computer uses to decide what set of following code statements (also called expressions) should be run.

the first control statement we’ll be looking at is the if statement.

# the syntax for an if statement:
if (logical_expression) {
  code_to_run 
}

the code_to_run in the example will only be run if the logical_expression is TRUE.

an example

meet the sample.int() function: given an argument n, and a size, it will randomly choose size number of integers (with or without replacement depending on the replace argument) from the integers 1 through n.

x <- sample.int(n = 10, size = 1) # sample a number between 1 and 10

# if x is greater than or equal to 5, print the result
if (x >= 5) { 
  print(x)
}

# => this might print something like 7, or it might not print anything!

another example

here’s an example where we have characters in a list, along with their pets, and we want to grab random a random character and print their pet’s name, but avoid printing NULL to the console.

characters <- list(
  harry = list(pet = 'hedwig'),
  ron = list(pet = c('scabbers', 'pigwidgeon')),
  hermione = list(pet = 'crookshanks'),
  hagrid = list(pet = c('fang', 'aragog')),
  dumbledore = list(pet = 'fawkes'),
  filch = list(pet = 'mrs. norris')
  fred = list(), # fred and george don't have pets
  george = list()
)

random_character <- characters[[sample.int(n = length(characters), size = 1)]]
random_character_pet <- random_character$pet

if (! is.null(random_character_pet)) {
  print(random_character_pet)
}

# this might print one or two pets, or it might not print anything! 

# to be clear, this will _not_ print NULL, which is what the value of 
# characters$fred$pet is, for example.

paste0

before introducing the next control statement let’s talk about the paste0 command: paste0("string one", "string two") will concatenate the strings given to it and return "string onestring two". it will take as many arguments as you give it.

paste0 is a sister function to paste which takes sep as an argument for what should be inserted between the strings given, and defaults to a single space.

here’s an example:

x <- "hello "
y <- "world"
z <- "!"
paste0(x, y, z)
#> "hello world!"

if and else

now we can add another branch to the code by adding an else condition.

characters <- list(
  harry = list(pet = 'hedwig'),
  ron = list(pet = c('scabbers', 'pigwidgeon')),
  hermione = list(pet = 'crookshanks'),
  hagrid = list(pet = c('fang', 'aragog')),
  dumbledore = list(pet = 'fawkes'),
  filch = list(pet = 'mrs. norris')
  fred = list(), # fred and george don't have pets
  george = list()
)

random_int <- sample.int(n = length(characters), size = 1) # get a random number
random_character <- characters[[random_int]]               # get the corresponding character
random_character_name <- names(characters)[random_int]     # get their name
random_character_pet <- random_character$pet               # get their pet(s)

if (is.null(random_character_pet)) {
  print(paste0(random_character_name, " doesn't have any pets!"))
} else {
  print(paste0(random_character_name, "'s pet(s):"))
  print(random_character_pet)
}

# this might print "george doesn't have any pets!" or it might print something
# like: 
#   "these are ron's pets:"
#   "scabbers"   "pigwidgeon"

else if

let’s say we draw a random number and want to classify it as a small, medium, or large number depending on if it’s between 1-33 (small), 34-66 (medium), or 66+ (large).

x <- sample.int(n = 100, size = 1) # draw a random number between 1-100


# one way to write this:
# this way takes advantage of what we know about how x was created
if (x <= 33) {
  print("x is small (i.e., between 1 and 33)")
} else if (x <= 66) {
  print("x is medium (i.e., between 34 and 66)")
} else {
  print("x is large (i.e., between 67 and 100)")
}


# another way:
# this way doesn't take advantage of what we know about x
if (x <= 33) {
  print("x is small (i.e., 33 or less)")
} else if (x <= 66) {
  print("x is medium (i.e., above 33 and up to 66)")
} else if (x <= 100) {
  print("x is large (i.e., above 67 and up to 100)")
} else {
  stop("something has gone wrong! we weren't supposed to get here")
}
# this way has an error if an unexpected situation arises -- you might try
# setting x to something silly like "apple" and running the two versions above
# again.

loops

for loops

a for loop is used to iterate over a sequence.

for loop are used when you have a block of code you want to repeat a fixed number of times.

# the syntax for a for loop
for (iterator in sequence) {
  code_to_repeat
}

an example

let’s say you want to print out the squares of the numbers 1 through 25. that would be fairly tedious to do by hand, but you can use a for loop to make it easy.

for (i in 1:25) {
  print(i^2)
}
#> [1] 1
#> [1] 4
#> [1] 9
#> [1] 16
#> [1] 25
#> [1] 36
#> [1] 49
#> [1] 64
#> [1] 81
#> [1] 100
#> [1] 121
#> [1] 144
#> [1] 169
#> [1] 196
#> [1] 225
#> [1] 256
#> [1] 289
#> [1] 324
#> [1] 361
#> [1] 400
#> [1] 441
#> [1] 484
#> [1] 529
#> [1] 576
#> [1] 625

# obviously the above code is much shorter than writing 
print(1^2)
print(2^2)
print(3^2)
print(4^2)
print(5^2)
# and so on and so forth...

another example

for loops can iterate over any data structure that can be sequenced (i.e., indexed by an integer), so feel free to use them with character vectors.

animals <- c('sloths', 'platypus', 'sea otters', 'ringtail lemurs')

for (animal in animals) {
  print(paste0(animal, " rule!"))
}
#> [1] "sloths rule!"
#> [1] "platypus rule!"
#> [1] "sea otters rule!"
#> [1] "ringtail lemurs rule!"

a more realistic example

here’s one situation I find myself running into at least a few times a year: occasionally I’ll find U.S. national data in Excel or CSV files that are split up by state. If I’m lucky, they’ll be named something like Alabama.csv, Alaska.csv, Arizona.csv, …

This is a great situation to use a for loop!

state_data <- list() # start with an empty list

for (state in state.name) {  # state.name is a list of state names built into R

  # we'll talk more about reading in data later today
  new_data <- read.csv(file = paste0(state, ".csv")) 
  
  # create a list element for the data being read in named after the state
  state_data[[state]] <- state_data 
}

# clean up the state_data list

another example with states

sometimes you’ll want to use indexing in a for loop to reference values within multiple sequences at the same time.

here’s an example:

for (i in 1:50) {
  print(paste0("the abbreviation for ", state.name[i], " is ", state.abb[i]))
}
#> [1] "the abbreviation for Alabama is AL"
#> [1] "the abbreviation for Alaska is AK"
#> [1] "the abbreviation for Arizona is AZ"
#> [1] "the abbreviation for Arkansas is AR"
#> [1] "the abbreviation for California is CA"
#> [1] "the abbreviation for Colorado is CO"
#> [1] "the abbreviation for Connecticut is CT"
#> [1] "the abbreviation for Delaware is DE"
#> [1] "the abbreviation for Florida is FL"
#> [1] "the abbreviation for Georgia is GA"
#> [1] "the abbreviation for Hawaii is HI"
#> [1] "the abbreviation for Idaho is ID"
#> [1] "the abbreviation for Illinois is IL"
#> [1] "the abbreviation for Indiana is IN"
#> [1] "the abbreviation for Iowa is IA"
#> [1] "the abbreviation for Kansas is KS"
#> [1] "the abbreviation for Kentucky is KY"
#> [1] "the abbreviation for Louisiana is LA"
#> [1] "the abbreviation for Maine is ME"
#> [1] "the abbreviation for Maryland is MD"
#> [1] "the abbreviation for Massachusetts is MA"
#> [1] "the abbreviation for Michigan is MI"
#> [1] "the abbreviation for Minnesota is MN"
#> [1] "the abbreviation for Mississippi is MS"
#> [1] "the abbreviation for Missouri is MO"
#> [1] "the abbreviation for Montana is MT"
#> [1] "the abbreviation for Nebraska is NE"
#> [1] "the abbreviation for Nevada is NV"
#> [1] "the abbreviation for New Hampshire is NH"
#> [1] "the abbreviation for New Jersey is NJ"
#> [1] "the abbreviation for New Mexico is NM"
#> [1] "the abbreviation for New York is NY"
#> [1] "the abbreviation for North Carolina is NC"
#> [1] "the abbreviation for North Dakota is ND"
#> [1] "the abbreviation for Ohio is OH"
#> [1] "the abbreviation for Oklahoma is OK"
#> [1] "the abbreviation for Oregon is OR"
#> [1] "the abbreviation for Pennsylvania is PA"
#> [1] "the abbreviation for Rhode Island is RI"
#> [1] "the abbreviation for South Carolina is SC"
#> [1] "the abbreviation for South Dakota is SD"
#> [1] "the abbreviation for Tennessee is TN"
#> [1] "the abbreviation for Texas is TX"
#> [1] "the abbreviation for Utah is UT"
#> [1] "the abbreviation for Vermont is VT"
#> [1] "the abbreviation for Virginia is VA"
#> [1] "the abbreviation for Washington is WA"
#> [1] "the abbreviation for West Virginia is WV"
#> [1] "the abbreviation for Wisconsin is WI"
#> [1] "the abbreviation for Wyoming is WY"

while

a while loop allows you to repeatedly run some code until a condition is met.

this has the advantage that you don’t have to know how many times the code will be run in advance, in contrast to a for loop where you have to have a pre-determined sequence that you’ll iterate along.

here’s a hypothetical: if you draw numbers between 1 through 10 randomly and sum them up, how long will it take you to get a value of at least 100?

it won’t ever take longer than 100 iterations, and it can’t take less than 10 iterations.

the expected draw is 5.5, so it should take around 18.2 summations on average, I would think.

while

value <- 0 
while (value < 100) {
  value <- value + sample.int(n = 10, size = 1)
  print(value)
}

[1] 9
[1] 11
[1] 13
[1] 17
[1] 26
[1] 28
[1] 30
[1] 34
[1] 36
[1] 40
[1] 50
[1] 52
[1] 61
[1] 62
[1] 70
[1] 71
[1] 80
[1] 88
[1] 89
[1] 91
[1] 97
[1] 100

while

that’s great, but it doesn’t tell us about how many iterations it took

n_iterations <- 0 
value <- 0 
while (value < 100) {
  n_iterations <- n_iterations + 1
  value <- value + sample.int(n = 10, size = 1)
}
print(n_iterations)

[1] 16

print(value)

[1] 102

while

what if we want to create a larger sample of data to see if our hypothesis was close to some observed data?

iterations_needed <- numeric() # create an empty numeric vector

# we'll run the experiment 10,000 times so we have a substantial amount of data
for (i in 1:10000) {
  n_iterations <- 0 
  value <- 0 
  while (value < 100) {
    n_iterations <- n_iterations + 1
    value <- value + sample.int(n = 10, size = 1)
  }
  iterations_needed <- c(iterations_needed, n_iterations)
}

summary(iterations_needed)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.00   17.00   19.00   18.73   20.00   31.00

key takeaways

!, &, and | should be your go-to logical operators
any(), all() and the relational operators ==, !=, >, <, >=, <= are some other functions that work with logical values
use if, if else and else if to run code depending on a specified condition
use for loops to run code with an iterator that goes through a given sequence
use while loops to run code until a given condition is met

let’s do a learnr tutorial

# install devtools if you haven't already
install.packages("devtools") 

# install our ID529tutorials package
devtools::install_github("ID529/ID529tutorials")

library(ID529tutorials)
run_tutorial('logic', 'ID529tutorials')

plus here’s a google doc for questions: https://bit.ly/day2-doc