[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] TRUE
“Don’t worry if it doesn’t work right. If everything did, you’d be out of a job.” (Mosher’s Law of Software Engineering)
“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” — Bill Gates
the first three logical operators we’ll talk about first are !
(“not”), &
(“and”), and |
(“or”).
!
negates (i.e., flips) the logical value placed on its right-hand-side&
returns TRUE
if its left-hand-side and right-hand-side are both TRUE
; otherwise FALSE
|
returns TRUE
if at least one of its left-hand-side or right-hand-side are TRUE
; otherwise FALSE
let’s do a really simple handful of examples first
in a rare case, you may need to determine if only one of the two arguments given is TRUE
, and in those cases you can use xor(x,y)
.
R is a vectorized language (meaning: many of its functions and most basic data types aim to work seamlessly with vectors of data), so these logical operators are vectorized too.
there are a few more logical operators which you may not need to use often.
&&
and ||
are not vectorized and evaluate from left-to-right only proceeding until the result is determined (meaning: they exit early if a result can be determined without looking at the whole expression)# for example, the following code will never evaluate
# `complicated_expression_that_yields_something` because the && operator will
# exit early since it "knows" that both sides must be TRUE, otherwise the result
# will be FALSE
complicated_expression_that_yields_FALSE && complicated_expression_that_yields_something
# similarly, the following code will never evaluate
# `complicated_expression_that_yields_something` because the || operator will
# exit early since it "knows" that if either side is TRUE, the result will be
# TRUE
complicated_expression_that_yields_TRUE || complicated_expression_that_yields_something
the point of these operators is fundamentally about computational efficiency. these operators become especially beneficial if the arguments given are computationally demanding to compute, or there’s a real need for the code to be extremely fast.
remember, you can always get help for the logical operators by entering ?"&"
or help("&")
into R. doing this (or replacing &
with any of the other logical operators) will bring up the help pages for the logical operators.
those help pages are also online here:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Logic.html
the logical operators are not the only operators or functions in R that will produce logical values/vectors in R.
some operators that will produce logical values are the relational operators:
>
(greater than), >=
(greater than or equal to), <
(less than), <=
(less than or equal to), ==
(equal to), !=
(not equal to)some functions that will produce logical values are:
any(x)
and all(x)
any(x)
and all(x)
functions are designed to work with vectors and respectively test if any of the values in the vector x
are TRUE
, and if all of the values in the vector x
are TRUE
.na.rm
that defaults to FALSE
. na.rm.
stands for “remove NAs”.
1 + NA == NA
.TRUE
, then any(x, na.rm = TRUE)
and all(x, na.rm = TRUE)
will ignore NA
values.one function that produces logical values is the %in%
operator. if you write x %in% y
, R will return TRUE
if x
is contained in y
.
in R code, you may sometimes see TRUE
and FALSE
abbreviated to T
and F
.
however, this is generally advised against because the T
and F
variables may be overwritten with other values.
read more:
https://www.r-bloggers.com/2012/12/r-tip-avoid-using-t-and-f-as-synonyms-for-true-and-false/
in short, try not to use T
and F
as TRUE
and FALSE
.
there may be times when you need to detect which items in a vector are TRUE
, for example when you want to select only those values that satisfy some condition.
there’s two ways to solve this problem:
which()
equally valid is to use the which()
operation to return an integer vector of indices where the value in the argument is TRUE
.
x <- 1:5 # x is a vector of 1, 2, 3, 4, 5
y <- x * 3
y %% 2 == 0 # the %% operator can be read "remainder after division by ..."
# in this case, `y %% 2 == 0` will be TRUE if y is even
areEven <- y %% 2 == 0
areEven
# => c(FALSE, TRUE, FALSE, TRUE, FALSE)
which(areEven)
# => c(2, 4)
# i.e., the 2nd and 4th values are even
y[which(areEven)]
# => c(6, 12)
in programming, control statements refer to code that the computer uses to decide what set of following code statements (also called expressions) should be run.
the first control statement we’ll be looking at is the if
statement.
the code_to_run
in the example will only be run if the logical_expression
is TRUE
.
occasionally you may see R code where an if
statement is written inline without curly braces like if (condition) code_to_run
— this is not considered to be best practice and should generally be avoided.
meet the sample.int()
function: given an argument n
, and a size
, it will randomly choose size
number of integers (with or without replacement depending on the replace
argument) from the integers 1 through n
.
here’s an example where we have characters in a list, along with their pets, and we want to grab random a random character and print their pet’s name, but avoid printing NULL to the console.
characters <- list(
harry = list(pet = 'hedwig'),
ron = list(pet = c('scabbers', 'pigwidgeon')),
hermione = list(pet = 'crookshanks'),
hagrid = list(pet = c('fang', 'aragog')),
dumbledore = list(pet = 'fawkes'),
filch = list(pet = 'mrs. norris')
fred = list(), # fred and george don't have pets
george = list()
)
random_character <- characters[[sample.int(n = length(characters), size = 1)]]
random_character_pet <- random_character$pet
if (! is.null(random_character_pet)) {
print(random_character_pet)
}
# this might print one or two pets, or it might not print anything!
# to be clear, this will _not_ print NULL, which is what the value of
# characters$fred$pet is, for example.
before introducing the next control statement let’s talk about the paste0
command: paste0("string one", "string two")
will concatenate the strings given to it and return "string onestring two"
. it will take as many arguments as you give it.
paste0
is a sister function to paste
which takes sep
as an argument for what should be inserted between the strings given, and defaults to a single space.
now we can add another branch to the code by adding an else condition.
characters <- list(
harry = list(pet = 'hedwig'),
ron = list(pet = c('scabbers', 'pigwidgeon')),
hermione = list(pet = 'crookshanks'),
hagrid = list(pet = c('fang', 'aragog')),
dumbledore = list(pet = 'fawkes'),
filch = list(pet = 'mrs. norris')
fred = list(), # fred and george don't have pets
george = list()
)
random_int <- sample.int(n = length(characters), size = 1) # get a random number
random_character <- characters[[random_int]] # get the corresponding character
random_character_name <- names(characters)[random_int] # get their name
random_character_pet <- random_character$pet # get their pet(s)
if (is.null(random_character_pet)) {
print(paste0(random_character_name, " doesn't have any pets!"))
} else {
print(paste0(random_character_name, "'s pet(s):"))
print(random_character_pet)
}
# this might print "george doesn't have any pets!" or it might print something
# like:
# "these are ron's pets:"
# "scabbers" "pigwidgeon"
let’s say we draw a random number and want to classify it as a small, medium, or large number depending on if it’s between 1-33 (small), 34-66 (medium), or 66+ (large).
x <- sample.int(n = 100, size = 1) # draw a random number between 1-100
# one way to write this:
# this way takes advantage of what we know about how x was created
if (x <= 33) {
print("x is small (i.e., between 1 and 33)")
} else if (x <= 66) {
print("x is medium (i.e., between 34 and 66)")
} else {
print("x is large (i.e., between 67 and 100)")
}
# another way:
# this way doesn't take advantage of what we know about x
if (x <= 33) {
print("x is small (i.e., 33 or less)")
} else if (x <= 66) {
print("x is medium (i.e., above 33 and up to 66)")
} else if (x <= 100) {
print("x is large (i.e., above 67 and up to 100)")
} else {
stop("something has gone wrong! we weren't supposed to get here")
}
# this way has an error if an unexpected situation arises -- you might try
# setting x to something silly like "apple" and running the two versions above
# again.
a for loop is used to iterate over a sequence.
for loop are used when you have a block of code you want to repeat a fixed number of times.
let’s say you want to print out the squares of the numbers 1 through 25. that would be fairly tedious to do by hand, but you can use a for loop to make it easy.
for (i in 1:25) {
print(i^2)
}
#> [1] 1
#> [1] 4
#> [1] 9
#> [1] 16
#> [1] 25
#> [1] 36
#> [1] 49
#> [1] 64
#> [1] 81
#> [1] 100
#> [1] 121
#> [1] 144
#> [1] 169
#> [1] 196
#> [1] 225
#> [1] 256
#> [1] 289
#> [1] 324
#> [1] 361
#> [1] 400
#> [1] 441
#> [1] 484
#> [1] 529
#> [1] 576
#> [1] 625
# obviously the above code is much shorter than writing
print(1^2)
print(2^2)
print(3^2)
print(4^2)
print(5^2)
# and so on and so forth...
for loops can iterate over any data structure that can be sequenced (i.e., indexed by an integer), so feel free to use them with character vectors.
here’s one situation I find myself running into at least a few times a year: occasionally I’ll find U.S. national data in Excel or CSV files that are split up by state. If I’m lucky, they’ll be named something like Alabama.csv
, Alaska.csv
, Arizona.csv
, …
This is a great situation to use a for loop!
state_data <- list() # start with an empty list
for (state in state.name) { # state.name is a list of state names built into R
# we'll talk more about reading in data later today
new_data <- read.csv(file = paste0(state, ".csv"))
# create a list element for the data being read in named after the state
state_data[[state]] <- state_data
}
# clean up the state_data list
sometimes you’ll want to use indexing in a for loop to reference values within multiple sequences at the same time.
here’s an example:
for (i in 1:50) {
print(paste0("the abbreviation for ", state.name[i], " is ", state.abb[i]))
}
#> [1] "the abbreviation for Alabama is AL"
#> [1] "the abbreviation for Alaska is AK"
#> [1] "the abbreviation for Arizona is AZ"
#> [1] "the abbreviation for Arkansas is AR"
#> [1] "the abbreviation for California is CA"
#> [1] "the abbreviation for Colorado is CO"
#> [1] "the abbreviation for Connecticut is CT"
#> [1] "the abbreviation for Delaware is DE"
#> [1] "the abbreviation for Florida is FL"
#> [1] "the abbreviation for Georgia is GA"
#> [1] "the abbreviation for Hawaii is HI"
#> [1] "the abbreviation for Idaho is ID"
#> [1] "the abbreviation for Illinois is IL"
#> [1] "the abbreviation for Indiana is IN"
#> [1] "the abbreviation for Iowa is IA"
#> [1] "the abbreviation for Kansas is KS"
#> [1] "the abbreviation for Kentucky is KY"
#> [1] "the abbreviation for Louisiana is LA"
#> [1] "the abbreviation for Maine is ME"
#> [1] "the abbreviation for Maryland is MD"
#> [1] "the abbreviation for Massachusetts is MA"
#> [1] "the abbreviation for Michigan is MI"
#> [1] "the abbreviation for Minnesota is MN"
#> [1] "the abbreviation for Mississippi is MS"
#> [1] "the abbreviation for Missouri is MO"
#> [1] "the abbreviation for Montana is MT"
#> [1] "the abbreviation for Nebraska is NE"
#> [1] "the abbreviation for Nevada is NV"
#> [1] "the abbreviation for New Hampshire is NH"
#> [1] "the abbreviation for New Jersey is NJ"
#> [1] "the abbreviation for New Mexico is NM"
#> [1] "the abbreviation for New York is NY"
#> [1] "the abbreviation for North Carolina is NC"
#> [1] "the abbreviation for North Dakota is ND"
#> [1] "the abbreviation for Ohio is OH"
#> [1] "the abbreviation for Oklahoma is OK"
#> [1] "the abbreviation for Oregon is OR"
#> [1] "the abbreviation for Pennsylvania is PA"
#> [1] "the abbreviation for Rhode Island is RI"
#> [1] "the abbreviation for South Carolina is SC"
#> [1] "the abbreviation for South Dakota is SD"
#> [1] "the abbreviation for Tennessee is TN"
#> [1] "the abbreviation for Texas is TX"
#> [1] "the abbreviation for Utah is UT"
#> [1] "the abbreviation for Vermont is VT"
#> [1] "the abbreviation for Virginia is VA"
#> [1] "the abbreviation for Washington is WA"
#> [1] "the abbreviation for West Virginia is WV"
#> [1] "the abbreviation for Wisconsin is WI"
#> [1] "the abbreviation for Wyoming is WY"
a while
loop allows you to repeatedly run some code until a condition is met.
this has the advantage that you don’t have to know how many times the code will be run in advance, in contrast to a for
loop where you have to have a pre-determined sequence that you’ll iterate along.
here’s a hypothetical: if you draw numbers between 1 through 10 randomly and sum them up, how long will it take you to get a value of at least 100?
it won’t ever take longer than 100 iterations, and it can’t take less than 10 iterations.
the expected draw is 5.5, so it should take around 18.2 summations on average, I would think.
that’s great, but it doesn’t tell us about how many iterations it took
what if we want to create a larger sample of data to see if our hypothesis was close to some observed data?
iterations_needed <- numeric() # create an empty numeric vector
# we'll run the experiment 10,000 times so we have a substantial amount of data
for (i in 1:10000) {
n_iterations <- 0
value <- 0
while (value < 100) {
n_iterations <- n_iterations + 1
value <- value + sample.int(n = 10, size = 1)
}
iterations_needed <- c(iterations_needed, n_iterations)
}
summary(iterations_needed)
Min. 1st Qu. Median Mean 3rd Qu. Max.
12.00 17.00 19.00 18.73 20.00 31.00
plus here’s a google doc for questions: https://bit.ly/day2-doc