How to Write Conditional Statements in R: Four Methods | by Rory Spanton | Jun, 2023


Learn powerful ways to go beyond if-else statements and level up your R code


You won’t get far in programming without conditional statements.

Conditional statements execute code based on the result of a true-or-false condition. They’re an essential part of coding, and this is especially true in R. Whether you’re using R for data analysis, machine learning, software development, or something else, you’ll find yourself reaching for conditional statements constantly.

But most beginners in R don’t realize that there are many ways to write them. Many people learn basic if-else statements and stop there, even though there are often neater, more efficient alternatives. Advanced R programmers know each of these techniques and when to use them. So, how can you learn to do the same?

In this article, we’ll take a look at four different ways to write conditional statements in R. We’ll also cover the strengths and limitations of each technique, and when to use each one.

The most straightforward way of writing conditional statements in R is by using the if and else keywords. This will be most familiar if you already know another programming language, and it’s often the technique that new R users learn first.

A standard if statement in R looks like this:

if (condition) {
  # Code to execute
}

Here, condition is a logical expression that evaluates to a single TRUE or FALSE. If the condition is TRUE, any code inside the curly braces is executed. If it is FALSE, the code inside the braces is skipped, and R moves on to the next line of code in the script.

To see how this works in practice, we can take the following example.

age <- 25

if (age >= 18) {
  age_group <- "adult"
}

Here, we have a variable that contains an age. The if statement then evaluates whether the value of age is greater than or equal to 18. This is true in this case, so the variable age_group takes a value of “adult”.

This is an easy way of checking a simple condition and doing something if it’s true. But what if we want our statement to run some code if the condition is false?

If-else statements are an extension of the basic if statement. To understand them, we can add to our previous example.

if (age >= 18) {
  age_group <- "adult"
} else {
  age_group <- "child"
}

This code works just like the last example, with one exception. Instead of moving on when the condition is FALSE, R executes the code inside the curly braces after else. This means that if age is greater than or equal to 18, age_group is assigned a value of “adult”. If not, age_group is set to “child”.

If-else statements are a straightforward way of controlling the code in an R script. They’re easily understood, can be extended to take many conditions, and can execute complex code that’s many lines long.
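As a minimal sketch of what “many conditions” can look like, the example below chains a second check with else if. The age bands and the “teenager” label are hypothetical, chosen only for illustration. R tests each condition in order and runs the first branch whose condition is TRUE.

```r
# Hypothetical age bands, chosen only for illustration
age <- 25

if (age < 13) {
  age_group <- "child"
} else if (age < 18) {
  age_group <- "teenager"
} else {
  age_group <- "adult"
}

age_group
# Returns "adult"
```

Because the branches are checked from top to bottom, the order of the conditions matters: here, an age of 10 never reaches the second check.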

But if-else statements can take up a lot of space. For a simple check like the adult-or-child example, there are other ways of doing exactly the same operation without using five lines of code.

In fact, it’s possible to write if-else statements using one line of code.

Inline conditional statements are a neat way of expressing “if-else” logic in a single line of code. There are a couple of ways to write them.

Inline if-else statement

First, it’s possible to write a simple inline statement using the if and else keywords. This takes the form below:

age_group <- if (age >= 18) "adult" else "child"

This statement works the same way as the previous example. The only difference is that now, we’ve condensed the phrasing to fit on one line. If the condition is TRUE, the value of age_group gets updated to whatever is before the else keyword — in this case, “adult”. If it were FALSE, age_group would be assigned whatever comes after else.

The big difference here is that we now assign the result of the whole conditional statement to the variable age_group. This improves on the repetitive phrasing in the standard if-else example, where we had to write this assignment twice.
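The reason this works is that if-else is an expression in R: it evaluates to a value. The sketch below, which uses the hypothetical variable names message_text and no_else, shows two consequences. The result can be passed straight into another function, and leaving out the else branch yields NULL when the condition is FALSE.

```r
age <- 25

# The whole if-else expression evaluates to a single value,
# so it can be used directly as a function argument
message_text <- paste("User is", if (age >= 18) "an adult" else "a child")

message_text
# Returns "User is an adult"

# Without an else branch, a FALSE condition yields NULL
no_else <- if (age >= 65) "senior"

is.null(no_else)
# Returns TRUE
```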

Base-R ifelse function

If you prefer, you can use the ifelse function instead. The code below uses this function to execute the same logic as the previous examples.

age_group <- ifelse(age >= 18, "adult", "child")

The ifelse function takes three arguments. First comes the condition, then a value to return if the condition is true, and finally a value to return if the condition is false.

This is a clean, straightforward way of writing a short conditional statement. It also has another advantage: it’s vectorized.

Vectorization is an important concept in R. If a function is vectorized, it automatically applies to multiple values instead of just one. To see an example with the ifelse function, let’s assign more values to our variable age, and run the code again.

age <- c(16, 45, 23, 82)

age_group <- ifelse(age >= 18, "adult", "child")
# Returns "child" "adult" "adult" "adult"

The ifelse function automatically evaluates all the values in age, returning a vector of corresponding outputs. This makes ifelse a clean way of evaluating lots of simple conditions without needing slow, messy loops.
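To make the comparison concrete, here is a rough sketch of the loop that ifelse replaces. The names age_group_loop and age_group_vec are made up for this illustration. Both versions produce the same result, but the vectorized one needs a single line.

```r
age <- c(16, 45, 23, 82)

# Loop version: pre-allocate the result, then fill it in one element at a time
age_group_loop <- character(length(age))
for (i in seq_along(age)) {
  age_group_loop[i] <- if (age[i] >= 18) "adult" else "child"
}

# Vectorized version: one call handles every element
age_group_vec <- ifelse(age >= 18, "adult", "child")

identical(age_group_loop, age_group_vec)
# Returns TRUE
```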

Although ifelse can evaluate many inputs easily, there are other ways to do this.

Indexing allows R programmers to access specific parts of a data structure that contains many values. For example, if we wanted to get the third element in the vector age from the last example, we could index age with 3 inside square brackets:

age <- c(16, 45, 23, 82)

age
# Returns 16 45 23 82

age[3]
# Returns 23

It’s most common to use numbers to index values at certain positions, like in the code above. But many beginner R programmers don’t know that you can also use logical conditions when indexing. This opens up all sorts of possibilities.
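As a small sketch of the idea, the code below indexes the same age vector with a condition instead of a position. The helper name is_adult is an assumption made for this example. It shows both selecting values and conditionally assigning them.

```r
age <- c(16, 45, 23, 82)

# A condition on a vector produces a logical vector of the same length
is_adult <- age >= 18
is_adult
# Returns FALSE TRUE TRUE TRUE

# Indexing with that logical vector keeps only the TRUE positions
age[is_adult]
# Returns 45 23 82

# Conditional assignment: start from a default, then overwrite where TRUE
age_group <- rep("child", length(age))
age_group[is_adult] <- "adult"
age_group
# Returns "child" "adult" "adult" "adult"
```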

Let’s create some example data to illustrate some of these options. This includes some information about users, such as age, as in the previous examples. But rather than being stored in a vector, each user’s information is stored row-wise in a tibble. This is the kind of data structure you’d be likely to see if dealing with user data in a professional setting, so it’s useful to know how to apply conditional logic to it.

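# tibble() comes from the tibble package (also loaded by the tidyverse)
library(tibble)

# Fix the random seed so the example data is reproducible
set.seed(123)

user_data <- tibble(
  user_id = 1:10,
  age = floor(runif(10, min = 13, max = 35)),
  region = sample(c("UK", "USA", "EU"), 10, replace = TRUE)
)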

[Table: the data created by the code above.]

Tibbles and data frames are made up of vectors: each column is a vector. This means we can index them with logical conditions in the same way, for example to filter rows or to update values in a column.
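Here is a hedged sketch of what that can look like. To keep it runnable without extra packages, it uses a base data frame called users with hypothetical, hard-coded values rather than the randomly generated user_data tibble. The same bracket syntax also works on a tibble.

```r
# Hypothetical values, fixed here so the results are reproducible without a seed
users <- data.frame(
  user_id = 1:5,
  age     = c(15, 30, 21, 17, 33),
  region  = c("UK", "USA", "EU", "UK", "EU")
)

# Keep rows where the condition is TRUE (the row index goes before the comma)
adults <- users[users$age >= 18, ]
adults$user_id
# Returns 2 3 5

# Combine conditions with & (and) or | (or)
uk_minors <- users[users$age < 18 & users$region == "UK", ]
uk_minors$user_id
# Returns 1 4

# Conditionally fill a new column by indexing the column vector
users$age_group <- "child"
users$age_group[users$age >= 18] <- "adult"
users$age_group
# Returns "child" "adult" "adult" "child" "adult"
```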


