Exploring apply, sapply, lapply, and map Functions in R

R Programming
apply
sapply
lapply
map
Author

M. Fatih Tüzen

Published

April 15, 2024

Modified

April 15, 2024

Introduction

In R, the apply family (apply(), sapply(), lapply()) and the map() function from the purrr package let you apply a function across the elements of a data structure without writing an explicit loop. This guide covers the syntax and usage of each function with examples, including built-in functions, additional arguments, and performance benchmarking.

Understanding apply() Function

The apply() function in R is used to apply a specified function to the rows or columns of an array. Its syntax is as follows:

apply(X, MARGIN, FUN, ...)
  • X: The input data, typically an array or matrix.

  • MARGIN: A numeric vector giving the dimension(s) the function is applied over. Use 1 for rows, 2 for columns, and c(1, 2) for both (that is, every element).

  • FUN: The function to apply.

  • ...: Additional arguments to be passed to the function.

Let’s calculate the mean of each row in a matrix using apply():

matrix_data <- matrix(1:9, nrow = 3)
row_means <- apply(matrix_data, 1, mean)
print(row_means)
[1] 4 5 6

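Because matrix() fills its values column by column, the rows are (1, 4, 7), (2, 5, 8), and (3, 6, 9), so the row means are 4, 5, and 6.

Arguments listed after FUN are passed on to the function. Let’s calculate the standard deviation of each column and pass na.rm = TRUE to sd() using apply() (this matrix has no missing values, so the argument does not change the result here):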

column_stdev <- apply(matrix_data, 2, sd, na.rm = TRUE)
print(column_stdev)
[1] 1 1 1
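
To see an additional argument actually change the result, here is a minimal sketch with a small matrix that does contain a missing value. The name na_matrix and its values are illustrative assumptions, not part of the original example.

```r
# Hypothetical 2 x 3 matrix with one missing value
# Columns are (1, NA), (3, 4), and (5, 6)
na_matrix <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)

# Without na.rm, any column that contains NA yields NA
means_with_na <- apply(na_matrix, 2, mean)

# na.rm = TRUE is forwarded to mean() through the ... argument
means_without_na <- apply(na_matrix, 2, mean, na.rm = TRUE)

print(means_with_na)     # NA 3.5 5.5
print(means_without_na)  # 1.0 3.5 5.5
```

The only difference between the two calls is the extra argument, which apply() hands to mean() unchanged.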

Understanding sapply() Function

The sapply() function is a wrapper around lapply() that tries to simplify the result to a vector or matrix, and falls back to a list when it cannot. Its syntax is similar to lapply():

sapply(X, FUN, ...)
  • X: The input data, typically a list.

  • FUN: The function to apply.

  • ...: Additional arguments to be passed to the function.

Let’s calculate the sum of each element in a list using sapply():

num_list <- list(a = 1:3, b = 4:6, c = 7:9)
sum_results <- sapply(num_list, sum)
print(sum_results)
 a  b  c 
 6 15 24 

This example computes the sum of each element in the list.

Let’s convert each element in a list to uppercase using sapply() and the toupper() function:

text_list <- list("hello", "world", "R", "programming")
uppercase_text <- sapply(text_list, toupper)
print(uppercase_text)
[1] "HELLO"       "WORLD"       "R"           "PROGRAMMING"

Here, sapply() applies the toupper() function to each element in the list, converting them to uppercase.
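
The shape of the value sapply() returns depends on what the function returns for each element. The following sketch, using small made-up lists, shows the three common cases: a vector, a matrix, and a plain list.

```r
# Each result has length 1: sapply() returns a named vector
lengths_vec <- sapply(list(a = 1:3, b = 4:6), length)

# Each result has length 2: sapply() returns a matrix, one column per element
range_mat <- sapply(list(a = 1:3, b = 4:6), range)

# Results have different lengths: sapply() cannot simplify and returns a list
ragged <- sapply(list(a = 1:3, b = 1:5), function(x) x[x > 2])

print(class(lengths_vec))  # "integer"
print(dim(range_mat))      # 2 2
print(class(ragged))       # "list"
```

Because the return type can change with the data, it is worth checking the result's class when the input is not fully under your control.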

Understanding lapply() Function

The lapply() function applies a function to each element of a list and returns a list. Its syntax is as follows:

lapply(X, FUN, ...)
  • X: The input data, typically a list.

  • FUN: The function to apply.

  • ...: Additional arguments to be passed to the function.

Let’s apply a custom function to each element of a list using lapply():

num_list <- list(a = 1:3, b = 4:6, c = 7:9)
custom_function <- function(x) sum(x) * 2
result_list <- lapply(num_list, custom_function)
print(result_list)
$a
[1] 12

$b
[1] 30

$c
[1] 48

In this example, lapply() applies the custom function to each element in the list.

Let’s extract the vowels from each element in a list of words using lapply() and a custom function:

word_list <- list("apple", "banana", "orange", "grape")
vowel_list <- lapply(word_list, function(word) grep("[aeiou]", strsplit(word, "")[[1]], value = TRUE))
print(vowel_list)
[[1]]
[1] "a" "e"

[[2]]
[1] "a" "a" "a"

[[3]]
[1] "o" "a" "e"

[[4]]
[1] "a" "e"

Here, lapply() applies the custom function to each element in the list, extracting vowels from words.
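
The ... argument works the same way in lapply() as in apply(). As a small sketch, assuming a made-up list called measurements that contains missing values, na.rm = TRUE can be passed straight through to mean():

```r
# Hypothetical list of numeric vectors with missing values
measurements <- list(a = c(1, NA, 3), b = c(4, 5, NA))

# na.rm = TRUE is passed through ... to mean() for every element
clean_means <- lapply(measurements, mean, na.rm = TRUE)

print(clean_means$a)  # 2
print(clean_means$b)  # 4.5
```

The result is still a list with the same names as the input, one element per input element.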

Understanding map() Function

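The map() function from the purrr package works like lapply(): it applies a function to each element of its input and always returns a list. It also shares a consistent interface with the rest of the purrr family and supports a formula shorthand for anonymous functions. Its syntax is as follows: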

map(.x, .f, ...)
  • .x: The input data, typically a list.

  • .f: The function to apply.

  • ...: Additional arguments to be passed to the function.

Let’s apply an anonymous (lambda) function, written with purrr’s formula shorthand ~ .x^2, to each element of a list using map():

library(purrr)
num_list <- list(a = 1:3, b = 4:6, c = 7:9)
mapped_results <- map(num_list, ~ .x^2)
print(mapped_results)
$a
[1] 1 4 9

$b
[1] 16 25 36

$c
[1] 49 64 81

In this example, map() squares the values of each vector in the list; ~ .x^2 is shorthand for function(x) x^2.

Let’s calculate the lengths of strings in a list using map() and the nchar() function:

text_list <- list("hello", "world", "R", "programming")
string_lengths <- map(text_list, nchar)
print(string_lengths)
[[1]]
[1] 5

[[2]]
[1] 5

[[3]]
[1] 1

[[4]]
[1] 11

Here, map() applies the nchar() function to each element in the list, calculating the length of each string.
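
To make the relationship with lapply() concrete, the sketch below squares the same list three ways and checks that the results match. The variable names are illustrative only.

```r
library(purrr)

num_list <- list(a = 1:3, b = 4:6, c = 7:9)

# Three ways to square each vector; all return a named list
with_formula  <- map(num_list, ~ .x^2)              # purrr formula shorthand
with_function <- map(num_list, function(x) x^2)     # ordinary anonymous function
with_lapply   <- lapply(num_list, function(x) x^2)  # base R equivalent

print(identical(with_formula, with_function))  # TRUE
print(identical(with_function, with_lapply))   # TRUE
```

For a plain list like this one, choosing between map() and lapply() is mostly a matter of style and of whether you want the rest of the purrr toolkit.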

Understanding map() Function Variants

In addition to the map() function, the purrr package provides several variants that are specialized for different types of output: map_lgl(), map_int(), map_dbl(), and map_chr(). These variants are particularly useful when you expect the output to be of a specific data type, such as logical, integer, double, or character.

  • map_lgl(): This variant is used when the output of the function is expected to be a logical vector.

  • map_int(): Use this variant when the output of the function is expected to be an integer vector.

  • map_dbl(): This variant is used when the output of the function is expected to be a double vector.

  • map_chr(): Use this variant when the output of the function is expected to be a character vector.

These variants enforce the output type: each call to the function must return a single value compatible with the expected type, and purrr raises an error otherwise. This keeps the output type consistent across iterations, which is particularly handy when working with functions that have predictable output types.

library(purrr)

# Define a list of vectors
num_list <- list(a = 1:3, b = 4:6, c = 7:9)

# Use map_lgl() to check if all elements in each vector are even
even_check <- map_lgl(num_list, function(x) all(x %% 2 == 0))
print(even_check)
    a     b     c 
FALSE FALSE FALSE 
# Use map_int() to compute the sum of each vector
vector_sums <- map_int(num_list, sum)
print(vector_sums)
 a  b  c 
 6 15 24 
# Use map_dbl() to compute the mean of each vector
vector_means <- map_dbl(num_list, mean)
print(vector_means)
a b c 
2 5 8 
# Use map_chr() to collapse each vector into a single comma-separated string
vector_strings <- map_chr(num_list, toString)
print(vector_strings)
        a         b         c 
"1, 2, 3" "4, 5, 6" "7, 8, 9" 

By using these specialized variants, you can ensure that the output of your mapping operation adheres to your specific data type requirements, leading to cleaner and more predictable code.
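
The strictness is easiest to see when a function returns something the variant cannot accept. The sketch below wraps two failing calls in a small helper, try_map(), which is a hypothetical name introduced here for illustration; it returns a short message instead of stopping the script.

```r
library(purrr)

num_list <- list(a = 1:3, b = 4:6, c = 7:9)

# Hypothetical helper: returns the result, or a short message if the call fails
try_map <- function(expr) {
  tryCatch(expr, error = function(e) "error: unexpected output")
}

# range() returns two values per element, but map_int() needs exactly one
bad_length <- try_map(map_int(num_list, range))

# A character result cannot be stored in a double vector
bad_type <- try_map(map_dbl(num_list, function(x) "not a number"))

print(bad_length)  # "error: unexpected output"
print(bad_type)    # "error: unexpected output"
```

With plain map() both calls would succeed and return a list, so the problem would only surface later in the analysis.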

Performance Comparison

To compare the performance of these functions, it’s important to note that the execution time may vary depending on the hardware specifications of your computer, the size of the dataset, and the complexity of the operations performed. While one function may perform better in one scenario, it may not be the case in another. Therefore, it’s recommended to benchmark the functions in your specific use case.

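Let’s benchmark the functions on a 100 x 100 matrix. Note what each call iterates over: apply() sums each of the 100 columns, while sapply(), lapply(), and map_dbl() treat the matrix as a vector and call sum() once for each of its 10,000 elements: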

library(microbenchmark)
library(purrr)  # provides map_dbl()

# Create a 100 x 100 matrix of random normal values
matrix_data <- matrix(rnorm(10000), nrow = 100)

# apply() computes one sum per column (100 calls to sum()).
# sapply(), lapply(), and map_dbl() iterate over the matrix element by element
# (10,000 calls to sum(), each on a single value).
benchmark_results <- microbenchmark(
  apply_sum = apply(matrix_data, 2, sum),
  sapply_sum = sapply(matrix_data, sum),
  lapply_sum = lapply(matrix_data, sum),
  map_sum = map_dbl(as.list(matrix_data), sum),  # as.list() turns the matrix into a list of 10,000 single values
  times = 100
)

print(benchmark_results)
Unit: microseconds
       expr    min      lq     mean  median      uq     max neval
  apply_sum   98.1  122.95  143.123  135.60  153.35   277.8   100
 sapply_sum 2326.7 2429.75 2941.094 2514.85 2852.55 11218.3   100
 lapply_sum 2150.6 2247.55 2860.614 2364.90 2930.80  6556.0   100
    map_sum 5063.5 5342.45 6009.474 5738.35 6788.35  8139.7   100

In this run, apply_sum is by far the fastest, with a median of about 136 microseconds compared with roughly 2,500 for sapply_sum, 2,400 for lapply_sum, and 5,700 for map_sum. The comparison is not like-for-like, however: apply() makes 100 calls to sum(), one per column, whereas the other three make 10,000 calls, one per matrix element, so much of the gap reflects the amount of work done rather than the speed of the functions themselves. When evaluating results like these, it’s also important to consider factors beyond processing time, such as usability and functionality.

Overall, the choice of function depends on speed, ease of use, and compatibility with your data structure, so benchmark the alternatives on your own data before deciding.
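
For a comparison in which every function does the same work, one option is to have each of them compute the 100 column sums. The sketch below is an illustration under that assumption, not the benchmark reported above: the helper names (sum_apply, sum_sapply, and so on) are made up, colSums() is added as a vectorised base R baseline, and no timings are shown because they depend on your machine.

```r
library(purrr)

set.seed(42)  # arbitrary seed so the sketch is reproducible
matrix_data <- matrix(rnorm(10000), nrow = 100)
cols <- seq_len(ncol(matrix_data))

# Each helper computes the same 100 column sums
sum_apply  <- function() apply(matrix_data, 2, sum)
sum_sapply <- function() sapply(cols, function(j) sum(matrix_data[, j]))
sum_lapply <- function() unlist(lapply(cols, function(j) sum(matrix_data[, j])))
sum_map    <- function() map_dbl(cols, function(j) sum(matrix_data[, j]))
sum_base   <- function() colSums(matrix_data)  # vectorised baseline

# Confirm the approaches agree before timing them
stopifnot(
  isTRUE(all.equal(sum_apply(), sum_sapply())),
  isTRUE(all.equal(sum_apply(), sum_lapply())),
  isTRUE(all.equal(sum_apply(), sum_map())),
  isTRUE(all.equal(sum_apply(), sum_base()))
)

# Time the helpers only if microbenchmark is installed
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    apply   = sum_apply(),
    sapply  = sum_sapply(),
    lapply  = sum_lapply(),
    map_dbl = sum_map(),
    colSums = sum_base(),
    times = 100
  ))
}
```

Checking that all approaches return the same values first guards against timing code that silently computes something different.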

Conclusion

Apply functions (apply(), sapply(), lapply()) and the map() function from the purrr package are powerful tools for data manipulation and analysis in R. Each function has its unique features and strengths, making them suitable for various tasks.

  • apply() function is versatile and operates on matrices, allowing for row-wise or column-wise operations. However, its performance may vary depending on the size of the dataset and the nature of the computation.

  • sapply() and lapply() functions are convenient for working with lists and vectors. lapply() always returns a list, while sapply() simplifies the result to a vector or matrix when it can. They offer flexibility and ease of use, making them suitable for a wide range of tasks.

  • map() function offers a more consistent syntax compared to lapply() and provides additional variants (map_lgl(), map_int(), map_dbl(), map_chr()) for handling specific data types. While it may exhibit slower performance in some cases, its functionality and ease of use make it a valuable tool for functional programming in R.

When choosing the most suitable function for your task, it’s essential to consider factors beyond just performance. Usability, compatibility with data structures, and the nature of the computation should also be taken into account. Additionally, the performance of these functions may vary depending on the hardware specifications of your computer, the size of the dataset, and the complexity of the operations performed. Therefore, it’s recommended to benchmark the functions in your specific use case and evaluate them based on multiple criteria to make an informed decision.

By mastering these functions and understanding their nuances, you can streamline your data analysis workflows and tackle a wide range of analytical tasks with confidence in R.