M. Fatih Tüzen
October 3, 2023
In the realm of R programming, vectors serve as the fundamental building blocks that underpin virtually every data analysis and manipulation task. Much like atoms are the smallest units of matter, vectors are the fundamental units of data in R. In this article, we will delve into the world of vectors in R programming, exploring their significance, applications, and some of the most commonly used functions that make them indispensable.
In R, a vector is a fundamental data structure that can hold multiple elements of the same data type. These elements can be numbers, characters, logical values, or other types of data. Vectors are one-dimensional, meaning they consist of a single sequence of values. These vectors can be considered as the atomic units of data storage in R, forming the basis for more complex data structures like matrices, data frames, and lists. In essence, vectors are the elemental containers for data elements.
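As a minimal sketch of the description above, the following creates one vector of each common type and checks what R reports for it. The variable names are made up for illustration.

```r
# Hypothetical example vectors, one per common type
numeric_example <- c(1.5, 2, 3)
character_example <- c("a", "b")
logical_example <- c(TRUE, FALSE, TRUE)

# typeof() reports the storage type shared by all elements
typeof(numeric_example)    # "double"
typeof(character_example)  # "character"
typeof(logical_example)    # "logical"

# Each of these is a one-dimensional vector
is.vector(numeric_example) # TRUE
```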
Vectors play a pivotal role in R programming for several reasons:
Efficient Data Storage: Vectors efficiently store homogeneous data, saving memory and computational resources.
Vectorized Operations: R is designed for vectorized operations, meaning you can apply an operation to an entire vector at once without writing an explicit loop. This makes code more concise and usually faster.
Compatibility: Most R functions are designed to work with vectors, making them compatible with many data analysis and statistical techniques.
Simplicity: Using vectors simplifies code and promotes a more intuitive and readable coding style.
Interoperability: Vectors can be easily converted into other data structures, such as matrices or data frames, enhancing data manipulation capabilities.
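To make the vectorization point concrete, here is a minimal sketch that computes the same result with an explicit loop and with a single vectorized expression. The data and the names prices, quantities, totals_loop and totals_vectorized are assumptions made for this illustration.

```r
# Hypothetical example data (not from the article)
prices <- c(10, 5, 3)
quantities <- c(2, 4, 6)

# Loop version: compute each product one element at a time
totals_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  totals_loop[i] <- prices[i] * quantities[i]
}

# Vectorized version: one expression operates on whole vectors
totals_vectorized <- prices * quantities

print(totals_vectorized)
# [1] 20 20 18
identical(totals_loop, totals_vectorized)
# [1] TRUE
```

Both versions give the same values; the vectorized form is shorter and leaves the looping to R.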
Subsetting and indexing are essential operations in R that allow you to access specific elements or subsets of elements from a vector. Subsetting refers to the process of selecting a portion of a vector based on specific conditions or positions. Indexing, on the other hand, refers to specifying the position or positions of the elements you want to access within the vector.
Square brackets ([ ]) are used to access and subset elements in vectors and other data structures such as lists and matrices. They let you extract specific elements or subsets of elements from a vector.
Let’s explore these concepts with interesting examples.
Subsetting by Index
You can subset a vector by specifying the index positions of the elements you want to access.
# Create a numeric vector
my_vector <- c(10, 20, 30, 40, 50)
# Subset the second and fourth elements
subset <- my_vector[c(2, 4)]
# Print the result
print(subset)
[1] 20 40
Subsetting by Condition
You can subset a vector based on a condition using logical vectors.
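A minimal sketch of condition-based subsetting, assuming a made-up vector named scores: the comparison produces a logical vector, and only the positions that are TRUE are kept.

```r
# Hypothetical numeric vector
scores <- c(10, 20, 30, 40, 50)

# The comparison returns one TRUE/FALSE per element
is_large <- scores > 25
print(is_large)
# [1] FALSE FALSE  TRUE  TRUE  TRUE

# Use the logical vector inside [ ] to keep the TRUE positions
large_scores <- scores[is_large]
print(large_scores)
# [1] 30 40 50
```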
Single Index
Access a single element by specifying its index.
# Create a character vector
fruits <- c("apple", "banana", "cherry")
# Access the second element
fruit <- fruits[2]
# Print the result
print(fruit)
[1] "banana"
Multiple Indices
Access multiple elements by specifying multiple indices.
# Create a numeric vector
numbers <- c(1, 2, 3, 4, 5)
# Access the first and fourth elements
subset <- numbers[c(1, 4)]
# Print the result
print(subset)
[1] 1 4
Negative Indexing
Exclude elements by specifying negative indices.
# Create a numeric vector
numbers <- c(1, 2, 3, 4, 5)
# Exclude the second element
subset <- numbers[-2]
# Print the result
print(subset)
[1] 1 3 4 5
These examples demonstrate how to subset and index vectors in R, allowing you to access specific elements or subsets of elements based on conditions, positions, or logical criteria. These operations are fundamental in data analysis and manipulation tasks in R.
Let’s explore some commonly used functions when working with vectors in R.
c()
The c() function (short for “combine” or “concatenate”) is used for creating a new vector or combining multiple values or vectors into a single vector. It allows you to create a vector by listing its elements within the function.
1. Combining Numeric Values:
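A minimal sketch of this case, using a hypothetical variable name numeric_vector:

```r
# Creating a numeric vector
numeric_vector <- c(1, 2, 3, 4, 5)
print(numeric_vector)
# [1] 1 2 3 4 5
```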
2. Combining Character Strings:
# Creating a character vector
character_vector <- c("apple", "banana", "cherry")
print(character_vector)
[1] "apple" "banana" "cherry"
3. Combining Different Data Types (Implicit Coercion):
# Combining numeric and character values
# Numeric values are coerced to character.
mixed_vector <- c(1, "two", 3, "four")
class(mixed_vector)
[1] "character"
4. Combining Vectors Recursively:
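One reading of this heading is that c() flattens vectors nested inside other c() calls into a single vector. A minimal sketch under that assumption, with made-up variable names:

```r
# Two existing vectors
first_part <- c(1, 2, 3)
second_part <- c(4, 5, 6)

# c() accepts vectors (and nested c() calls) and flattens them
combined <- c(first_part, second_part, c(7, 8))
print(combined)
# [1] 1 2 3 4 5 6 7 8
length(combined)
# [1] 8
```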
seq()
In R, the seq() function is used to generate sequences of numbers or other objects such as dates. It allows you to create a sequence of values with specified starting and ending points, increments, and other parameters. The seq() function is quite versatile and can be used to generate sequences of integers or real numbers, in increasing or decreasing order.
Here is the basic syntax of the seq() function:
seq(from, to, by, length.out)
from: The starting point of the sequence.
to: The ending point of the sequence.
by: The interval between values in the sequence. It is an optional parameter. If length.out is given instead, R calculates the interval from from, to, and length.out; if neither is given, the interval defaults to 1 (or -1 when to is smaller than from).
length.out: The desired length of the sequence. It is an optional parameter. If provided, R calculates the by parameter based on the desired length.
Here are some examples to illustrate how to use the seq() function:
[1] 0.0 0.2 0.4 0.6 0.8 1.0
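The output above matches what a call such as seq(0, 1, by = 0.2) prints. As a minimal sketch, the following shows that call next to the length.out form and the default step; the variable names are hypothetical.

```r
# Step given explicitly with by
by_sequence <- seq(from = 0, to = 1, by = 0.2)
print(by_sequence)
# [1] 0.0 0.2 0.4 0.6 0.8 1.0

# Length given instead; R works out the step
length_sequence <- seq(from = 0, to = 1, length.out = 6)
print(length_sequence)
# [1] 0.0 0.2 0.4 0.6 0.8 1.0

# Neither by nor length.out: the step defaults to 1
default_sequence <- seq(from = 1, to = 5)
print(default_sequence)
# [1] 1 2 3 4 5
```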
The seq() function is very useful for creating sequences of values that you can use for various purposes, such as creating sequences for plotting, generating data for simulations, or defining custom sequences for indexing elements in vectors or data frames.
rep()
In R, the rep() function is used to replicate or repeat values to create vectors of repeated elements. It allows you to duplicate a value or a set of values a specified number of times to form a longer vector. The rep() function is quite flexible and can be used to repeat both individual elements and entire vectors or lists.
Here’s the basic syntax of the rep() function:
rep(x, times, each, length.out)
x: The value(s) or vector(s) that you want to repeat.
times: An integer specifying how many times x should be repeated. If times is a single number, the whole vector is repeated that many times; if it is a vector of the same length as x, each element of x is repeated the corresponding number of times.
each: An integer specifying how many times each element of x (if it’s a vector) should be repeated before moving on to the next element. This is an optional parameter.
length.out: An integer specifying the desired length of the result. This is an optional parameter, and it can be used instead of times and each to determine the number of repetitions.
Here are some examples to illustrate how to use the rep() function:
# Create a vector
my_vector <- c("A", "B", "C")
# Repeat each element of the vector 2 times
rep(my_vector, each = 2)
[1] "A" "A" "B" "B" "C" "C"
# Repeat each element of the vector with different frequencies
rep(c("A", "B", "C"), times = c(3, 2, 4))
[1] "A" "A" "A" "B" "B" "C" "C" "C" "C"
[1] 1 2 3 1 2 3 1 2 3 1
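The output above is what a call such as rep(1:3, length.out = 10) prints. A minimal sketch of that call, together with a plain times repetition, using hypothetical variable names:

```r
# Repeat the whole vector 3 times
whole_repeated <- rep(c("A", "B", "C"), times = 3)
print(whole_repeated)
# [1] "A" "B" "C" "A" "B" "C" "A" "B" "C"

# Recycle the values until the result has exactly 10 elements
recycled <- rep(1:3, length.out = 10)
print(recycled)
#  [1] 1 2 3 1 2 3 1 2 3 1
```

With length.out, the last cycle is cut short as soon as the requested length is reached.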
The rep() function is useful for tasks like creating data for simulations, repeating elements for plotting, and constructing vectors and matrices with specific patterns or repetitions.
length()
In R, the length() function is used to determine the number of elements in a vector. It returns an integer value representing the length of the vector. The length() function is straightforward to use and provides a quick way to check the number of elements in a vector.
Here’s the basic syntax of the length() function for vectors:
length(x)
x: The vector for which you want to find the length.
Here’s an example of how to use the length() function with vectors:
# Create a numeric vector
numeric_vector <- c(1, 2, 3, 4, 5)
# Use the length() function to find the length of the vector
length(numeric_vector)
[1] 5
The length() function is particularly useful when you need to perform operations or make decisions based on the size or length of a vector. It is commonly used in control structures like loops to ensure that you iterate through the entire vector or to dynamically adjust the length of vectors in your code.
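As a minimal sketch of the loop use mentioned above, the following iterates over every position of a made-up vector named readings. It uses seq_len(length(x)) rather than 1:length(x), since the former yields no iterations at all for an empty vector.

```r
# Hypothetical data
readings <- c(12, 7, 9)

# Visit each position from 1 to length(readings)
running_total <- 0
for (i in seq_len(length(readings))) {
  running_total <- running_total + readings[i]
}
print(running_total)
# [1] 28

# An empty vector has length 0, so the loop range is empty too
empty <- numeric(0)
length(empty)
# [1] 0
seq_len(length(empty))
# integer(0)
```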
unique()
The unique() function is used to extract the unique elements from a vector, returning a new vector containing only the distinct values found in the original vector. It is a convenient way to identify and remove duplicate values from a vector.
Here’s the basic syntax of the unique() function:
unique(x)
x: The vector from which you want to extract unique elements.
Here’s an example of how to use the unique() function with a vector:
# Create a vector with duplicate values
my_vector <- c(1, 2, 2, 3, 4, 4, 5)
# Use the unique() function to extract unique elements
unique(my_vector)
[1] 1 2 3 4 5
In this example, the unique() function is applied to my_vector, and it returns a new vector containing only the unique values, removing duplicates. The order of the unique values in the result is the same as their order of appearance in the original vector.
The unique() function is particularly useful when dealing with data preprocessing or data cleaning tasks, where you need to identify and handle duplicate values in a dataset. It’s also helpful when you want to generate a list of unique categories or distinct values from a categorical variable.
duplicated()
The duplicated() function in R is a handy tool for identifying and working with duplicate elements in a vector. It returns a logical vector of the same length as the input vector, indicating for each element whether the same value has already appeared earlier in the vector. You can also use the fromLast argument to control the direction of the search for duplicates.
Here’s the detailed syntax of the duplicated() function:
duplicated(x, fromLast = FALSE)
x: The vector in which you want to identify duplicate elements.
fromLast: An optional logical parameter (default is FALSE). If set to TRUE, the search starts from the end of the vector, so the last occurrence of each value is treated as the original instead of the first.
Now, let’s dive into some interesting examples to understand how the duplicated() function works:
# Create a vector with duplicate values
my_vector <- c(1, 2, 2, 3, 4, 4, 5)
# Use the duplicated() function to identify duplicate elements
duplicates <- duplicated(my_vector)
# Print the result
print(duplicates)
[1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE
# Get the values that are duplicated
duplicated_values <- my_vector[duplicates]
print(duplicated_values)
[1] 2 4
In this example, duplicates is a logical vector indicating whether each element in my_vector is duplicated. TRUE marks an element whose value has already appeared earlier in the vector, and FALSE marks a first occurrence. We then extract the duplicated values using indexing.
Identifying Duplicates from the Last Occurrence
# Create a vector with duplicate values
my_vector <- c(1, 2, 2, 3, 4, 4, 5)
# Use the duplicated() function to identify duplicates from the last occurrence
duplicates_last <- duplicated(my_vector, fromLast = TRUE)
# Print the result
print(duplicates_last)
[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE
# Get the values that are duplicated from the last occurrence
duplicated_values_last <- my_vector[duplicates_last]
print(duplicated_values_last)
[1] 2 4
By setting fromLast = TRUE, we search from the end of the vector, so the last occurrence of each value is kept as the original and the earlier copies are marked as duplicates.
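Combining both directions flags every element that belongs to a group of repeated values, including the first and last copies. A minimal sketch, using hypothetical names values and in_duplicate_group:

```r
# Hypothetical vector with repeated values
values <- c(1, 2, 2, 3, 4, 4, 5)

# TRUE if the value is repeated anywhere else in the vector
in_duplicate_group <- duplicated(values) | duplicated(values, fromLast = TRUE)
print(in_duplicate_group)
# [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE

# All copies of the repeated values
values[in_duplicate_group]
# [1] 2 2 4 4
```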
# Create a vector with duplicate values
my_vector <- c(1, 2, 2, 3, 4, 4, 5)
# Use the `!` operator to negate the duplicated values and get unique values
unique_values <- my_vector[!duplicated(my_vector)]
# Print the unique values
print(unique_values)
[1] 1 2 3 4 5
In this example, we use the ! operator to negate the result of duplicated() to get unique values in the vector.
# Create a character vector with duplicate strings
my_strings <- c("apple", "banana", "apple", "cherry", "banana")
# Use the duplicated() function to identify duplicate strings
duplicates_strings <- duplicated(my_strings)
# Print the result
print(duplicates_strings)
[1] FALSE FALSE TRUE FALSE TRUE
# Get the duplicated strings
duplicated_strings <- my_strings[duplicates_strings]
print(duplicated_strings)
[1] "apple" "banana"
The duplicated() function can also be used with character vectors to identify duplicate strings.
These examples illustrate how the duplicated() function can be used to identify and work with duplicate elements in a vector, which is useful for data cleaning, analysis, and other data manipulation tasks in R.
sort()
The sort() function is used to sort the elements of a vector in either ascending or descending order. It is a fundamental function for arranging and organizing data. The sort() function can be applied to various types of vectors, including numeric, character, and factor vectors.
Here’s the basic syntax of the sort() function:
sort(x, decreasing = FALSE)
x: The vector that you want to sort.
decreasing: An optional logical parameter (default is FALSE). If set to TRUE, the vector is sorted in descending order; if FALSE, it’s sorted in ascending order.
Now, let’s explore the sort() function with some interesting examples:
# Create a numeric vector
numeric_vector <- c(5, 2, 8, 1, 3)
# Sort the vector in ascending order
sorted_vector <- sort(numeric_vector)
# Print the result
print(sorted_vector)
[1] 1 2 3 5 8
In this example, sorted_vector contains the elements of numeric_vector sorted in ascending order.
# Create a character vector
character_vector <- c("apple", "banana", "cherry", "date", "grape")
# Sort the vector in alphabetical order
sorted_vector <- sort(character_vector)
# Print the result
print(sorted_vector)
[1] "apple" "banana" "cherry" "date" "grape"
Here, sorted_vector contains the elements of character_vector sorted in alphabetical order.
# Create a numeric vector
numeric_vector <- c(5, 2, 8, 1, 3)
# Sort the vector in descending order
sorted_vector <- sort(numeric_vector, decreasing = TRUE)
# Print the result
print(sorted_vector)
[1] 8 5 3 2 1
By setting decreasing = TRUE, we sort numeric_vector in descending order.
In R, a “factor” is a data type that represents categorical or discrete data. Factors are used to store and manage categorical variables in a more efficient and meaningful way. Categorical variables are variables that take on a limited, fixed set of values or levels, such as “yes” or “no,” “low,” “medium,” or “high,” or “red,” “green,” or “blue.” In R, factors are created using the factor() function.
I am planning to write a post about the factors soon.
# Create a factor vector
factor_vector <- factor(c("high", "low", "medium", "low", "high"))
# Sort the factor vector by its levels (alphabetical by default)
sorted_vector <- sort(factor_vector)
# Print the result
print(sorted_vector)
[1] high high low low medium
Levels: high low medium
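The sort() function can also be used with factor vectors, where it orders the elements by their factor levels. By default the levels are in alphabetical order, which is why high comes before low and medium here.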
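If a different ordering is wanted, the levels argument of factor() can set it explicitly. A minimal sketch, assuming the intended order is low, medium, high and using hypothetical variable names:

```r
# Same values as above, but with the level order stated explicitly
ordered_levels <- factor(
  c("high", "low", "medium", "low", "high"),
  levels = c("low", "medium", "high")
)

# sort() now follows the stated level order, not the alphabet
sorted_levels <- sort(ordered_levels)
as.character(sorted_levels)
# [1] "low"    "low"    "medium" "high"   "high"
```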
# Create a numeric vector
numeric_vector <- c(5, 2, 8, 1, 3)
# Sort the vector in ascending order and store the index order
sorted_indices <- order(numeric_vector)
sorted_vector <- numeric_vector[sorted_indices]
# Print the result
print(sorted_vector)
[1] 1 2 3 5 8
In this example, we use the order() function to obtain the index order needed to sort numeric_vector in ascending order. We then use this index order for sorting the vector.
The sort() function is a versatile tool for sorting vectors in R, and it is a fundamental part of data analysis and manipulation. It can be applied to various data types, and you can control the sorting order with the decreasing parameter.
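Because order() returns positions rather than values, the same index vector can reorder a second, related vector. A minimal sketch with made-up data, where fruit_names and fruit_prices are assumed to be parallel vectors:

```r
# Hypothetical parallel vectors: one name and one price per fruit
fruit_names <- c("apple", "banana", "cherry")
fruit_prices <- c(10, 5, 3)

# Positions that would sort the prices in ascending order
price_order <- order(fruit_prices)
print(price_order)
# [1] 3 2 1

# Reorder the names by price, cheapest first: cherry, banana, apple
names_by_price <- fruit_names[price_order]
print(names_by_price)
```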
which()
The which() function is used to identify the indices of elements in a vector that satisfy a specified condition. It returns a vector of indices where the condition is TRUE.
Here’s the basic syntax of the which() function:
which(x, arr.ind = FALSE)
x: A logical vector or array, usually the result of applying a condition to the vector you want to search.
arr.ind: An optional logical parameter (default is FALSE). If set to TRUE and x is a matrix or multi-dimensional array, the function returns the array indices (for a matrix, row and column) instead of single positions.
Now, let’s explore the which() function with some interesting examples:
# Create a numeric vector
my_vector <- c(10, 5, 15, 3, 8)
# Find indices where values are greater than 8
indices_greater_than_8 <- which(my_vector > 8)
# Print the result
print(indices_greater_than_8)
[1] 1 3
In this example, indices_greater_than_8 contains the indices where elements in my_vector are greater than 8.
# Create a vector with missing values (NA)
my_vector <- c(2, NA, 5, NA, 8)
# Find indices of missing values
indices_of_na <- which(is.na(my_vector))
# Print the result
print(indices_of_na)
[1] 2 4
Here, indices_of_na contains the indices where my_vector has missing values (NA).
The is.na() function in R is used to identify missing values (NAs) in a vector or a data frame. It returns a logical vector or matrix of the same shape as the input, where each element is TRUE if the corresponding element in the input is NA, and FALSE otherwise.
# Create a character vector
my_vector <- c("apple", "banana", "cherry", "banana", "apple")
# Find indices where values are "banana"
indices_banana <- which(my_vector == "banana")
# Print the result
print(indices_banana)
[1] 2 4
Here, indices_banana contains the indices where elements in my_vector are equal to “banana.”
The which() function is versatile and can be used for various purposes, such as identifying specific elements, locating missing values, and finding indices based on custom conditions. It’s a valuable tool for data analysis and manipulation in R.
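The arr.ind parameter described earlier is not exercised by the vector examples, so here is a minimal sketch on a small made-up matrix named score_matrix.

```r
# Hypothetical 2 x 3 matrix (filled column by column)
score_matrix <- matrix(c(1, 9, 3, 12, 5, 7), nrow = 2)

# Default: single positions, counted down the columns
linear_positions <- which(score_matrix > 8)
print(linear_positions)
# [1] 2 4

# arr.ind = TRUE: one row per match, giving row and column
array_positions <- which(score_matrix > 8, arr.ind = TRUE)
print(array_positions)
#      row col
# [1,]   2   1
# [2,]   2   2
```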
paste()
The paste() function is used to concatenate (combine) character vectors element-wise into a single character vector. It allows you to join strings or character elements together, with the option to specify a separator and to collapse the results into a single string. The basic syntax of the paste() function is as follows:
paste(..., sep = " ", collapse = NULL)
...: One or more character vectors, or objects that can be converted to character, to be combined.
sep: A character string that specifies the separator to be used between the concatenated elements. The default is a space.
collapse: An optional character string that specifies a separator to be used when collapsing the concatenated elements into a single string. If collapse is not specified, the result will be a character vector.
Now, let’s explore the paste() function with some interesting examples:
# Create two character vectors
first_names <- c("John", "Alice", "Bob")
last_names <- c("Doe", "Smith", "Johnson")
# Use paste() to concatenate them with the default separator (space)
full_names <- paste(first_names, last_names)
# Print the result
print(full_names)
[1] "John Doe" "Alice Smith" "Bob Johnson"
In this example, the paste() function concatenates first_names and last_names with the default separator, which is a space.
# Create a character vector
fruits <- c("apple", "banana", "cherry")
# Use paste() with collapse to join the elements with a comma and a space
fruits_list <- paste(fruits, collapse = ", ")
# Print the result
print(fruits_list)
[1] "apple, banana, cherry"
Here, we collapse the elements in the fruits vector into a single string, using a comma followed by a space as the separator.
# Create a numeric vector and a character vector
prices <- c(10, 5, 3)
fruits <- c("apple", "banana", "cherry")
# Use paste() to combine them
item_description <- paste(prices, "USD -", fruits)
# Print the result
print(item_description)
[1] "10 USD - apple" "5 USD - banana" "3 USD - cherry"
In this example, we combine numeric values from the prices vector with character values from the fruits vector using paste().
# Create a character vector
sentence <- c("This", "is", "an", "example", "sentence")
# Use paste() to collapse the vector into a single string
collapsed_sentence <- paste(sentence, collapse = " ")
# Print the result
print(collapsed_sentence)
[1] "This is an example sentence"
Here, we use paste() to collapse the elements of the sentence vector into a single string with spaces between words.
The paste() function is versatile and useful for various data manipulation tasks, such as creating custom labels, formatting output, and constructing complex strings from component parts. It allows you to combine character vectors in a flexible way.
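The examples above rely on the default separator or on collapse alone. As a minimal sketch, the following sets sep explicitly and then uses sep and collapse together; the variable names are hypothetical.

```r
# Hypothetical name vectors
first_names <- c("John", "Alice", "Bob")
last_names <- c("Doe", "Smith", "Johnson")

# sep controls what goes between the pieces of each element
surname_first <- paste(last_names, first_names, sep = ", ")
print(surname_first)
# [1] "Doe, John"    "Smith, Alice" "Johnson, Bob"

# sep joins within each element; collapse then joins the elements
single_string <- paste(first_names, last_names, sep = "_", collapse = "; ")
print(single_string)
# [1] "John_Doe; Alice_Smith; Bob_Johnson"
```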
Of course, there are many functions that can be used with vectors and other data structures. You can even create your own functions when you learn how to write functions. I tried to explain some basic and frequently used functions here in order not to make the post too long.
In conclusion, vectors are the fundamental building blocks of data in R programming, akin to atoms in the world of matter. They are versatile, efficient, and indispensable for a wide range of data analysis tasks. By understanding their importance and mastering the use of vector-related functions, you can unlock the full potential of R for your data manipulation and analysis endeavors.