R Programming

data types

vectors

Author

M. Fatih Tüzen

Published

October 3, 2023

Modified

October 3, 2023

In the realm of R programming, vectors serve as the fundamental building blocks that underpin virtually every data analysis and manipulation task. Much like atoms are the smallest units of matter, vectors are the fundamental units of data in R. In this article, we will delve into the world of vectors in R programming, exploring their significance, applications, and some of the most commonly used functions that make them indispensable.

In R, a vector is a fundamental data structure that can hold multiple elements of the same data type. These elements can be numbers, characters, logical values, or other types of data. Vectors are one-dimensional, meaning they consist of a single sequence of values. These vectors can be considered as the atomic units of data storage in R, forming the basis for more complex data structures like matrices, data frames, and lists. In essence, vectors are the elemental containers for data elements.

Vectors play a pivotal role in R programming for several reasons:

- **Efficient Data Storage**: Vectors efficiently store homogeneous data, saving memory and computational resources.
- **Vectorized Operations**: One of the most powerful aspects of R is its ability to perform operations on entire vectors efficiently, a concept known as vectorization. R is designed for vectorized operations, meaning you can perform operations on entire vectors without the need for explicit loops. This makes code concise and faster.
- **Compatibility**: Most R functions are designed to work with vectors, making them compatible with many data analysis and statistical techniques.
- **Simplicity**: Using vectors simplifies code and promotes a more intuitive and readable coding style.
- **Interoperability**: Vectors can be easily converted into other data structures, such as matrices or data frames, enhancing data manipulation capabilities.
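To make the idea of vectorized operations concrete, here is a minimal sketch. The vectors `prices` and `quantities` are made-up values used only for illustration; the loop version is included purely for comparison.

```r
# Two numeric vectors of equal length (illustrative values)
prices <- c(10, 20, 30)
quantities <- c(2, 1, 4)

# Vectorized: the multiplication is applied element by element, no loop needed
totals <- prices * quantities
print(totals)

# The same computation written as an explicit loop, for comparison
totals_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  totals_loop[i] <- prices[i] * quantities[i]
}
print(totals_loop)
```

Both versions give the same result; the vectorized form is simply shorter and easier to read.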

Subsetting and indexing are essential operations in R that allow you to access specific elements or subsets of elements from a vector. Subsetting refers to the process of selecting a portion of a vector based on specific conditions or positions. Indexing, on the other hand, refers to specifying the position or positions of the elements you want to access within the vector.

Tip

Square brackets (`[ ]`) are used to access and subset elements in vectors and in other data structures such as lists and matrices. They allow you to extract specific elements or subsets of elements from a vector.

Let’s explore these concepts with interesting examples.

**Subsetting by Index**

You can subset a vector by specifying the index positions of the elements you want to access.

```
# Create a numeric vector
my_vector <- c(10, 20, 30, 40, 50)
# Subset the second and fourth elements
subset <- my_vector[c(2, 4)]
# Print the result
print(subset)
```

`[1] 20 40`

**Subsetting by Condition**

You can subset a vector based on a condition using logical vectors.
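The example for this case does not appear here, so the following is a small sketch of how it might look. The vector `scores` and the threshold of 60 are assumptions chosen for illustration.

```r
# Numeric vector (illustrative values)
scores <- c(55, 82, 67, 91, 48)

# A comparison returns a logical vector of the same length
passed <- scores > 60
print(passed)

# Only the elements where the condition is TRUE are kept
high_scores <- scores[passed]
print(high_scores)
```

The condition can also be written directly inside the brackets, as in `scores[scores > 60]`.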

**Single Index**

Access a single element by specifying its index.

```
# Create a character vector
fruits <- c("apple", "banana", "cherry")
# Access the second element
fruit <- fruits[2]
# Print the result
print(fruit)
```

`[1] "banana"`

**Multiple Indices**

Access multiple elements by specifying multiple indices.

```
# Create a numeric vector
numbers <- c(1, 2, 3, 4, 5)
# Access the first and fourth elements
subset <- numbers[c(1, 4)]
# Print the result
print(subset)
```

`[1] 1 4`

**Negative Indexing**

Exclude elements by specifying negative indices.

```
# Create a numeric vector
numbers <- c(1, 2, 3, 4, 5)
# Exclude the second element
subset <- numbers[-2]
# Print the result
print(subset)
```

`[1] 1 3 4 5`

These examples demonstrate how to subset and index vectors in R, allowing you to access specific elements or subsets of elements based on conditions, positions, or logical criteria. These operations are fundamental in data analysis and manipulation tasks in R.

Let’s explore some commonly used functions when working with vectors in R.

`c()`

The **c()** function (short for “combine” or “concatenate”) is used for creating a new vector or combining multiple values or vectors into a single vector. It allows you to create a vector by listing its elements within the function.

**1. Combining Numeric Values:**
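The code for this case is not shown here; a minimal sketch with made-up values could look like this:

```r
# Creating a numeric vector by listing its elements
numeric_vector <- c(1, 2, 3, 4, 5)
print(numeric_vector)
```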

**2. Combining Character Strings:**

```
# Creating a character vector
character_vector <- c("apple", "banana", "cherry")
print(character_vector)
```

`[1] "apple" "banana" "cherry"`

**3. Combining Different Data Types (Implicit Coercion):**

```
# Combining numeric and character values
# Numeric values are coerced to character.
mixed_vector <- c(1, "two", 3, "four")
class(mixed_vector)
```

`[1] "character"`

**4. Combining Vectors Recursively:**
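The example for this case is missing, so what follows is only one plausible reading of it. The sketch shows two things: passing existing vectors to `c()`, and using the `recursive = TRUE` argument of `c()` to flatten a nested list into a single vector. All object names are illustrative.

```r
# Combining two existing vectors into one
first <- c(1, 2, 3)
second <- c(4, 5)
combined <- c(first, second)
print(combined)

# With recursive = TRUE, c() descends into nested lists
# and returns one flat atomic vector
nested <- list(1, list(2, 3), 4)
flattened <- c(nested, recursive = TRUE)
print(flattened)
```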

`seq()`

In R, the **seq()** function is used to generate sequences of numbers or other objects. It allows you to create a sequence of values with specified starting and ending points, increments, and other parameters.

The basic syntax of the **seq()** function is `seq(from, to, by, length.out)`, where:

- `from`: The starting point of the sequence.
- `to`: The ending point of the sequence.
- `by`: The interval between values in the sequence. It is an optional parameter. If not specified, R calculates it based on the `from`, `to`, and `length.out` parameters.
- `length.out`: The desired length of the sequence. It is an optional parameter. If provided, R calculates the `by` parameter based on the desired length.

Here are some examples to illustrate how to use the **seq()** function:

**Generating a Sequence of Integers**
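The code for this example is not shown here; a minimal sketch, with endpoints chosen only for illustration, might be:

```r
# Sequence of whole numbers from 1 to 10 (default step of 1)
int_seq <- seq(1, 10)
print(int_seq)
```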

**Generating a Sequence of Real Numbers with a Specified Increment**
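The code for this example is not shown here. One call that is consistent with the output given for this example uses `by` to set the step size:

```r
# Sequence from 0 to 1 in steps of 0.2
real_seq <- seq(0, 1, by = 0.2)
print(real_seq)
```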

`[1] 0.0 0.2 0.4 0.6 0.8 1.0`

**Generating a Sequence with a Specified Length**
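The code for this example is not shown here. As a sketch with assumed endpoints, `length.out` fixes the number of values and lets R work out the step:

```r
# Five evenly spaced values between 0 and 1
len_seq <- seq(0, 1, length.out = 5)
print(len_seq)
```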

**Generating a Sequence in Reverse Order**
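The code for this example is not shown here. A minimal sketch, with assumed endpoints, uses a negative `by` to count downwards:

```r
# Sequence from 10 down to 1
rev_seq <- seq(10, 1, by = -1)
print(rev_seq)
```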

The **seq()** function is very useful for creating sequences of values that you can use for various purposes, such as creating sequences for plotting, generating data for simulations, or defining custom sequences for indexing elements in vectors or data frames.

`rep()`

In R, the **rep()** function is used to replicate or repeat values to create vectors or arrays of repeated elements. It allows you to duplicate a value or a set of values a specified number of times to form a larger vector or matrix.

The basic syntax of the **rep()** function is `rep(x, times, each, length.out)`, where:

- `x`: The value(s) or vector(s) that you want to repeat.
- `times`: An integer specifying how many times `x` should be repeated. If `x` is a vector, the whole vector is repeated `times` times. If `times` is itself a vector of the same length as `x`, each element of `x` is repeated the corresponding number of times.
- `each`: An integer specifying how many times each element of `x` (if it’s a vector) should be repeated before moving on to the next element. This is an optional parameter.
- `length.out`: An integer specifying the desired length of the result. This is an optional parameter, and it can be used instead of `times` and `each` to determine the number of repetitions.

Here are some examples to illustrate how to use the **rep()** function:

**Replicating a Single Value**
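The code for this example is not shown here; a minimal sketch with made-up values might be:

```r
# Repeat the value 5 three times
repeated_value <- rep(5, times = 3)
print(repeated_value)
```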

**Replicating Elements of a Vector**

```
# Create a vector
my_vector <- c("A", "B", "C")
# Repeat each element of the vector 2 times
rep(my_vector, each = 2)
```

`[1] "A" "A" "B" "B" "C" "C"`

**Replicating Elements of a Vector with Different Frequencies**

```
# Repeat each element of the vector with different frequencies
rep(c("A", "B", "C"), times = c(3, 2, 4))
```

`[1] "A" "A" "A" "B" "B" "C" "C" "C" "C"`

**Controlling the Length of the Result**
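The code for this example is not shown here. One call that is consistent with the output given for this example recycles `1:3` until the result has ten elements:

```r
# Repeat 1, 2, 3 until the result has exactly 10 elements
fixed_length <- rep(1:3, length.out = 10)
print(fixed_length)
```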

` [1] 1 2 3 1 2 3 1 2 3 1`

The **rep()** function is useful for tasks like creating data for simulations, repeating elements for plotting, and constructing vectors and matrices with specific patterns or repetitions.

`length()`

In R, the **length()** function is used to determine the number of elements in a vector. It returns an integer value representing the length of the vector.

The basic syntax of the **length()** function for vectors is `length(x)`, where:

- `x`: The vector for which you want to find the length.

Here’s an example of how to use the **length()** function with vectors:

```
# Create a numeric vector
numeric_vector <- c(1, 2, 3, 4, 5)
# Use the length() function to find the length of the vector
length(numeric_vector)
```

`[1] 5`

The **length()** function is particularly useful when you need to perform operations or make decisions based on the size or length of a vector. It is commonly used in control structures like loops to ensure that you iterate through the entire vector or to dynamically adjust the length of vectors in your code.
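As a small sketch of that use in a loop (the vector `temperatures` is made up for illustration), `length()` can bound the loop and also guard against empty input:

```r
# Illustrative vector of readings
temperatures <- c(21.5, 19.0, 23.2)
n <- length(temperatures)

# seq_len(n) yields 1, 2, ..., n and is empty when n is 0
for (i in seq_len(n)) {
  cat("Reading", i, "of", n, ":", temperatures[i], "\n")
}

# An empty vector has length 0
empty <- numeric(0)
print(length(empty))
```

Using `seq_len(n)` rather than `1:n` is a deliberate choice here: `1:0` would produce `1 0` and run the loop twice on an empty vector.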

`unique()`

The **unique()** function is used to extract the unique elements from a vector, returning a new vector containing only the distinct values found in the original vector. It is a convenient way to identify and remove duplicate values from a vector.

The basic syntax of the **unique()** function is `unique(x)`, where:

- `x`: The vector from which you want to extract unique elements.

Here’s an example of how to use the **unique()** function with a vector:

```
# Create a vector with duplicate values
my_vector <- c(1, 2, 2, 3, 4, 4, 5)
# Use the unique() function to extract unique elements
unique(my_vector)
```

`[1] 1 2 3 4 5`

In this example, the **unique()** function is applied to `my_vector` and returns each distinct value once.

The **unique()** function is particularly useful when dealing with data preprocessing or data cleaning tasks, where you need to identify and handle duplicate values in a dataset. It’s also helpful when you want to generate a list of unique categories or distinct values from a categorical variable.

`duplicated()`

The **duplicated()** function in R is a handy tool for identifying and working with duplicate elements in a vector. It returns a logical vector of the same length as the input vector, indicating whether each element in the vector is duplicated or not. You can also use the `fromLast` argument to control the direction in which duplicates are searched for.

The syntax of the **duplicated()** function is `duplicated(x, fromLast = FALSE)`, where:

- `x`: The vector in which you want to identify duplicate elements.
- `fromLast`: An optional logical parameter (default is `FALSE`). If set to `TRUE`, it considers duplicates from the last occurrence of each element instead of the first.

Now, let’s dive into some examples to understand how the **duplicated()** function works:

**Identifying Duplicate Values**

```
# Create a vector with duplicate values
my_vector <- c(1, 2, 2, 3, 4, 4, 5)
# Use the duplicated() function to identify duplicate elements
duplicates <- duplicated(my_vector)
# Print the result
print(duplicates)
```

`[1] FALSE FALSE TRUE FALSE FALSE TRUE FALSE`

```
# Get the values that are duplicated
duplicated_values <- my_vector[duplicates]
print(duplicated_values)
```

`[1] 2 4`

In this example, **duplicates** is a logical vector indicating whether each element in `my_vector` is a duplicate: `TRUE` marks a repeat of a value that has already appeared, and `FALSE` marks a first occurrence.

**Identifying Duplicates from the Last Occurrence**

```
# Create a vector with duplicate values
my_vector <- c(1, 2, 2, 3, 4, 4, 5)
# Use the duplicated() function to identify duplicates from the last occurrence
duplicates_last <- duplicated(my_vector, fromLast = TRUE)
# Print the result
print(duplicates_last)
```

`[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE`

```
# Get the values that are duplicated from the last occurrence
duplicated_values_last <- my_vector[duplicates_last]
print(duplicated_values_last)
```

`[1] 2 4`

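By setting **fromLast = TRUE**, the search runs from the end of the vector, so the last occurrence of each value is treated as the original and the earlier occurrences (here, positions 2 and 5) are flagged as duplicates.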

**Removing Duplicate Values from a Vector**

```
# Create a vector with duplicate values
my_vector <- c(1, 2, 2, 3, 4, 4, 5)
# Use the `!` operator to negate the duplicated values and get unique values
unique_values <- my_vector[!duplicated(my_vector)]
# Print the unique values
print(unique_values)
```

`[1] 1 2 3 4 5`

In this example, we use the **!** operator to negate the result of `duplicated()`, which keeps only the first occurrence of each value.

**Identifying Duplicates in a Character Vector**

```
# Create a character vector with duplicate strings
my_strings <- c("apple", "banana", "apple", "cherry", "banana")
# Use the duplicated() function to identify duplicate strings
duplicates_strings <- duplicated(my_strings)
# Print the result
print(duplicates_strings)
```

`[1] FALSE FALSE TRUE FALSE TRUE`

```
# Get the duplicated strings
duplicated_strings <- my_strings[duplicates_strings]
print(duplicated_strings)
```

`[1] "apple" "banana"`

The **duplicated()** function can also be used with character vectors to identify duplicate strings.

These examples illustrate how the **duplicated()** function can be used to identify and work with duplicate elements in a vector, which is useful for data cleaning, analysis, and other data manipulation tasks in R.

`sort()`

The **sort()** function is used to sort the elements of a vector in either ascending or descending order. It is a fundamental function for arranging and organizing data.

The basic syntax of the **sort()** function is `sort(x, decreasing = FALSE)`, where:

- `x`: The vector that you want to sort.
- `decreasing`: An optional logical parameter (default is `FALSE`). If set to `TRUE`, the vector is sorted in descending order; if `FALSE`, it’s sorted in ascending order.

Now, let’s explore the **sort()** function with some examples:

**Sorting a Numeric Vector in Ascending Order**

```
# Create a numeric vector
numeric_vector <- c(5, 2, 8, 1, 3)
# Sort the vector in ascending order
sorted_vector <- sort(numeric_vector)
# Print the result
print(sorted_vector)
```

`[1] 1 2 3 5 8`

In this example, **sorted_vector** contains the elements of `numeric_vector` sorted in ascending order.

**Sorting a Character Vector in Alphabetical Order**

```
# Create a character vector
character_vector <- c("apple", "banana", "cherry", "date", "grape")
# Sort the vector in alphabetical order
sorted_vector <- sort(character_vector)
# Print the result
print(sorted_vector)
```

`[1] "apple" "banana" "cherry" "date" "grape" `

Here, **sorted_vector** contains the elements of `character_vector` sorted in alphabetical order.

**Sorting in Descending Order**

```
# Create a numeric vector
numeric_vector <- c(5, 2, 8, 1, 3)
# Sort the vector in descending order
sorted_vector <- sort(numeric_vector, decreasing = TRUE)
# Print the result
print(sorted_vector)
```

`[1] 8 5 3 2 1`

By setting **decreasing = TRUE**, we sort `numeric_vector` in descending order.

**Sorting a Factor Vector**

In R, a “factor” is a data type that represents categorical or discrete data. Factors are used to store and manage categorical variables in a more efficient and meaningful way. Categorical variables take on a limited, fixed set of values or levels, such as “yes” and “no”; “low,” “medium,” and “high”; or “red,” “green,” and “blue.” In R, factors are created using the **factor()** function.

Note

I am planning to write a post about factors soon.

```
# Create a factor vector
factor_vector <- factor(c("high", "low", "medium", "low", "high"))
# Sort the factor vector in alphabetical order
sorted_vector <- sort(factor_vector)
# Print the result
print(sorted_vector)
```

```
[1] high high low low medium
Levels: high low medium
```

The **sort()** function can also be used with factor vectors. It orders the elements according to the order of the factor’s levels, which is alphabetical by default.
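Because sorting a factor follows its level order, setting the levels explicitly changes the result. The sketch below reuses the same values with an assumed level order of low, medium, high, chosen here only for illustration:

```r
# Same values as before, but with an explicit level order
ordered_levels <- factor(
  c("high", "low", "medium", "low", "high"),
  levels = c("low", "medium", "high")
)

# sort() now follows low < medium < high rather than the alphabet
sorted_by_level <- sort(ordered_levels)
print(sorted_by_level)
```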

**Sorting with Indexing**

```
# Create a numeric vector
numeric_vector <- c(5, 2, 8, 1, 3)
# Sort the vector in ascending order and store the index order
sorted_indices <- order(numeric_vector)
sorted_vector <- numeric_vector[sorted_indices]
# Print the result
print(sorted_vector)
```

`[1] 1 2 3 5 8`

In this example, we use the **order()** function to obtain the index order needed to sort `numeric_vector`, and then use those indices to rearrange it.

The **sort()** function is a versatile tool for sorting vectors in R, and it is a fundamental part of data analysis and manipulation. It can be applied to various data types, and you can control the sorting order with the `decreasing` argument.

`which()`

The **which()** function is used to identify the indices of elements in a vector that satisfy a specified condition. It returns a vector of indices where the condition is `TRUE`.

The basic syntax of the **which()** function is `which(x, arr.ind = FALSE)`, where:

- `x`: A logical vector, usually the result of applying a condition to the vector you want to search (for example `my_vector > 8`).
- `arr.ind`: An optional logical parameter (default is `FALSE`). If set to `TRUE`, the function returns an array of indices with dimensions corresponding to `x`. This is typically used when `x` is a multi-dimensional array.

Now, let’s explore the **which()** function with some examples:

**Finding Indices of Elements Greater Than a Threshold**

```
# Create a numeric vector
my_vector <- c(10, 5, 15, 3, 8)
# Find indices where values are greater than 8
indices_greater_than_8 <- which(my_vector > 8)
# Print the result
print(indices_greater_than_8)
```

`[1] 1 3`

In this example, **indices_greater_than_8** contains the indices where elements in `my_vector` are greater than 8.

**Finding Indices of Missing Values (NA)**

```
# Create a vector with missing values (NA)
my_vector <- c(2, NA, 5, NA, 8)
# Find indices of missing values
indices_of_na <- which(is.na(my_vector))
# Print the result
print(indices_of_na)
```

`[1] 2 4`

Here, **indices_of_na** contains the indices where `my_vector` has missing values.

Tip

The **is.na()** function in R is used to identify missing values (NAs) in a vector or a data frame. It returns a logical vector (or, for a data frame, a logical matrix) of the same shape as the input, where each element is `TRUE` if the corresponding value is `NA` and `FALSE` otherwise.

**Finding Indices of Specific Values**

```
# Create a character vector
my_vector <- c("apple", "banana", "cherry", "banana", "apple")
# Find indices where values are "banana"
indices_banana <- which(my_vector == "banana")
# Print the result
print(indices_banana)
```

`[1] 2 4`

Here, **indices_banana** contains the indices where elements in `my_vector` are equal to "banana".

The **which()** function is versatile and can be used for various purposes, such as identifying specific elements, locating missing values, and finding indices based on custom conditions. It’s a valuable tool for data analysis and manipulation in R.
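The examples above all use plain vectors, so the `arr.ind` argument has no visible effect in them. As a brief sketch of what it does, the following uses a small made-up matrix `m`:

```r
# A 2 x 3 matrix, filled column by column (illustrative values)
m <- matrix(c(1, 9, 3, 12, 5, 7), nrow = 2)

# Default: positions counted down the columns as one long vector
linear_idx <- which(m > 6)
print(linear_idx)

# arr.ind = TRUE: one row per match, with "row" and "col" columns
position_idx <- which(m > 6, arr.ind = TRUE)
print(position_idx)
```

Here the three values greater than 6 all sit in the second row, so the `row` column is 2 throughout and the `col` column runs from 1 to 3.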

`paste()`

The **paste()** function is used to concatenate (combine) character vectors element-wise into a single character vector. It allows you to join strings or character elements together, with the option to specify a separator between them or to collapse the result into a single string.

The basic syntax of the **paste()** function is `paste(..., sep = " ", collapse = NULL)`, where:

- `...`: One or more character vectors or objects to be combined.
- `sep`: A character string that specifies the separator to be used between the concatenated elements. The default is a space.
- `collapse`: An optional character string that specifies a separator to be used when collapsing the concatenated elements into a single string. If `collapse` is not specified, the result will be a character vector.

Now, let’s explore the **paste()** function with some examples:

**Concatenating Character Vectors with Default Separator**

```
# Create two character vectors
first_names <- c("John", "Alice", "Bob")
last_names <- c("Doe", "Smith", "Johnson")
# Use paste() to concatenate them with the default separator (space)
full_names <- paste(first_names, last_names)
# Print the result
print(full_names)
```

`[1] "John Doe" "Alice Smith" "Bob Johnson"`

In this example, the **paste()** function concatenates `first_names` and `last_names` element by element, with a space between them.

**Specifying a Custom Separator**

```
# Create a character vector
fruits <- c("apple", "banana", "cherry")
# Use paste() with a custom separator (comma and space)
fruits_list <- paste(fruits, collapse = ", ")
# Print the result
print(fruits_list)
```

`[1] "apple, banana, cherry"`

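Here, we join the elements of the **fruits** vector into a single string with a custom separator, a comma followed by a space. Note that this uses the `collapse` argument, since the elements being joined all come from one vector.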
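Since `sep` and `collapse` are easy to confuse, here is a short sketch contrasting them. It reuses the name vectors from the first example; the separators are arbitrary choices for illustration.

```r
first_names <- c("John", "Alice", "Bob")
last_names <- c("Doe", "Smith", "Johnson")

# sep goes between the arguments, element by element;
# the result still has one string per element
with_sep <- paste(last_names, first_names, sep = ", ")
print(with_sep)

# collapse then joins those strings into a single string
with_both <- paste(first_names, last_names, sep = "_", collapse = "; ")
print(with_both)
```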

**Combining Numeric and Character Values**

```
# Create a numeric vector and a character vector
prices <- c(10, 5, 3)
fruits <- c("apple", "banana", "cherry")
# Use paste() to combine them
item_description <- paste(prices, "USD -", fruits)
# Print the result
print(item_description)
```

`[1] "10 USD - apple" "5 USD - banana" "3 USD - cherry"`

In this example, we combine numeric values from the **prices** vector with character values from the `fruits` vector using `paste()`, which converts the numbers to character strings.

**Collapsing a Character Vector**

```
# Create a character vector
sentence <- c("This", "is", "an", "example", "sentence")
# Use paste() to collapse the vector into a single string
collapsed_sentence <- paste(sentence, collapse = " ")
# Print the result
print(collapsed_sentence)
```

`[1] "This is an example sentence"`

Here, we use **paste()** to collapse the elements of the `sentence` vector into a single string, separated by spaces.

The **paste()** function is versatile and useful for various data manipulation tasks, such as creating custom labels, formatting output, and constructing complex strings from component parts. It allows you to combine character vectors in a flexible way.

Of course, there are many functions that can be used with vectors and other data structures. You can even create your own functions when you learn how to write functions. I tried to explain some basic and frequently used functions here in order not to make the post too long.

In conclusion, vectors are the fundamental building blocks of data in R programming, akin to atoms in the world of matter. They are versatile, efficient, and indispensable for a wide range of data analysis tasks. By understanding their importance and mastering the use of vector-related functions, you can unlock the full potential of R for your data manipulation and analysis endeavors.