4 ways to select columns with dplyr select()
The select() function is one of the most important functions in dplyr; it lets us select one or more columns from a data frame.
With dplyr version 1.0.0, select() gained new features that make it easier to select columns in different ways. One of the most common ways to select columns is by name. However, with dplyr 1.0.0 we can also select columns by other properties, such as their type.
In this article we look at examples of four ways to select columns from a data frame. We start by selecting columns by name, then look at examples of selecting columns by position, by column type, and with functions that search for patterns in the column names.
Load the tidyverse and make sure you have dplyr 1.0.0 or later.
library(tidyverse)
packageVersion("dplyr")
[1] ‘1.0.0’
We will use the Palmer penguins dataset to select columns in four different ways with the select() function.
path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)
We can see that the columns have different types.
## Parsed with column specification:
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character()
## )
dplyr select(): How do I select columns by name?
Let’s start by selecting the columns in the data frame by name, which is the most common way to select columns.
# select columns by name
penguins %>%
  dplyr::select(species, island, flipper_length_mm)
# A tibble: 6 x 3
#   species island    flipper_length_mm
#   <chr>   <chr>                 <dbl>
# 1 Adelie  Torgersen               181
# 2 Adelie  Torgersen               186
# 3 Adelie  Torgersen               195
# 4 Adelie  Torgersen                NA
# 5 Adelie  Torgersen               193
# 6 Adelie  Torgersen               190
dplyr select(): How can the columns be selected based on their position?
We will select the same columns as in the previous example, but this time we will use their positions in the data frame. For example, species is the first column of the data frame and island is the second.
We can simply pass one or more column positions as arguments to the select() function.
# dplyr select columns by position
penguins %>%
  select(1, 2, 5)
And we get the same results as above.
# A tibble: 6 x 3
#   species island    flipper_length_mm
#   <chr>   <chr>                 <dbl>
# 1 Adelie  Torgersen               181
# 2 Adelie  Torgersen               186
# 3 Adelie  Torgersen               195
# 4 Adelie  Torgersen                NA
# 5 Adelie  Torgersen               193
# 6 Adelie  Torgersen               190
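To check that selecting by name and selecting by position really return the same thing, here is a minimal sketch. It assumes a small hand-built tibble called toy (a hypothetical stand-in, not the downloaded data) whose values are copied from the first three rows of the outputs in this article; the sex column is left out.

```r
library(dplyr)

# Hypothetical stand-in for the first three rows of the penguins data
# (sex column omitted); flipper_length_mm is still the fifth column
toy <- tibble(
  species = rep("Adelie", 3),
  island = rep("Torgersen", 3),
  bill_length_mm = c(39.1, 39.5, 40.3),
  bill_depth_mm = c(18.7, 17.4, 18),
  flipper_length_mm = c(181, 186, 195),
  body_mass_g = c(3750, 3800, 3250)
)

by_name <- toy %>% select(species, island, flipper_length_mm)
by_position <- toy %>% select(1, 2, 5)

# Compare the two results column by column
identical(by_name, by_position)
```

Selecting by position is shorter to type, but it silently depends on the column order, so selecting by name is usually the safer choice in scripts that are meant to last.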
One of the good (or bad?) things about selecting columns by position is that if you enter position 0, which does not correspond to any column, dplyr's select() ignores it and returns the result for the remaining valid positions.
Here, for example, we specify position 0, which does not exist. The select() function does not throw an error; it simply returns the results for the remaining valid column positions.
# dplyr select columns by position: position 0 is ignored
penguins %>%
select(0,2,5)
The result simply leaves out the position 0 that we requested.
# A tibble: 6 x 2
#   island    flipper_length_mm
#   <chr>                 <dbl>
# 1 Torgersen               181
# 2 Torgersen               186
# 3 Torgersen               195
# 4 Torgersen                NA
# 5 Torgersen               193
# 6 Torgersen               190
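The following sketch contrasts position 0 with a position beyond the last column. It uses a hypothetical three-column tibble named toy, and it rests on the assumption (true for the dplyr versions I am aware of from 1.0.0 onward) that an out-of-range position is not ignored but raises an error, which is caught here with tryCatch().

```r
library(dplyr)

# Hypothetical three-column tibble, for illustration only
toy <- tibble(a = 1:3, b = c("x", "y", "z"), c = c(0.5, 1.5, 2.5))

# Position 0 is dropped silently, so only column 2 is returned
zero_result <- names(select(toy, 0, 2))
zero_result

# Position 10 does not exist in a three-column tibble; the error is
# caught and replaced with a marker string
past_end <- tryCatch(
  select(toy, 2, 10),
  error = function(e) "error"
)
past_end
```

So the forgiving behaviour described above is specific to position 0; a typo such as 10 instead of 1 is still caught.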
dplyr select(): How can the columns be selected according to their type?
It is often useful to select columns according to their type. For example, you can select all numeric columns for further analysis.
To get all the numeric columns, we can pass where(is.numeric) as an argument to select().
# dplyr select all columns that are numeric
penguins %>%
  select(where(is.numeric))
And we get all the numeric columns.
## # A tibble: 6 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <dbl>       <dbl>
## 1           39.1          18.7               181        3750
## 2           39.5          17.4               186        3800
## 3           40.3          18                 195        3250
## 4           NA            NA                  NA          NA
## 5           36.7          19.3               193        3450
## 6           39.3          20.6               190        3650
In the same way, we can select all factor columns using where(is.factor) and all character columns using where(is.character).
Note that using the where() function to select columns here is new in dplyr 1.0.0.
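As a small illustration of the three predicates mentioned above, the sketch below applies where() to a hypothetical tibble named toy that contains one integer, one character, one factor and one double column. The column names are made up for this example.

```r
library(dplyr)

# Hypothetical tibble with one column of each type of interest
toy <- tibble(
  id = 1:3,
  name = c("a", "b", "c"),
  group = factor(c("g1", "g2", "g1")),
  score = c(1.5, 2.5, 3.5)
)

# where() keeps every column for which the predicate returns TRUE
numeric_cols <- names(select(toy, where(is.numeric)))
character_cols <- names(select(toy, where(is.character)))
factor_cols <- names(select(toy, where(is.factor)))

numeric_cols    # integer and double columns both count as numeric
character_cols
factor_cols     # factors are not numeric, so group appears only here
```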
dplyr select(): How do I select columns with helper functions that match names?
dplyr starts_with(): How do I select columns whose name begins with a string?
# dplyr select columns whose names start with a string
penguins %>%
  select(starts_with("bill"))
# A tibble: 6 x 2
#   bill_length_mm bill_depth_mm
#            <dbl>         <dbl>
# 1           39.1          18.7
# 2           39.5          17.4
# 3           40.3          18
# 4           NA            NA
# 5           36.7          19.3
# 6           39.3          20.6
dplyr ends_with(): How do I select columns whose name ends with a string?
The fourth way to select columns in a data frame is to search for a string or pattern in the column names. For example, we often want to select columns whose names start or end with a certain string.
dplyr has special functions for this purpose. For example, we can select columns whose names start with a certain string using starts_with(), and in the same way, we can select columns whose names end with a certain string using ends_with().
Here is an example where we select the columns whose names end with the string "mm".
# dplyr select columns whose names end with a string
penguins %>%
  select(ends_with("mm"))
We now have all the columns ending in mm.
# A tibble: 6 x 3
#   bill_length_mm bill_depth_mm flipper_length_mm
#            <dbl>         <dbl>             <dbl>
# 1           39.1          18.7               181
# 2           39.5          17.4               186
# 3           40.3          18                 195
# 4           NA            NA                  NA
# 5           36.7          19.3               193
# 6           39.3          20.6               190
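The name helpers can also be tried without downloading anything. The sketch below uses a hypothetical one-row tibble named toy that only reuses some of the penguins column names. It also shows that starts_with() accepts a character vector of several prefixes, in which case a column is kept if it matches any of them.

```r
library(dplyr)

# Hypothetical one-row tibble that reuses some penguins column names
toy <- tibble(
  species = "Adelie",
  bill_length_mm = 39.1,
  bill_depth_mm = 18.7,
  flipper_length_mm = 181,
  body_mass_g = 3750
)

# Columns whose names start with "bill"
bill_cols <- names(select(toy, starts_with("bill")))

# Columns whose names end with "mm"
mm_cols <- names(select(toy, ends_with("mm")))

# Several prefixes at once: matches "bill" or "body"
bill_or_body <- names(select(toy, starts_with(c("bill", "body"))))

bill_cols
mm_cols
bill_or_body
```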
And that is not all. As shown in the dplyr documentation, we can also combine any of the above approaches with Boolean operators to select columns.
- df %>% select(!where(is.factor)): selects all non-factor variables
- df %>% select(where(is.numeric) & starts_with("x")): selects all numeric variables whose names begin with x
- df %>% select(starts_with("a") | ends_with("z")): selects all variables whose names start with a or end with z.
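The three combinations listed above can be sketched on a hypothetical tibble named toy, whose column names are chosen only so that each expression selects something different.

```r
library(dplyr)

# Hypothetical tibble; names are picked to exercise each combination
toy <- tibble(
  x_num = c(1, 2),
  x_chr = c("a", "b"),
  a_fac = factor(c("u", "v")),
  size_z = c(10, 20)
)

# ! negates a selection: everything that is not a factor
not_factor <- names(select(toy, !where(is.factor)))

# & intersects two selections: numeric AND name starts with "x"
numeric_x <- names(select(toy, where(is.numeric) & starts_with("x")))

# | unions two selections: name starts with "a" OR ends with "z"
a_or_z <- names(select(toy, starts_with("a") | ends_with("z")))

not_factor
numeric_x
a_or_z
```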