I have read a CSV file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in th
Remove duplicate rows of a dataframe
library(dplyr)
mydata <- mtcars
# Remove duplicate rows of the dataframe
distinct(mydata)
The mtcars dataset contains no fully duplicated rows, so distinct() returns the same number of rows as mydata.
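To see distinct() actually drop a row, here is a minimal sketch that appends a copy of the first row of mtcars before de-duplicating. The names with_dup and deduped are made up for this illustration and do not appear in the original example.

```r
library(dplyr)

mydata <- mtcars

# Hypothetical test data: append a copy of the first row so that
# exactly one fully duplicated row exists.
with_dup <- rbind(mydata, mydata[1, ])

# One more row than mydata before de-duplication.
nrow(with_dup) == nrow(mydata) + 1

# distinct() compares column values only, so the appended copy is dropped.
deduped <- distinct(with_dup)
nrow(deduped) == nrow(mydata)
```

Both comparisons should evaluate to TRUE, since the only repeated row is the one that was added by hand.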
Remove duplicate rows based on one variable
library(dplyr)
mydata <- mtcars
# Remove duplicate rows of the dataframe using carb variable
distinct(mydata, carb, .keep_all = TRUE)
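.keep_all is an argument of distinct(), not a function. Setting .keep_all = TRUE retains all other variables in the output data frame; for each distinct value of carb, the first matching row is kept.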
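The effect of .keep_all is easiest to see by running the call with and without it. This is a small sketch; carb_only and carb_all are illustrative names, not part of the original example.

```r
library(dplyr)

mydata <- mtcars

# Without .keep_all, only the column used for de-duplication is returned.
carb_only <- distinct(mydata, carb)

# With .keep_all = TRUE, every column is kept, taken from the first row
# found for each distinct carb value.
carb_all <- distinct(mydata, carb, .keep_all = TRUE)

ncol(carb_only)                     # 1
ncol(carb_all) == ncol(mydata)      # TRUE
nrow(carb_only) == nrow(carb_all)   # TRUE: same rows, more columns
```

For the question as asked (dropping rows that repeat a value in one column while keeping the rest of the row), the .keep_all = TRUE form is the one that applies.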
Remove duplicate rows based on multiple variables
library(dplyr)
mydata <- mtcars
# Remove duplicate rows of the dataframe using cyl and vs variables
distinct(mydata, cyl, vs, .keep_all = TRUE)
As in the previous example, the .keep_all = TRUE argument retains all other variables in the output data frame. One row is returned for each distinct combination of cyl and vs.
(from: http://www.datasciencemadesimple.com/remove-duplicate-rows-r-using-dplyr-distinct-function/)
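For comparison, the multi-variable case can also be sketched in base R with duplicated(), which is not part of the quoted answer. duplicated() flags every row whose key values have already appeared, so negating it keeps the first row per key. The names base_result and dplyr_result are illustrative.

```r
library(dplyr)

mydata <- mtcars

# Base R: keep rows whose (cyl, vs) combination has not been seen before.
base_result <- mydata[!duplicated(mydata[, c("cyl", "vs")]), ]

# dplyr call from the example above, for comparison.
dplyr_result <- distinct(mydata, cyl, vs, .keep_all = TRUE)

# Both approaches keep the first row per combination, in original order,
# so the row counts and the kept values should agree.
nrow(base_result) == nrow(dplyr_result)
all(base_result$mpg == dplyr_result$mpg)
```

The single-column case follows the same pattern, for example mydata[!duplicated(mydata$carb), ].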