I have a dirty dataset that I could not read with header = T. After reading and cleaning it, I would like to use what is now the first row as the column names. I tri
Using data.table:
library(data.table)
namex <-c("col1","col2","col3","col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)
row3 <- c(865, 768, 8, 7)
row4 <- c(68, 58, 65, 87)
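t1 <- data.table(namex, row1, row2, row3, row4)
# transposing yields a character matrix whose first row holds the names
t1 <- data.table(t(t1))
# use the first row as the column names, then drop it
setnames(t1, as.character(t1[1,]))
t1 <- t1[-1,]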
While @sbha has already offered a tidyverse solution, I would like to leave a fully pipeable dplyr option. I agree that this could be an incredibly useful function.
library(dplyr)
data.frame(x = c("a", 1, 2, 3), y = c("b", 4, 5, 6), stringsAsFactors = FALSE) %>%
  `colnames<-`(unlist(.[1, ])) %>%  # promote the first row to column names
  .[-1, ]                           # then drop that row
Take a step back: when you read your data, use skip=1 in read.table to skip the first line entirely. This should make life a bit easier when you're cleaning data, particularly for data types. This is key, as your problem stems from your data being encoded as factor.
You can then read in your column names separately with nrows=1 in read.table.
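A minimal sketch of that two-step read, assuming a whitespace-separated file whose first line holds the column names. The temporary file and the names header and dat are made up for illustration and stand in for the real dataset:

```r
# Hypothetical sample file standing in for the real (dirty) dataset
path <- tempfile(fileext = ".txt")
writeLines(c("col1 col2 col3 col4",
             "2 4 5 56",
             "74 73 3 534"), path)

# Step 1: read only the first line to capture the column names
header <- read.table(path, nrows = 1, stringsAsFactors = FALSE)

# Step 2: read the data without the header line, so each column's type
# is inferred from the data alone
dat <- read.table(path, skip = 1)

# Attach the names captured in step 1
names(dat) <- unname(unlist(header[1, ]))
str(dat)
```

Because the header line never reaches the second read.table call, the numeric columns are not coerced to character or factor.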
How about:
my.names <- t1[1,]
colnames(t1) <- my.names
i.e. explicitly storing the row in a variable first?
With the following code:
namex <-c("col1","col2","col3","col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)
row3 <- c(865, 768, 8, 7)
row4 <- c(68, 58, 65, 87)
t1 <- data.frame(namex, row1, row2, row3, row4)
t1 <- t(t1)
my.names <- t1[1,]
colnames(t1) <- my.names
It seems to work, but maybe I'm missing something?
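One thing the snippet above leaves in place is the header row itself, and every value is still a character string after the transpose. A possible follow-up, sketched in base R on a shortened version of the same data (the name df is introduced here for illustration):

```r
namex <- c("col1", "col2", "col3", "col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)

# t() on a mixed data frame returns a character matrix
t1 <- t(data.frame(namex, row1, row2))

# Promote the first row to column names, then drop it
colnames(t1) <- unname(t1[1, ])
t1 <- t1[-1, , drop = FALSE]

# Back to a data frame, converting every column to numeric
df <- as.data.frame(t1, stringsAsFactors = FALSE)
df[] <- lapply(df, as.numeric)
str(df)
```

The drop = FALSE keeps the result a matrix even if only one data row remains.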
Sam Firke's ever-useful package janitor has a function especially for this: row_to_names.
Example from his documentation:
library(janitor)
x <- data.frame(X_1 = c(NA, "Title", 1:3),
X_2 = c(NA, "Title2", 4:6))
x %>%
row_to_names(row_number = 2)
Similar to some of the other answers, here is a dplyr/tidyverse option:
library(tidyverse)
names(df) <- df %>% slice(1) %>% unlist()
df <- df %>% slice(-1)