Question
This should be an easy problem, but I am having trouble with it. I have a dirty dataset that I could not read with header=T. After reading and cleaning it, I would like to use what is now the first row as the column names. I tried multiple methods from Stack Overflow without success. What could be the problem?
The dataset t1 should look like this after cleanup:
    V1   V2   V3   V4   V5
1      col1 col2 col3 col4
2 row1    2    4    5   56
3 row2   74   74    3  534
4 row3  865  768    8    7
5 row4   68   86   65   87
I tried colnames(t1)=t1[1,]. Nothing happens.
I tried names(t1)=ti[1,]. Nothing happens.
I tried lapply(t1, function(x) {names(x)<-x[1,]; x}). It returns an error message:
Error in `[.default`(x, 1, ) : incorrect number of dimensions
Could anyone help?
Answer 1:
header.true <- function(df) {
names(df) <- as.character(unlist(df[1,]))
df[-1,]
}
Test
df1 <- data.frame(c("a", 1,2,3), c("b", 4,5,6))
header.true(df1)
a b
2 1 4
3 2 5
4 3 6
Answer 2:
Take a step back: when you read your data, use skip=1 in read.table to skip the first line entirely. This should make life a bit easier when you're cleaning the data, particularly with respect to data types. This is key, because your problem stems from your data being encoded as factors.
You can then read in your column names separately with nrows=1 in read.table.
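A minimal sketch of that two-pass read, assuming the header line is the first line of the input. The inline text and the names raw, hdr and body are made up for illustration; with a real file you would pass its path instead of text =.

```r
# Hypothetical raw input standing in for the file on disk.
raw <- "col1 col2 col3 col4 col5
row1 2 4 5 56
row2 74 74 3 534
row3 865 768 8 7
row4 68 86 65 87"

# First pass: read only the header line, forcing every field to character.
hdr <- read.table(text = raw, nrows = 1, colClasses = "character")

# Second pass: skip the header line so each column gets its natural type.
body <- read.table(text = raw, skip = 1, stringsAsFactors = FALSE)

# Apply the header values as column names.
names(body) <- as.character(unlist(hdr[1, ]))

str(body)
```

Because the header never enters the second read, the numeric columns are not dragged into character or factor type by the text in the header row.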
Answer 3:
The columns of your data frame are probably factors, which is why the code you tried didn't work. You can check this with str(df).
Use the argument stringsAsFactors = FALSE when you import your data:
df <- read.table(text = "V1 V2 V3 V4 V5
col1 col2 col3 col4 col5
row1 2 4 5 56
row2 74 74 3 534
row3 865 768 8 7
row4 68 86 65 87", header = TRUE,
stringsAsFactors = FALSE )
Then your first attempt will work, and you can remove the first row afterwards if you'd like:
colnames(df) <- df[1,]
df <- df[-1, ]
Alternatively, the following works whether your columns are factors or characters:
names(df) <- lapply(df[1, ], as.character)
df <- df[-1,]
Output:
col1 col2 col3 col4 col5
2 row1 2 4 5 56
3 row2 74 74 3 534
4 row3 865 768 8 7
5 row4 68 86 65 87
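To see the factor case end to end, here is a small sketch that starts from factor columns, takes the names from the first row, and then restores the column types. The toy data frame d is made up, and the type.convert step is an addition that goes beyond the answer above.

```r
# Toy data with factor columns, which read.table produced by default before R 4.0.0.
d <- data.frame(V1 = c("col1", "row1", "row2"),
                V2 = c("col2", "2", "74"),
                stringsAsFactors = TRUE)
sapply(d, class)  # every column is a factor

# Convert each first-row cell to its label rather than its integer code.
names(d) <- vapply(d[1, ], as.character, character(1), USE.NAMES = FALSE)
d <- d[-1, ]

# Re-derive the column types now that the header row is gone.
d[] <- lapply(d, function(x) type.convert(as.character(x), as.is = TRUE))
str(d)
```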
Answer 4:
How about:
my.names <- t1[1,]
colnames(t1) <- my.names
That is, storing the row in a variable first? With the following code:
namex <-c("col1","col2","col3","col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)
row3 <- c(865, 768, 8, 7)
row4 <- c(68, 58, 65, 87)
t1 <- data.frame(namex, row1, row2, row3, row4)
t1 <- t(t1)
my.names <- t1[1,]
colnames(t1) <- my.names
It seems to work, but maybe I'm missing something?
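One likely reason this works: t() on a data frame returns a character matrix, so the first row holds plain strings rather than factors. A short sketch under that reading follows. The names m and out are made up, and the extra steps (dropping the header row and converting back to numeric) are assumptions about what you would want next.

```r
namex <- c("col1", "col2", "col3", "col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)

# t() coerces the data frame to a matrix; mixed types make it a character matrix.
m <- t(data.frame(namex, row1, row2))
is.matrix(m)

# The first row is plain character data, so it can be used as column names directly.
colnames(m) <- m[1, ]
m <- m[-1, , drop = FALSE]

# Back to a data frame, with the values converted from strings to numbers.
out <- as.data.frame(m, stringsAsFactors = FALSE)
out[] <- lapply(out, as.numeric)
str(out)
```

Note that the result of t() is a matrix, not a data frame, so a conversion back with as.data.frame is needed if later code expects a data frame.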
Answer 5:
Using data.table:
library(data.table)
namex <-c("col1","col2","col3","col4")
row1 <- c(2, 4, 5, 56)
row2 <- c(74, 73, 3, 534)
row3 <- c(865, 768, 8, 7)
row4 <- c(68, 58, 65, 87)
t1 <- data.table(namex, row1, row2, row3, row4)
t1 <- data.table(t(t1))
setnames(t1, as.character(t1[1,]))
t1 <- t1[-1,]
Answer 6:
Similar to some of the other answers, here is a dplyr/tidyverse option:
library(tidyverse)
names(df) <- df %>% slice(1) %>% unlist()
df <- df %>% slice(-1)
Source: https://stackoverflow.com/questions/32054368/use-first-row-data-as-column-names-in-r