Question
I was given a badly organized data table with hundreds of columns (a subset is shown below).
The column names are dot-delimited. The first field identifies an object (e.g. Item123, object_AB) and follows no naming convention; the second field names a property of that object (e.g. color, manufacturer). The columns are in no particular order.
Item123.type.value  Item123.mass.value  Item123.color.value  object_AB.type.value  object_AB.mass.value  object_AB.color.value
Desk                11.2                blue                 Chair                 2.3                   orange
Desk                14.2                red                  Sofa                  22                    grey
Armchair            23.3                black                Monitor               2.2                   white
Edit: adding the dput() output:
structure(list(Item123.type.value = structure(c(2L, 2L, 1L),
levels = c("Armchair", "Desk"), class = "factor"), Item123.mass.value = structure(1:3,
levels = c("11.2", "14.2", "23.3"), class = "factor"), Item123.color.value = structure(c(2L,
3L, 1L), levels = c("black", "blue", "red"), class = "factor"),
object_AB.type.value = structure(c(1L, 3L, 2L), levels = c("Chair",
"Monitor", "Sofa"), class = "factor"), object_AB.mass.value = structure(c(2L,
3L, 1L), levels = c("2.2", "2.3", "22"), class = "factor"),
object_AB.color.value = structure(c(2L, 1L, 3L), levels = c("grey",
"orange", "white"), class = "factor")), row.names = c(NA_integer_,
-3L), class = "data.frame")
I need to convert the table into something like this (order of rows does not matter):
type       name      mass  color
Item123    Desk      11.2  blue
Item123    Desk      14.2  red
object_AB  Chair     2.3   orange
object_AB  Sofa      22    grey
Item123    Armchair  23.3  black
object_AB  Monitor   2.2   white
I would really appreciate any help I could get!!
Answer 1:
You can use pivot_longer here, specifying names_pattern to extract the object name and the property name from the column names.
tidyr::pivot_longer(df,
                    cols = everything(),
                    names_to = c('name', '.value'),
                    names_pattern = '(\\w+)\\.(\\w+)\\.')
# A tibble: 6 x 4
# name type mass color
# <chr> <fct> <fct> <fct>
#1 Item123 Desk 11.2 blue
#2 object_AB Chair 2.3 orange
#3 Item123 Desk 14.2 red
#4 object_AB Sofa 22 grey
#5 Item123 Armchair 23.3 black
#6 object_AB Monitor 2.2 white
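The result above still has factor columns, and its column names differ from the layout requested in the question. The following is a minimal sketch, not part of the original answer, of one way to get closer to that layout. It assumes every column name has the form object.property.value, and it uses `[^.]+` instead of `\\w+` so that object names containing characters other than letters, digits, and underscores would still match. The intermediate column name `object` and the sample `df` built here are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Sample data mirroring the question's dput() output (all columns are factors)
df <- data.frame(
  Item123.type.value    = c("Desk", "Desk", "Armchair"),
  Item123.mass.value    = c("11.2", "14.2", "23.3"),
  Item123.color.value   = c("blue", "red", "black"),
  object_AB.type.value  = c("Chair", "Sofa", "Monitor"),
  object_AB.mass.value  = c("2.3", "22", "2.2"),
  object_AB.color.value = c("orange", "grey", "white"),
  stringsAsFactors = TRUE
)

result <- df %>%
  pivot_longer(
    cols = everything(),
    # first capture group -> "object" column, second -> output column names
    names_to = c("object", ".value"),
    names_pattern = "^([^.]+)\\.([^.]+)\\.value$"
  ) %>%
  # drop the factor encoding, then parse mass as a number
  mutate(across(where(is.factor), as.character),
         mass = as.numeric(mass)) %>%
  # match the column names asked for in the question
  rename(name = type) %>%
  rename(type = object)

print(result)
```

The two separate rename() calls keep the renaming unambiguous, since the original `type` column has to move out of the way before `object` takes its name.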
Answer 2:
I would suggest this approach, although it is longer and more tedious. It uses as df the data you added. The code selects the columns matching each property name, reshapes each subset to long format, and finally joins them all:
library(tidyverse)
#Code
# Reshape the 'type' columns, keeping only the object name (V1) from each column name
df %>% select(contains('type')) %>%
  mutate(id=1:n()) %>%
  pivot_longer(-id) %>%
  separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
  select(-c(V2,V3)) %>%
  rename(Value1=value) %>%
  # Same steps for the 'mass' columns; joined on the shared columns id and V1
  left_join(df %>% select(contains('mass')) %>%
              mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
              select(-c(V2,V3)) %>%
              rename(Value2=value)) %>%
  # Same steps for the 'color' columns
  left_join(df %>% select(contains('color')) %>%
              mutate(id=1:n()) %>%
              pivot_longer(-id) %>%
              separate(name,into = c(paste0('V',1:3)),sep = '\\.') %>%
              select(-c(V2,V3)) %>%
              rename(Value3=value))
Output:
# A tibble: 6 x 5
id V1 Value1 Value2 Value3
<int> <chr> <chr> <dbl> <chr>
1 1 Item123 Desk 11.2 blue
2 1 object_AB Chair 2.3 orange
3 2 Item123 Desk 14.2 red
4 2 object_AB Sofa 22 grey
5 3 Item123 Armchair 23.3 black
6 3 object_AB Monitor 2.2 white
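The three nearly identical pipelines above could be folded into one function. The following is a hedged sketch of that idea, not the answerer's code: `reshape_property` is a hypothetical helper, the `object` column name is an illustrative choice, and the sample `df` uses plain character and numeric columns (matching the <chr> and <dbl> types in the printed output) rather than the factors from the dput() structure.

```r
library(dplyr)
library(tidyr)

# Sample data with plain character/numeric columns (an assumption, see above)
df <- data.frame(
  Item123.type.value    = c("Desk", "Desk", "Armchair"),
  Item123.mass.value    = c(11.2, 14.2, 23.3),
  Item123.color.value   = c("blue", "red", "black"),
  object_AB.type.value  = c("Chair", "Sofa", "Monitor"),
  object_AB.mass.value  = c(2.3, 22, 2.2),
  object_AB.color.value = c("orange", "grey", "white"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reshape the columns for one property into long form
reshape_property <- function(data, property) {
  data %>%
    select(contains(property)) %>%          # e.g. every "*.mass.*" column
    mutate(id = row_number()) %>%           # remember the source row
    pivot_longer(-id, names_to = "object", values_to = property) %>%
    mutate(object = sub("\\..*$", "", object))  # keep text before the first dot
}

# Build one long table per property, then join them on row id and object name
pieces <- lapply(c("type", "mass", "color"), reshape_property, data = df)
result <- Reduce(function(x, y) left_join(x, y, by = c("id", "object")), pieces)

print(result)
```

Note that contains() matches the property string anywhere in a column name, so this sketch would over-select if an object name itself contained "type", "mass", or "color".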
Source: https://stackoverflow.com/questions/63696646/reshaping-a-table-in-r-while-parsing-information-from-column-names-and-using-it