Question
I have a data frame that holds several tables stacked in a single file, and I want to split it into separate data frames (around 25 per file). A new data frame starts wherever a blank row is followed by a header title row. I tried the approach from "Splitting dataframes in R based on empty rows", but it does not handle a blank row inside one of the new data frames (V1 column, 9th row). I want the data split on each empty row plus header title; my data and the code I tried are given below. Also, how can I use the header title row as the name of each newly created data frame?
df = structure(list(V1 = c("Machine", "", "Machine", "V1", "03-09-2020",
"", "Machine", "No", "Name", "a", "1", "2", "", "Machine", "No",
""), V2 = c("Data", "", "run", "V2", "600119", "", "error", "SpNo",
"", "a", "b", "c", "", "logs", "sp", ""), V3 = c("Editor", "",
"information", "V3", "6", "", "messages", "OP", "", "", "b",
"c", "", "", "op", ""), V4 = c("", "", "", "V4", "", "", "",
"OP", "", "", "", "", "", "", "name", "")), class = "data.frame", row.names = c(NA,
-16L))
dt <- df
## add column to indicate groups
dt$tbl_id <- cumsum(!nzchar(dt$V1))
unique(dt$tbl_id)
## remove blank lines
dt <- dt[nzchar(dt$V1), ]
## split the data frame
dt_s <- split(dt[, -ncol(dt)], dt$tbl_id)
## use first line as header and reset row numbers
dt_s <- lapply(dt_s, function(x) {
  colnames(x) <- x[1, ]
  x <- x[-1, ]
  rownames(x) <- NULL
  x
})
Any help would be much appreciated. The header titles are the same in all the files, and I am using lapply to process multiple files.
Expected output:
Machine_run_information <- read.table(text="
V1 V2 V3 V4
03-09-2020 600119 - 6
",header = T)
Machine_error_messages <- read.table(text="
No SpNo OP OP_Name
- - a a
1 - b b
2 - c c
",header = T)
There will be around 25 outputs similar to these.
Answer 1:
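Maybe you can try flagging the rows in which every column is empty and splitting the remaining rows on the running count of those flags:
# TRUE for rows where every column is an empty string
u <- rowSums(df == "") == ncol(df)
# drop the blank rows and group the rest by the number of blank rows seen so far
out <- split(subset(df, !u), cumsum(u)[!u])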
which gives
> out
$`0`
       V1   V2     V3 V4
1 Machine Data Editor
$`1`
          V1     V2          V3 V4
3    Machine    run information
4         V1     V2          V3 V4
5 03-09-2020 600119           6
$`2`
        V1    V2       V3 V4
7  Machine error messages
8       No  SpNo       OP OP
9     Name
10       a     a
11       1     b        b
12       2     c        c
$`3`
        V1   V2 V3   V4
14 Machine logs
15      No   sp op name
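The question also asks how to use each block's title row as the name of the resulting data frame, which the split above does not do by itself. The following is only a minimal sketch of one way to build on `out`, assuming the first row of every block is the title and the second row (when there is one) holds the column names. The helpers `block_title` and `tidy_block` are hypothetical names introduced here for illustration, not part of the original answer.

```r
# Sample data from the question
df <- data.frame(
  V1 = c("Machine", "", "Machine", "V1", "03-09-2020", "", "Machine", "No",
         "Name", "a", "1", "2", "", "Machine", "No", ""),
  V2 = c("Data", "", "run", "V2", "600119", "", "error", "SpNo",
         "", "a", "b", "c", "", "logs", "sp", ""),
  V3 = c("Editor", "", "information", "V3", "6", "", "messages", "OP",
         "", "", "b", "c", "", "", "op", ""),
  V4 = c("", "", "", "V4", "", "", "", "OP",
         "", "", "", "", "", "", "name", ""),
  stringsAsFactors = FALSE
)

# Split on fully blank rows, as in the answer above
u <- rowSums(df == "") == ncol(df)
out <- split(subset(df, !u), cumsum(u)[!u])

# Hypothetical helper: join the non-empty cells of the title row with "_"
block_title <- function(x) {
  first <- unlist(x[1, ], use.names = FALSE)
  paste(first[nzchar(first)], collapse = "_")
}

# Hypothetical helper: drop the title row and promote the next row to column names
tidy_block <- function(x) {
  if (nrow(x) < 2) return(x[0, , drop = FALSE])  # title only, no header or body
  hdr <- unlist(x[2, ], use.names = FALSE)
  body <- x[-(1:2), , drop = FALSE]
  colnames(body) <- make.unique(hdr)  # repeated headers such as OP, OP become OP, OP.1
  rownames(body) <- NULL
  body
}

# Named list: one cleaned data frame per title
named <- setNames(lapply(out, tidy_block),
                  vapply(out, block_title, character(1), USE.NAMES = FALSE))
names(named)
```

If separate objects are really needed, `list2env(named, envir = .GlobalEnv)` would create one data frame per name, although keeping them in a named list is usually easier to combine with `lapply`.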
Answer 2:
Here is an approach using dplyr::group_split() (which is in an experimental lifecycle).
df = structure(list(V1 = c("Machine", "", "Machine", "V1", "03-09-2020",
"", "Machine", "No", "Name", "a", "1", "2", "", "Machine", "No",
""), V2 = c("Data", "", "run", "V2", "600119", "", "error", "SpNo",
"", "a", "b", "c", "", "logs", "sp", ""), V3 = c("Editor", "",
"information", "V3", "6", "", "messages", "OP", "", "", "b",
"c", "", "", "op", ""), V4 = c("", "", "", "V4", "", "", "",
"OP", "", "", "", "", "", "", "name", "")), class = "data.frame", row.names = c(NA,
-16L))
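library(dplyr)  # makes the %>% pipe available
df %>%
  # flag rows in which every column is empty
  dplyr::mutate(FLAG = rowSums(. == "") == ncol(.)) %>%
  # the running count of blank rows gives one group id per table
  dplyr::mutate(GRP = cumsum(FLAG)) %>%
  # drop the blank separator rows
  dplyr::filter(!FLAG) %>%
  dplyr::group_by(GRP) %>%
  dplyr::group_split() %>%
  # remove the helper columns from each piece
  lapply(function(f) dplyr::select(f, -FLAG, -GRP))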
[[1]]
# A tibble: 1 x 4
  V1      V2    V3     V4
  <chr>   <chr> <chr>  <chr>
1 Machine Data  Editor ""
[[2]]
# A tibble: 3 x 4
  V1         V2     V3          V4
  <chr>      <chr>  <chr>       <chr>
1 Machine    run    information ""
2 V1         V2     V3          "V4"
3 03-09-2020 600119 6           ""
[[3]]
# A tibble: 6 x 4
  V1      V2      V3         V4
  <chr>   <chr>   <chr>      <chr>
1 Machine "error" "messages" ""
2 No      "SpNo"  "OP"       "OP"
3 Name    ""      ""         ""
4 a       "a"     ""         ""
5 1       "b"     "b"        ""
6 2       "c"     "c"        ""
[[4]]
# A tibble: 2 x 4
  V1      V2    V3    V4
  <chr>   <chr> <chr> <chr>
1 Machine logs  ""    ""
2 No      sp    "op"  "name"
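The list returned by this pipeline is unnamed, and the question mentions wanting each piece separately, named after its title row. As a rough sketch under stated assumptions, the pieces could be named from their first row and then written to one CSV each. The output directory (a subfolder of `tempdir()`), the file naming scheme, and the variable names `pieces`, `titles` and `paths` are all assumptions made for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  V1 = c("Machine", "", "Machine", "V1", "03-09-2020", "", "Machine", "No",
         "Name", "a", "1", "2", "", "Machine", "No", ""),
  V2 = c("Data", "", "run", "V2", "600119", "", "error", "SpNo",
         "", "a", "b", "c", "", "logs", "sp", ""),
  V3 = c("Editor", "", "information", "V3", "6", "", "messages", "OP",
         "", "", "b", "c", "", "", "op", ""),
  V4 = c("", "", "", "V4", "", "", "", "OP",
         "", "", "", "", "", "", "name", ""),
  stringsAsFactors = FALSE
)

# Same pipeline as in the answer above
pieces <- df %>%
  mutate(FLAG = rowSums(. == "") == ncol(.)) %>%
  mutate(GRP = cumsum(FLAG)) %>%
  filter(!FLAG) %>%
  group_by(GRP) %>%
  group_split() %>%
  lapply(function(f) select(f, -FLAG, -GRP))

# Build a name for each piece from the non-empty cells of its title row
titles <- vapply(pieces, function(p) {
  first <- unlist(p[1, ], use.names = FALSE)
  paste(first[nzchar(first)], collapse = "_")
}, character(1))
names(pieces) <- titles

# Assumed output location: a subfolder of the session's temporary directory
out_dir <- file.path(tempdir(), "split_tables")
dir.create(out_dir, showWarnings = FALSE)
paths <- file.path(out_dir, paste0(names(pieces), ".csv"))

for (i in seq_along(pieces)) {
  if (nrow(pieces[[i]]) < 2) next  # title row only, nothing to write
  # Skip the title row; the next row then acts as the header line of the CSV
  write.table(pieces[[i]][-1, ], paths[i], sep = ",",
              row.names = FALSE, col.names = FALSE)
}
```

Each file written this way starts with the table's own header row, so `read.csv(paths[2])` should read it back with the intended column names.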
Source: https://stackoverflow.com/questions/63718774/divide-or-split-dataframe-into-multiple-dfs-based-on-empty-row-and-header-title