Is there a way in R to join broken lines of a CSV file?

半城伤御伤魂 submitted on 2019-12-02 06:14:35

This is what I have for now. See how this works on your data.

dat <- readLines("temp.txt") # read whatever is in there, one line at a time
varnames <- unlist(strsplit(dat[1], ",")) # extract variable names from the header line
nvar <- length(varnames)

# count the commas in a string; gregexpr() returns -1 when there is no match
ncommas <- function(x) sum(gregexpr(",", x, fixed = TRUE)[[1]] > 0)

k <- 2 # line counter, starting on the first line after the header
dat1 <- matrix(NA, ncol = nvar, dimnames = list(NULL, varnames))

while (k <= length(dat)) {
    if (dat[k] == "") {
        print(paste("data line", k, "is an empty string"))
        k <- k + 1
        next
    }
    temp <- dat[k]
    # checks if there are enough commas or if the line was broken;
    # stops at the end of the file so that dat[k] is never NA
    while (ncommas(temp) < nvar - 1 && k < length(dat)) {
        k <- k + 1
        temp <- paste0(temp, dat[k])
    }
    temp <- unlist(strsplit(temp, ","))
    message(k)
    dat1 <- rbind(dat1, temp)
    k <- k + 1
}

dat1 <- dat1[-1, , drop = FALSE] # delete the empty initial row

The general idea is to keep appending lines until the string contains enough commas. Once that is achieved, the string is split at the commas and added as a single row to a matrix. The code is horribly clunky and will be slow for large data files, because rbind copies the whole matrix every time a row is added. It is the best I can do though.
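
As a rough illustration of the idea described above, here is a minimal self-contained sketch that works on an in-memory character vector instead of a file. The function name join_broken_lines and the sample data are made up for this example and are not from the question. It also assumes that no field contains a quoted comma.

```r
# Hypothetical helper: joins physical lines until each record has as many
# commas as the header, then splits the records into a character matrix.
join_broken_lines <- function(lines) {
    lines <- lines[lines != ""]                    # ignore empty lines
    header <- strsplit(lines[1], ",", fixed = TRUE)[[1]]
    need <- length(header) - 1                     # commas per complete record
    # lengths(regmatches(...)) counts matches and gives 0 when there are none
    ncommas <- function(x) lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))

    records <- character(0)
    k <- 2                                         # line 1 is the header
    while (k <= length(lines)) {
        temp <- lines[k]
        # too few commas: the record continues on the next line
        while (ncommas(temp) < need && k < length(lines)) {
            k <- k + 1
            temp <- paste0(temp, lines[k])
        }
        records <- c(records, temp)
        k <- k + 1
    }
    out <- do.call(rbind, strsplit(records, ",", fixed = TRUE))
    colnames(out) <- header
    out
}

# Made-up sample: the second record is broken across two lines
sample_lines <- c("id,name,score", "1,Ann,90", "2,Bo", "b,85", "", "3,Cy,70")
res <- join_broken_lines(sample_lines)
print(res)
```

Collecting the joined records first and splitting them once at the end avoids growing a matrix row by row, which may help on larger files.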

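For the original data example, the code works and creates a character matrix with 42 columns and 6 rows. For the smaller example, the code cannot handle the break in the last column: a line broken inside the last field already contains all the expected commas, so its continuation is treated as the start of a new record.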

If you want to implicitly add blank fields when you have rows of unequal length, set fill = TRUE in your read.table call.
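
For reference, a small sketch of what fill = TRUE does, using made-up inline data passed through the text argument rather than a file:

```r
# Made-up data: the second data row is missing its last field
txt <- "a,b,c
1,2,3
4,5
6,7,8"

# sep = "," is needed because read.table splits on whitespace by default
df <- read.table(text = txt, sep = ",", header = TRUE, fill = TRUE)
print(df)
```

Note that this only pads short rows with blank fields; it does not rejoin a record that was split across two lines.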

If that's not the question you are asking, can you be more clear and provide a reproducible example?
