How to split a character vector into a data frame?

南旧 2020-12-15 12:47

I'm still relatively new to R and hope you can help me again. I have a character vector of length 42000. The vector looks like this:

a <- c("blab


        
4 Answers
  • 2020-12-15 13:02

    Maybe with colsplit() from reshape2:

    library(reshape2)
    colsplit(a, "\\-", names=c("A", "B", "C"))
    
              A                B     C
    1 blablabla 19960101T000000Z 1.tsv
    2 blablabla 19960101T000000Z 2.tsv
    3 blablabla 19960101T000000Z 3.tsv
    

    or, splitting on every punctuation character and on the "T" between date and time:

    b <- colsplit(a, "[[:punct:]]|T", names=c("A", "B", "C", "D","E"))
    
              A        B       C D   E
    1 blablabla 19960101 000000Z 1 tsv
    2 blablabla 19960101 000000Z 2 tsv
    3 blablabla 19960101 000000Z 3 tsv
    

    and then convert column B to a date with lubridate:

    library(lubridate)
    b$B <- ymd(b$B)
    
              A          B       C D   E
    1 blablabla 1996-01-01 000000Z 1 tsv
    2 blablabla 1996-01-01 000000Z 2 tsv
    3 blablabla 1996-01-01 000000Z 3 tsv
    
    str(b)
    'data.frame':   3 obs. of  5 variables:
     $ A: chr  "blablabla" "blablabla" "blablabla"
     $ B: Date, format: "1996-01-01" "1996-01-01" "1996-01-01"
     $ C: chr  "000000Z" "000000Z" "000000Z"
     $ D: int  1 2 3
     $ E: chr  "tsv" "tsv" "tsv"
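
The answer above splits the timestamp at the "T" and keeps only the date. If the time of day matters, the whole field can probably be parsed in one step instead. The following is a minimal sketch in base R, assuming every element has the form name-yyyymmddTHHMMSSZ-number.tsv and that the trailing "Z" means UTC; the names `parts` and `stamp` are made up for illustration.

```r
# Sample input, same shape as in the question
a <- c("blablabla-19960101T000000Z-1.tsv",
       "blablabla-19960101T000000Z-2.tsv",
       "blablabla-19960101T000000Z-3.tsv")

# Split on the literal "-" and stack the pieces into a 3-column matrix
parts <- do.call(rbind, strsplit(a, "-", fixed = TRUE))

# Parse the full timestamp; "T" and "Z" are matched as literal characters.
# Assumption: the "Z" suffix denotes UTC.
stamp <- as.POSIXct(parts[, 2], format = "%Y%m%dT%H%M%SZ", tz = "UTC")

b2 <- data.frame(A = parts[, 1],
                 B = stamp,
                 D = as.integer(sub("\\.tsv$", "", parts[, 3])),
                 stringsAsFactors = FALSE)
str(b2)
```

Elements that do not match the format come back as NA in column B rather than raising an error, so a quick `sum(is.na(b2$B))` is a reasonable sanity check on a long vector.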
    
  • 2020-12-15 13:03

    You can almost use read.table directly, but your date format isn't the same as what R would use for the colClasses argument.

    No problem. Just specify your own class and proceed :-)

    ## Create a class called "ymdDate"
    setClass("ymdDate")
    setAs("character", "ymdDate", function(from) as.Date(from, format="%Y%m%d"))
    
    ## Use `read.table` on your character vector. For convenience, I've
    ##   used `gsub` to get rid of the `.tsv` before reading it in.
    out <- read.table(text = gsub("\\.tsv$", "", a), header = FALSE, 
                      sep = "-", colClasses=c("character", "ymdDate", "integer"))
    out
    #          V1         V2 V3
    # 1 blablabla 1996-01-01  1
    # 2 blablabla 1996-01-01  2
    # 3 blablabla 1996-01-01  3
    str(out)
    # 'data.frame':  3 obs. of  3 variables:
    #  $ V1: chr  "blablabla" "blablabla" "blablabla"
    #  $ V2: Date, format: "1996-01-01" "1996-01-01" "1996-01-01"
    #  $ V3: int  1 2 3
    
  • 2020-12-15 13:11

    I know I'm late to this party, but I wanted to see this same idea in a magrittr pipe and using more tidyverse functions. Here's what I've got:

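    library(tidyverse)
    library(lubridate)
    
    a <- c("blablabla-19960101T000000Z-1.tsv", "blablabla-19960101T000000Z-2.tsv", "blablabla-19960101T000000Z-3.tsv")
    
    a %>%
      strsplit('-') %>%                    # list of 3-element character vectors
      transpose() %>%                      # regroup by position: one list per field
      map_dfc(~tibble(.x)) %>%             # one list-column per field
      unnest(cols = everything()) %>%      # flatten the list-columns to character
      set_names(c('Name', 'Date', 'no')) %>% 
      mutate(Date = Date %>%
               str_extract('\\d+') %>%     # leading run of digits, i.e. yyyymmdd
               ymd(),
             no = str_extract(no, '\\d+'))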
    
  • 2020-12-15 13:15
    DF <- data.frame(do.call(rbind, strsplit(a, "-", fixed=TRUE)))
    DF[,2] <- as.Date(DF[,2] , format="%Y%m%d")
    DF[,3] <- as.integer(gsub(".tsv", "", DF[,3], fixed=TRUE))
    
    #         X1         X2 X3
    #1 blablabla 1996-01-01  1
    #2 blablabla 1996-01-01  2
    #3 blablabla 1996-01-01  3
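
With 42000 elements it may be worth guarding against entries that do not follow the expected pattern, since `strsplit` plus `rbind` will silently recycle or misalign such rows. One possible variant, sketched here under the assumption that every valid element looks like name-yyyymmddTHHMMSSZ-number.tsv, uses `strcapture` from the utils package (available since R 3.4.0). The pattern, the column names, and the extra malformed element are illustrative and not taken from the question.

```r
# Three well-formed elements plus one deliberately malformed one
a <- c("blablabla-19960101T000000Z-1.tsv",
       "blablabla-19960101T000000Z-2.tsv",
       "blablabla-19960101T000000Z-3.tsv",
       "not-a-match")

# One capture group per output column: name, date (yyyymmdd), number
pattern <- "^(.*)-([0-9]{8})T[0-9]{6}Z-([0-9]+)[.]tsv$"

# The prototype fixes the column names and types of the result
proto <- data.frame(name = character(), date = character(), no = integer(),
                    stringsAsFactors = FALSE)

out <- strcapture(pattern, a, proto)
out$date <- as.Date(out$date, format = "%Y%m%d")

# Rows that did not match the pattern are NA in every column
bad <- which(is.na(out$no))
out
```

Here `bad` holds the positions of the elements that could not be parsed, which makes it easy to inspect or drop them before further processing.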
    